Eevee and viewport optimization for drawing many objects and instances, by batching draw calls. (Clément Foucault)

I’ve just noticed this in the Meeting Notes: Eevee and viewport optimization for drawing many objects and instances, by batching draw calls. (Clément Foucault)
I’d really like to read a blog post or something similar about what the issues and challenges were, how it was implemented, etc.

I noticed that some parts of the Eevee engine are not optimized: it often recalculates processes that are already “baked/stored”, even when this is not really necessary. I also noticed that when rendering shaders in scenes with many instances of materials and objects, it calculates every shader for each object. I believe a lot of time and memory could be saved here by calculating only once all the shading that stays the same and does not depend on the nearby environment.

An example scene where many rendering calculations could be saved is this:

https://download.blender.org/demo/test/classroom.zip

Currently, despite having many instances, shading calculations are very slow; Cycles is in some ways faster.
The new draw call batching system is certainly advantageous, but it could be extended to the other parts I listed.


More generally, calculations could be saved in many places. For example, I created this video, which shows that soft shadows are recalculated even on a simple selection click, when rotating the whole scene, or when changing the color of an object…

I believe there is a lot of room for optimization here: a system where, once the scene has been rendered, only the part of the environment that was modified is recalculated. If, for example, I change the color of an object, I don’t need to recalculate the framebuffer for the whole scene, but only for that object and whatever it influences in its closest environment.

The advantage of such a system is that in very complex scenes full of objects it would save a lot of calculations, especially in the editing phase, but also in the final rendering: if only some objects are modified in an animated scene, it is not necessary to recalculate everything else frame by frame.
@fclem
Greetings Masters

This was mostly a TODO left from the OpenGL 3.3 transition. We were using display lists to render particle systems in the past, but we needed a system that can handle the same number of object instances.

This is what the draw call batching is about. We sort all draw calls by state and use multi-draw indirect calls to render a lot of similar geometry with a reduced amount of driver overhead.

Unfortunately this optimization is limited to newer hardware that supports some necessary OpenGL extensions.
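
The sorting-and-batching idea described above can be sketched in a few lines. This is only an illustration and not Blender's actual code: the `DrawCall` record and `build_indirect_batches` function are hypothetical names, and the "state" is reduced to a shader name. The five-value command tuple follows the field order of OpenGL's `DrawElementsIndirectCommand` (count, instanceCount, firstIndex, baseVertex, baseInstance).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DrawCall:
    """Hypothetical per-object draw request (not a Blender type)."""
    shader: str       # stands in for the full GPU state
    mesh: str
    index_count: int
    first_index: int
    base_vertex: int


def build_indirect_batches(calls):
    """Group draw calls by shader and merge identical meshes into instances.

    Returns {shader: [(count, instance_count, first_index,
                       base_vertex, base_instance), ...]}.
    Each list could be submitted as one multi-draw indirect call.
    """
    by_shader = defaultdict(dict)
    for c in calls:
        key = (c.index_count, c.first_index, c.base_vertex)
        by_shader[c.shader][key] = by_shader[c.shader].get(key, 0) + 1

    batches = {}
    for shader in sorted(by_shader):  # sort by state to minimize switches
        commands = []
        base_instance = 0
        for (count, first, base_vertex), n in by_shader[shader].items():
            commands.append((count, n, first, base_vertex, base_instance))
            base_instance += n  # next command's instances start after these
        batches[shader] = commands
    return batches


if __name__ == "__main__":
    chair = DrawCall("wood", "chair", 36, 0, 0)
    desk = DrawCall("wood", "desk", 60, 36, 24)
    metal_chair = DrawCall("metal", "chair", 36, 0, 0)
    scene = [chair, chair, chair, desk, metal_chair]
    for shader, commands in build_indirect_batches(scene).items():
        print(shader, commands)
```

In this toy scene, five individual draw calls collapse into two submissions, one per shader, which is where the reduction in driver overhead would come from.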

I’m not sure I understand what you are trying to say.

I noticed that in rendering shaders, in scenes where there are many instances of materials and objects, it calculates every shader for each object

Eevee does draw call batching if the objects have compatible states. If two objects don’t use the same shader, they can’t be batched together.

Currently, despite having many instances, shading calculations are very slow; Cycles is in some ways faster.

Shading in EEVEE is slow mainly because of lighting evaluation and mesh structure:

  • Lighting is evaluated using LTC (linearly transformed cosines) area lights, a state-of-the-art technique that is not even used in games.
  • The mesh data are compressed (for normals at least), but UVs and tangents are not, and they use quite a bit of bandwidth.
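
To put the bandwidth point in rough numbers, here is a back-of-the-envelope sketch. The attribute sizes are assumptions chosen for illustration (a normal packed into 32 bits, a UV as two 32-bit floats, a tangent as four 32-bit floats); they are not figures confirmed in this thread, and the function names are hypothetical.

```python
# Assumed per-vertex attribute sizes in bytes (illustrative only).
NORMAL_PACKED_BYTES = 4       # e.g. three 10-bit components in one 32-bit word
UV_FLOAT_BYTES = 2 * 4        # two 32-bit floats per UV layer
TANGENT_FLOAT_BYTES = 4 * 4   # four 32-bit floats per tangent layer


def bytes_per_vertex(uv_layers=1, tangent_layers=1):
    """Bytes fetched per vertex for normals, UVs and tangents."""
    return (NORMAL_PACKED_BYTES
            + uv_layers * UV_FLOAT_BYTES
            + tangent_layers * TANGENT_FLOAT_BYTES)


def megabytes_per_frame(vertex_count, uv_layers=1, tangent_layers=1):
    """Attribute data read per frame if every vertex is fetched once."""
    return vertex_count * bytes_per_vertex(uv_layers, tangent_layers) / 1e6


if __name__ == "__main__":
    print(bytes_per_vertex())              # bytes per vertex
    print(megabytes_per_frame(1_000_000))  # MB for one million vertices
```

Under these assumptions, the uncompressed UV and tangent data account for 24 of the 28 bytes per vertex, which is consistent with them dominating the bandwidth cost.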

Also, using the new bump mapping (2.81) now has a big performance impact, as it basically multiplies by 3 the number of nodes in the bump/displacement node branch.

Using noise textures instead of bitmaps is also really heavy.
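
One way to see where a factor of 3 can come from is a finite-difference bump evaluation. The sketch below assumes the height branch is evaluated at the shading point and at two offset points; this is an assumption about the technique for illustration, not a description of Eevee's source code, and all names are hypothetical.

```python
def make_counting_height(height):
    """Wrap a height function so we can count how often it is evaluated."""
    calls = {"n": 0}

    def wrapped(u, v):
        calls["n"] += 1
        return height(u, v)

    return wrapped, calls


def bump_gradient(height, u, v, eps=1e-3):
    """Forward-difference gradient of the height field at (u, v).

    The whole height branch runs three times: once at the point itself
    and once at each of the two offset positions.
    """
    h0 = height(u, v)
    hu = height(u + eps, v)
    hv = height(u, v + eps)
    return ((hu - h0) / eps, (hv - h0) / eps)


if __name__ == "__main__":
    # A simple linear height field standing in for a node branch.
    wrapped, calls = make_counting_height(lambda u, v: 2.0 * u + 3.0 * v)
    gradient = bump_gradient(wrapped, 0.5, 0.5)
    print(gradient, calls["n"])
```

If the height function stands for an expensive node branch, such as a procedural noise texture, every node in that branch is paid for three times per shaded pixel.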

I believe there is a lot of room for optimization here: a system where, once the scene has been rendered, only the part of the environment that was modified is recalculated. If, for example, I change the color of an object, I don’t need to recalculate the framebuffer for the whole scene, but only for that object and whatever it influences in its closest environment.

This is not possible, for the same reason Cycles does not do it. But the good news is that such a system already exists: it is called baking. While it is not possible to bake everything in Eevee, you can at least bake the indirect lighting.

The shadows could be baked into lightmaps, but this would introduce a lot of complexity.

Also, selecting objects should not trigger a viewport refresh if nothing changes. This is a low priority bug that needs to be addressed.

Thanks for the reply.
What I am trying to highlight is the time it takes to compile shaders in particularly complex scenes, and whether it can be optimized.
I was thinking especially of cases with many instances, like the scene I mentioned. If you have done some testing, you will have noticed that shader preparation, compilation and caching times are relatively high. (I am not referring to the realtime rendering of Eevee, where in fact everything becomes light once the shaders are compiled and cached.) I would have imagined that, with so many instances, there would be room for optimization.

@fclem
Another powerful optimization that I think could be used in some way is integrating the Intel denoiser with Eevee in some parts… Currently in Cycles, scenes that previously required 100 samples are now superbly rendered with 10-20 samples, which obviously reduces rendering times…
The Intel denoiser is now integrated in Blender, and I think you could find creative ways to use it to your advantage :wink:
But you have probably already noticed it and thought about it…

Shader compile time is not something we control. We can try simplifying the shader code, but the hard freeze is caused by linking in the driver. Until we can precompile SPIR-V shaders in an external thread, we cannot speed it up by an order of magnitude.

About denoising, the Intel denoiser is only usable if there is structural noise. But Eevee doesn’t use noise most of the time, because rasterization needs to happen on a fixed grid of pixels (for shadow maps, for volumetrics, for the main buffer, etc.). For certain passes it could be usable (SSR), but it may need more outputs (normals, diffuse pass) and so increase the render time overall. That’s my current opinion, but I didn’t look into it enough to be 100% sure that it can’t be beneficial.

Hey, some time ago I saw that one of the targets was to change the UV editor with shading. Is it actually a target? Do you have a public roadmap?

The roadmaps for all the modules can be found in the upper right corner of the page on https://developer.blender.org under the Release Status section.

Does EEVEE do any optimization for re-used node groups? For my NPR work, I have about ten nodes I reuse. The total number of nodes can be 100s without groups. I haven’t tried comparing performance for this, though.

Just for information: someone is using the Intel denoiser with Cycles for baking and real time.
