EEVEE shader compilation process information needed

I wonder why OpenGL only uses the cached version in an existing .blend file and not in a new .blend file.

There is clearly something else going on there. Are you closing blender every time you create a new blend file?

I’m opening a new instance of Blender, pressing Ctrl+C in the existing file in which the material has already compiled, then Ctrl+V to paste the object into the new instance. It results in a recompile of the shader, suggesting it’s nothing to do with OpenGL and more to do with Blender?

Wait, I think I misread what you are doing. So it seems you have two instances of blender open, and are trying to copy and paste between sessions, correct? Those sessions are probably not going to be sharing the same running cache and one of them might not even have permission to open the saved cache (if there is any) at all. Try to limit it to a single instance.

I think Brecht was saying that because OpenGL is responsible for reusing the cached group, the cache is available for days at a time, completely unrelated to the Blender session.

If I close all sessions of Blender, then start a new session and open a file containing materials that have already compiled, it doesn’t recompile. So it looks like it’s not related to the session, and because it still recompiles in a new file, that suggests it’s not related to OpenGL; it looks like the cache exists within the .blend file…

The cache is not in the blend file.

If I close all sessions of Blender, then start a new session and open a file which has materials in it that have already compiled, then it doesn’t recompile.

That suggests that it is trying to save/load the shader cache to and from a file on disk. The additional sessions appear to be running without any previous cache, likely due to file permissions when multiple instances try to access the same file.

Not the OpenGL cache, but some other cache which Blender is able to use to speed up shader compilation.

There are two parts to the compilation. The shader graph to GLSL code that Blender does, and GLSL code to hardware instructions that the OpenGL driver does. Both have a performance impact.

The existence and implementation of caching at the OpenGL driver depends on the driver. I’m saying it may cache to disk, but I don’t know if it does for your particular GPU / driver / shader code.


When Blender shows “Compiling shaders”, is that the part which converts the shader graph to GLSL? Or does it encompass both shader graph to GLSL and GLSL to hardware instructions?

I’m just thinking that if those 26 seconds of compiling shaders are predominantly spent converting the shader graph to GLSL, then perhaps it might be possible to save and reuse a group’s GLSL rather than recompiling it if it hasn’t changed (during the first stage of compilation, on Blender’s side).


It’s only the GLSL code to hardware instructions as far as I know, that tends to be the slowest part.


Another question.
I understand that both the GPU and CPU are involved in shader compilation. When I monitor their use, it is clear that a single CPU thread is at 100%, while GPU utilization only varies from 5% to 50%. If I hadn’t noticed that, with the same CPU, shader compilation takes much longer with an Intel iGPU than with an NVIDIA GTX 960, I would have said GPU use is not intensive for this, but apparently the GPU has an influence.

Anyway, can this be optimized further? Could this become a multi-threaded CPU process one day, and would that improve shader compilation times?


The CPU runs the compiler; the GPU just gets fed the finished machine instructions. There are likely other reasons why certain GPUs and drivers might be slower during this process that do not necessarily reflect overall performance. OpenGL is a heavily single-threaded API from before multi-threading was really a thing and has aged quite a bit, hence why Vulkan exists. So yes, when Blender moves to Vulkan, multi-threaded shader compilation and pipeline creation is very much a recommended and doable optimization for compiling more pipelines and shaders at a single time. Not sure if there are any viable fixes over GL in the meantime.
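To picture the restructuring described above, here is a minimal Python sketch, assuming a hypothetical `compile_shader` stand-in for the driver's GLSL-to-machine-code step (no real OpenGL or Vulkan call is made). It only shows the shape of the idea: independent shader jobs are handed to a worker pool instead of being compiled one after another. Because of Python's GIL, this stand-in illustrates the structure and may not show a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
import hashlib


def compile_shader(source: str) -> bytes:
    """Hypothetical stand-in for a driver compiling GLSL to machine code.

    A real driver emits GPU-specific instructions; here we just hash the
    source so the result is deterministic and easy to compare.
    """
    return hashlib.sha256(source.encode("utf-8")).digest()


def compile_serial(sources: list[str]) -> list[bytes]:
    # Roughly what a single-threaded GL context forces: one job at a time.
    return [compile_shader(s) for s in sources]


def compile_parallel(sources: list[str], workers: int = 4) -> list[bytes]:
    # Independent shaders share no state, so they can be compiled
    # concurrently; executor.map returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(compile_shader, sources))


if __name__ == "__main__":
    shaders = [f"void main() {{ /* variant {i} */ }}" for i in range(8)]
    assert compile_parallel(shaders) == compile_serial(shaders)
    print(f"compiled {len(shaders)} shaders")
```

The parallel and serial paths produce the same results in the same order; only the scheduling differs, which is why this kind of change is mostly an API and threading problem rather than a change to the shaders themselves.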


@brecht I’ve posted on the Khronos forum, and one of the senior members is asking the questions below. Are you able to answer any of them?

I’ve seen very large GLSL shader programs (all stages) with a lot of unused bloat in them that take ~300-500 ms to compile+link. But 30 seconds is outrageous.

You should dig into this and see what Blender’s doing for all that time.

  1. How long is its shader graph -to- GLSL conversion taking?
  2. How long is it taking to compile the GLSL source code for the shader stages in a single program?
  3. How long is it taking to link the compiled shader objects into that single shader program?
  4. Is it generating+compiling+linking a bunch of shader programs rather than just one? How many?
  5. Is it sending a lot of unused bloat in the GLSL source code (**)?
  6. What CPU and GL driver is this on?

If I were you, I’d want to know.
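Questions 1 to 3 come down to timing each stage separately. A minimal Python sketch of that kind of instrumentation follows, assuming hypothetical `generate_glsl`, `compile_stage` and `link_program` stand-ins; in Blender the real work happens in C/C++ code and inside the GL driver, so this only shows how to split the measurement, not where to put it.

```python
import time
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def timed(label: str, timings: dict[str, float]) -> Iterator[None]:
    # Record the wall-clock time of the enclosed block under `label`.
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


# Hypothetical stand-ins for the three stages in questions 1-3.
def generate_glsl(node_count: int) -> dict[str, str]:
    body = "\n".join(f"float n{i} = {i}.0;" for i in range(node_count))
    return {"vertex": "void main() {}", "fragment": f"void main() {{\n{body}\n}}"}


def compile_stage(source: str) -> int:
    return len(source)  # pretend the compiled "object" is the source length


def link_program(objects: list[int]) -> int:
    return sum(objects)


def profile_material(node_count: int) -> dict[str, float]:
    timings: dict[str, float] = {}
    with timed("codegen", timings):
        stages = generate_glsl(node_count)
    with timed("compile", timings):
        objects = [compile_stage(src) for src in stages.values()]
    with timed("link", timings):
        link_program(objects)
    return timings


if __name__ == "__main__":
    for stage, seconds in profile_material(1000).items():
        print(f"{stage:>8}: {seconds * 1000:.2f} ms")
```

Keeping the three timings separate is what makes the answer useful: a large "codegen" number points at the node tree side, while large "compile" or "link" numbers point at the driver.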

** Re bloat. I’m talking about lots of comments and uncalled/unexecuted code in the GLSL which adds needless content that the GLSL compiler has to wade through, parse, and ultimately just chuck into the bitbucket.

If you can, get hold of the GLSL source for the shader stages in that program (probably just vertex and fragment) and post it. I’d like to see this.

More correctly, adding a single node results in Blender demanding that OpenGL recompile the entire GLSL shader to machine code again, from scratch.

Is Blender using separate shader objects? Or shader subroutines? Or assembly shaders (including SPIR-V)?

Or is it re-providing new GLSL source code for the entire shader program to OpenGL to compile and link at runtime, where the graphics driver has no choice but to do exactly what the application demands?

Also re the comments in that Blender thread you linked to about a “shader cache”… Blender could read back compiled+linked shader binaries and save them off, forming its own cache, stored someplace such as in the .blend file or associated file (e.g. see ARB_get_program_binary). But if I infer correctly from that thread, they’re implicitly saying Blender does not do this but instead hopes/expects that the OpenGL driver implements some kind of compiled GLSL shader program cache like this. Some do.

It’s premature at this point to jump to this as a potential solution (because we don’t really know what the problem is yet). But just FYI…
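The application-side cache mentioned above (reading back program binaries via ARB_get_program_binary) could look roughly like this. It is a minimal Python sketch, assuming a hypothetical `compile_and_link` callback in place of the real compile, link and `glGetProgramBinary` calls. The key combines the GLSL source with a driver identifier, because program binaries are only valid for the driver that produced them.

```python
import hashlib
from pathlib import Path
from typing import Callable


class ProgramBinaryCache:
    """Sketch of an on-disk cache of compiled+linked shader programs.

    `compile_and_link` is a hypothetical callback standing in for
    compiling/linking the GLSL and then reading back the program binary.
    """

    def __init__(self, directory: Path, driver_id: str,
                 compile_and_link: Callable[[str], bytes]):
        self.directory = directory
        self.driver_id = driver_id  # e.g. vendor + renderer + version strings
        self.compile_and_link = compile_and_link
        self.directory.mkdir(parents=True, exist_ok=True)

    def _key(self, glsl_source: str) -> str:
        # Same source on a different driver must not hit the same entry.
        h = hashlib.sha256()
        h.update(self.driver_id.encode("utf-8"))
        h.update(b"\0")
        h.update(glsl_source.encode("utf-8"))
        return h.hexdigest()

    def get(self, glsl_source: str) -> bytes:
        path = self.directory / (self._key(glsl_source) + ".bin")
        if path.exists():
            return path.read_bytes()  # cache hit: skip the driver compile
        binary = self.compile_and_link(glsl_source)
        path.write_bytes(binary)
        return binary
```

Keying on the driver identifier avoids most stale entries after a driver update, but a real implementation would still need to fall back to a normal compile if the driver rejects a stored binary when it is loaded back with `glProgramBinary`.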

The GPU vendors stopped supporting ARB assembly shaders years ago. NVIDIA held out until just a few years back with their vendor-specific (NV) assembly shaders, but recent GPU features are no longer supportable through that assembly language; they’ve switched to SPIR-V. So you won’t be patching those assembly shaders anymore if you want cross-GPU support or support for the latest features.

SPIR-V assembly is another possible option GL supports for loading shaders (bypassing the need to compile GLSL in the GL driver). But it doesn’t support some important GL features like bindless texture, so that makes it a non-option for some GL users.

Without re-generating shaders at the assembly level, you make the most of the support you’ve got through the GLSL high-level shading language. Let’s see how effective Blender’s use of it is.


Remember to make sure you have your graphics drivers up to date before chasing any rabbits here. If that turns out to be the issue, it can be frustrating for the people trying to debug it.

26 seconds is EXTREMELY BAD for shader compilation times in most development environments, but I know I have waited longer than 500 ms on UE3/4 to compile a single shader (could be mostly overhead though). Blender materials in particular can get absurdly taxing on fragment shader instruction count when the artist makes ample use of procedural noise/bump mapping; Blender can kind of go off the deep end past any reasonable shader for games/visualization there. It’s not out of the question that the shader is just ridiculous enough to cause those compile times on your machine.

Maybe you could share a .blend/screenshot of the node graph which is causing this slowdown?

@3di, I don’t think your post on the Khronos forum is an accurate representation of what I wrote, and not likely to lead to anything useful.

I tried to answer your questions quickly off the top of my head here. But please don’t assume developers here have the time to engage with your feature requests / requests for improvements more deeply. If there is a project to optimize compilation times we’ll consider the design as a whole, talk to the relevant people, and have more specific requests for Khronos or driver implementations. Trying to set this up as a user who does not know the specifics of the implementation is mostly going to lead to a lot of confusion and take up people’s time.


Trying to organize other developers to work on improving EEVEE shader compilation wasn’t the point of the post. I asked if it was worth delving into the code, and you said it wasn’t, because OpenGL/Vulkan graphics drivers have no good mechanism for such partial updates, and that the blue progress bar that’s taking all the time is the GLSL to machine code on OpenGL’s side, rather than the shader graph to GLSL.

So my intention was to find out if it was worth spending my time learning the GLSL on Blender’s side to try to optimise the shader compilation. So hopefully you can see I’m not an end user trying to get everyone to start working on my feature request to improve Blender… I’ve submitted and had code approved previously, and that was my intention here if it was worth it.

Chances are there is not much you can do about it. I agree with Brecht, getting on the Khronos developer support forums without an understanding of the problem area or graphics programming first isn’t going to lead to any helpful discussions.

It would be a better use of your time to test on multiple machines to see if 26 seconds is an outlier that needs to be fixed via driver update/hardware upgrade/etc and contact your GPU’s customer support over that.

Or if the problem is a bulky shader with lots of procedurals/bump/etc try to optimize/simplify it by removing subtle things and/or baking textures.

Clément Foucault @fclem 13:59

ok, is your issue with a scene with lots of shaders, or with one shader in particular?

  • or even with any shader?


3di @3di 14:01

mine is with a group node that gets fully recompiled every time I plug something into one of its external sockets. I’m aiming to find a way to associate the already compiled GLSL with the node, and then, if the node remains unchanged, use the existing GLSL instead of recompiling. The question is: does the recompilation indicated by the blue progress bar refer to Blender’s task of converting the shader graph to GLSL, or to OpenGL’s progress converting that GLSL to machine code?

  • Brecht thought it was almost purely reporting OpenGL’s progress converting from GLSL to machine code, but he wasn’t sure.

Clément Foucault @fclem 14:04

Blue bar is just shader_compiled/shader_in_queue

  • so nothing about OpenGL status

  • also we already detect if a shader does not change. But we cannot do the “only recompile the bits that changed” optimization because it’s bytecode that we don’t even have, and it could be vastly different for each GPU/vendor.

  • what we can improve though is that we could decouple shading from the material evaluation. But this means deferred shading and is a big paradigm change from what we have now.


3di @3di 14:07

OK, so the bit we’re waiting for is shader to GLSL and not GLSL to machine code.

  • shader graph to GLSL, rather.

Clément Foucault @fclem 14:08

NodeTree > GLSL is done async and does not freeze. The GLSL > bytecode step freezes because the driver hangs.

  • So if it hangs for 30 sec, it’s the driver. If you can rotate the viewport during those 30 sec, it’s the node tree, and there we could see what to do.


3di @3di 14:10

it’s not hanging, it’s the blue progress bar slowly climbing, so presumably it hasn’t even begun the GLSL to bytecode step at this point?

  • from what I understand (which might be incorrect), Blender converts the shader graph to GLSL, and this is what the blue progress bar shows. After this, the driver converts that GLSL to bytecode?

Clément Foucault @fclem 14:11

the blue bar is just an indication of how many shaders are still not recompiled.


3di @3di 14:11

I only have one material

Clément Foucault @fclem 14:11

blue bar shows both


3di @3di 14:11

Or when you say shader, do you mean each node/group inside that material?

Clément Foucault @fclem 14:11

if blue bar is not doing 0 > 100% instantly you don’t have only 1 material

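The reasoning behind “if the bar is not doing 0 > 100% instantly you don’t have only 1 material” can be made concrete. A minimal Python sketch, assuming the bar is simply `shader_compiled / shader_in_queue` as described, with `shader_in_queue` read as the total number of shaders queued in the current batch (that reading is an assumption):

```python
def progress(shader_compiled: int, shader_in_queue: int) -> float:
    """Fraction shown by the bar: compiled shaders over queued shaders."""
    if shader_in_queue == 0:
        return 1.0  # nothing queued: treat as done
    return shader_compiled / shader_in_queue


def bar_values(total_queued: int) -> list[float]:
    # Every value the bar can show as shaders finish one at a time.
    return [progress(done, total_queued) for done in range(total_queued + 1)]
```

With a single shader in the queue the only possible values are 0.0 and 1.0, so a bar that climbs in visible steps implies several shaders are being compiled, for example multiple variants of one material.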

3di @3di 14:12

I’ll make a video


  • one sec.

  • just one material with a group.


3di @3di 14:30

dropbox.com/s/tux9bnenn9kgzcn/2021-04-19 14-24-26.mp4?dl=0

  • I’m getting around 12 seconds per connection here.

  • switching the blend mode to Alpha Blend in the settings increases shader compilation to around 24 seconds.

Clément Foucault @fclem 14:36

Looks like it is our nodetree folding system that is taking too long here

  • 3di

3di @3di 14:36

my aim was to see if it would be possible to hash the individual nodes/groups. When the GLSL is created for a node, its hash would be used as the key in a dictionary of each node’s GLSL; then, when recompiling the shader graph to GLSL in future, if the hash was the same, it would use the stored GLSL to build the full material before presenting it to OpenGL for the conversion to bytecode.
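The hashing idea described in that message can be sketched as a content-addressed cache. This is a minimal Python sketch, assuming a hypothetical, simplified `Node` type and a toy code generator (neither is Blender's actual node or codegen API). Clément had already noted that Blender detects when a whole shader does not change, and he suspects node tree folding rather than GLSL generation is slow here, so this only illustrates the proposal.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical, simplified shader node (not Blender's bNode)."""
    type: str
    settings: dict = field(default_factory=dict)
    inputs: list["Node"] = field(default_factory=list)

    def content_hash(self) -> str:
        # Hash the node's own data plus the hashes of everything feeding it,
        # so any upstream change produces a different key.
        h = hashlib.sha256()
        h.update(self.type.encode("utf-8"))
        h.update(json.dumps(self.settings, sort_keys=True).encode("utf-8"))
        for child in self.inputs:
            h.update(child.content_hash().encode("utf-8"))
        return h.hexdigest()


class GlslCache:
    def __init__(self) -> None:
        self.snippets: dict[str, str] = {}
        self.misses = 0  # how many nodes actually had GLSL generated

    def glsl_for(self, node: Node) -> str:
        key = node.content_hash()
        if key not in self.snippets:
            self.misses += 1
            args = [self.glsl_for(c) for c in node.inputs]
            args += [str(v) for _, v in sorted(node.settings.items())]
            # Toy code generation: one function call per node.
            self.snippets[key] = f"node_{node.type}({', '.join(args)})"
        return self.snippets[key]
```

Note that this only caches the first stage (graph to GLSL); the driver would still compile the full assembled source, which is the part Clément said cannot be done incrementally.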

Clément Foucault @fclem 14:37

Not sure if it is GLSL here that is taking the time. Do you compile Blender yourself?


3di @3di 14:37

yeah

  • I’ve done a basic patch that was approved before.

  • so I’m not completely unfamiliar with the code.

  • but I’m not great

  • I’m not sure why enabling alpha blend would double the compile time either.

Clément Foucault @fclem 14:40

maybe in this case it needs multiple shaders for one material.

  • if you can, maybe just run it in the debugger inside Visual Studio and pause while you are waiting to see where the stack is. OR use the profiler mode.


3di @3di 14:42

ok, I’ll do that.

  • thanks.

Clément Foucault @fclem 14:42

3di, my gut feeling is that it will be inside ntreeGPUMaterialNodes

  • But basically, we flatten the whole tree so if you have nested groups it can increase time quite a bit.

  • But one thing I don’t understand is why we allocate all nodes separately… this is a mystery to me. Might be a low hanging fruit to pick.

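To see why flattening nested groups “can increase time quite a bit”, here is a minimal Python sketch, assuming a simplified hypothetical `Group` model rather than Blender's actual node tree structures. When groups are flattened, each instance's contents are copied in, so the node count multiplies with nesting depth.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """Hypothetical node group: plain nodes plus nested group instances."""
    plain_nodes: int
    instances: list["Group"] = field(default_factory=list)


def flattened_node_count(group: Group) -> int:
    # Flattening inlines a full copy of a nested group at every place it is
    # used, so the total is the group's own nodes plus each instance's
    # flattened size.
    return group.plain_nodes + sum(flattened_node_count(g) for g in group.instances)
```

For example, a 10-node group used three times inside a 5-node group, which is itself used three times in a 5-node material, flattens to 110 nodes even though only 20 nodes were authored across the three levels. If each flattened node is also allocated separately, as Clément mentions, that per-node cost scales with the flattened count rather than the authored one.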

3di @3di 14:44

Brilliant, thanks. I’ll take a look later this week. It’s been pretty quiet due to COVID, and I’ve just had my first client work come through in a few weeks, so I’d best crack on with that.


Is this a copy-paste from your private conversation on Blender chat? Glad you got your answers, but besides copying and pasting chat logs publicly without all parties’ knowledge being poor form, it kind of seems like thread spam.

Thanks, just keeping it all together for future reference. They’re both public domain; I can’t see this as a problem…