Better passes code and UI

At the moment, we have:

  • 2 normal passes: the old one and the denoising one. The one from the denoiser is much better and could become the default, available as a standalone pass like the old one?
  • Not only are many other passes duplicated because of the official denoiser, they also have their own code base/style. The new ones are all or nothing, which is not user friendly, and they can’t easily be separated from the denoising code. We are still in a phase where big changes can be made; after the official 2.8 release, it will be harder again. As the denoiser is getting a new UI for animation and other denoisers may be added, this would be the perfect moment to reunify the way passes are coded, activated and used.

I don’t think the passes should be coupled. The use cases are different and we should be able to change each to fit its purpose, not limit things.

OK, I can understand the code-side argument. But on the user side, the passes the official denoiser produces are partly useful for any denoiser or for other purposes in the compositor. At least as a user, it would help a lot to be able to select those individually just like the other ones, instead of getting 8 layers at once, which in some scenes makes rendering take twice as long (especially on dual-GPU configurations) and in many cases makes it more than 20% slower. Sometimes it’s even slower with just the denoising data alone than with full denoising. It also requires more memory.

I don’t think the performance overhead of computing denoising passes would be significantly reduced by computing only a subset of them. The main cost is likely the prefiltering which would have to happen even when computing just 1 pass.

That confirms my first attempts. Then I’ll have to find another way. Thanks for the quick answer.

The performance hit is indeed due to the prefiltering and is not dependent on the number of passes. I deactivated all passes but normal, and rendering is still about 1.8x slower using 2 GPUs (nearly as slow as with 1 GPU) compared to not activating the denoising data passes.

From left to right: 1 GPU with normal denoising data pass, 2 GPUs with normal denoising data pass, 2 GPUs without denoising passes.
The result is about the same with all the data passes activated like in master. Sometimes it’s even slower with 2 GPUs than with 1.
It looks like there is a lot of idling, so it’s not really the performance penalty of prefiltering itself. Is it really mandatory to have so much idling when each tile is pretty independent and so should be safe to access at any time? Even if prefiltering needed neighbor tiles, I don’t see why it would block rendering a new tile that cannot be a neighbor tile.
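
To make the neighbor argument concrete, here is a minimal sketch. It assumes, purely for illustration, that prefiltering a tile only needs the tiles directly adjacent to it (a 3x3 block); the function names and the grid layout are hypothetical and not taken from the Cycles code.

```python
def neighbor_tiles(tx, ty, tiles_x, tiles_y):
    """Tiles in the 3x3 block around (tx, ty), including itself, clipped to the grid.

    Assumption: prefiltering only reads directly adjacent tiles.
    """
    return {
        (x, y)
        for x in range(max(tx - 1, 0), min(tx + 2, tiles_x))
        for y in range(max(ty - 1, 0), min(ty + 2, tiles_y))
    }


def conflicts(prefiltering_tile, candidate_tile, tiles_x, tiles_y):
    """True if candidate_tile lies in the neighborhood of the tile being prefiltered."""
    return candidate_tile in neighbor_tiles(*prefiltering_tile, tiles_x, tiles_y)


# On a hypothetical 10x10 tile grid, a tile far from the one being
# prefiltered does not overlap its neighborhood.
print(conflicts((5, 5), (6, 6), 10, 10))
print(conflicts((5, 5), (8, 8), 10, 10))
```

Under that assumption, only tiles inside the 3x3 block would have any reason to wait, which is why the idling on unrelated tiles looks unnecessary.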

I just noticed it only happens on self-made builds (on the vanilla Blender2.7 branch, calling “make.bat release 2017” on Windows). The buildbots don’t have the bug.

  • The libs are the ones from svn.
  • Not a single line of code was changed.
  • I also use CUDA 9.1 for all archs up to sm_61.
  • I tried both VS 15.7 and 15.9.

When rendering the bmw scene at 200%, tile size 32x32, 25 spp, both GPUs are at 30-40% load and the render time with 2 GPUs goes from 17 sec without denoising data to 48 sec with it. On buildbots, it only goes from 17 to 20 sec.
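
For reference, the slowdown factors implied by those timings can be worked out directly. The snippet below only uses the numbers quoted above; the variable names are illustrative.

```python
# Render times in seconds for the bmw scene with 2 GPUs, as quoted above.
baseline = 17.0    # without denoising data passes
self_build = 48.0  # with denoising data passes, self-made build
buildbot = 20.0    # with denoising data passes, buildbot build


def slowdown(with_passes, without_passes):
    """How many times longer the render takes with the passes enabled."""
    return with_passes / without_passes


self_build_factor = slowdown(self_build, baseline)
buildbot_factor = slowdown(buildbot, baseline)

print(f"self-made build: {self_build_factor:.2f}x")
print(f"buildbot:        {buildbot_factor:.2f}x")
```

That is roughly a 2.8x slowdown on the self-made build against roughly 1.2x on the buildbot.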

@LazyDodo any idea?

No idea where that difference could be. You could try copying the CUDA binaries to see if it’s related to that or something else. And are you sure this is consistently repeatable or is there some randomness?

Actually, I took an old buildbot without noticing. The bug is also in the latest buildbots. As far as I can tell, it started with
Open the file from pasteall in my last post, hit F12 with 2 gpus and monitor gpu usage with msi afterburner.
There is a bit of randomness, but still most of the time the load is between 30 and 40% with window minimized/console render and around 15% with render window visible.

Before that build it would not prefilter the denoising passes at all, so it’s not that surprising. Probably some optimizations could be done to speed up prefilter/denoise with multiple GPUs.

The problem here is not really that prefiltering takes so much time. It’s more that it seems to prevent the other GPU from working and it seems to happen mostly with GPUs without a display.
Manually setting the step_sample value in device_cuda to 1 000 000 kind of simulates cards without a display and makes the bug more obvious in multi-GPU setups.
With this change, as long as all GPUs do pure path tracing, they all work at 95-100%. The moment one of them starts to prefilter, the other ones seem to finish path tracing their tile and then idle until the prefiltering is finished. Then another GPU enters prefiltering and the same happens: the other ones finish their tile and idle until the prefiltering tile is finished, as if they couldn’t acquire a new tile to work on?
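
The kind of stall described above can be illustrated with a toy model. This is only a sketch of the hypothesis, not Cycles code: it assumes that prefiltering holds a shared lock which a device also needs in order to pick up a new tile, and every name and parameter (`simulate`, `trace_ticks`, `prefilter_ticks`, `exclusive`) is made up for the illustration.

```python
def simulate(num_tiles, num_devices, trace_ticks, prefilter_ticks, exclusive):
    """Toy tick-based model; returns (total_ticks, busy_fraction).

    exclusive=True models the *hypothesis* that prefiltering holds a shared
    lock that is also required to acquire a new tile. It is not derived
    from the actual scheduler.
    """
    phase = ["idle"] * num_devices
    remaining = [0] * num_devices
    tiles_left = num_tiles
    tiles_done = 0
    lock_holder = None
    busy = 0
    ticks = 0

    while tiles_done < num_tiles:
        # Start of tick: each device decides what to do next.
        for d in range(num_devices):
            lock_free = (not exclusive) or lock_holder is None
            if phase[d] == "idle" and tiles_left > 0 and lock_free:
                tiles_left -= 1
                phase[d], remaining[d] = "trace", trace_ticks
            elif phase[d] == "wait" and lock_free:
                if exclusive:
                    lock_holder = d
                phase[d], remaining[d] = "prefilter", prefilter_ticks

        # During the tick: devices that have work make progress.
        for d in range(num_devices):
            if phase[d] not in ("trace", "prefilter"):
                continue  # idle or waiting for the lock
            busy += 1
            remaining[d] -= 1
            if remaining[d] == 0:
                if phase[d] == "trace":
                    phase[d] = "wait"  # traced, now needs prefiltering
                else:
                    phase[d] = "idle"
                    tiles_done += 1
                    if lock_holder == d:
                        lock_holder = None
        ticks += 1

    return ticks, busy / (ticks * num_devices)


independent = simulate(8, 2, 4, 2, exclusive=False)
blocking = simulate(8, 2, 4, 2, exclusive=True)
print("independent:", independent)
print("blocking:   ", blocking)
```

With the shared lock, devices spend ticks waiting and the total time goes up, while the independent case keeps every device busy. The model is far too simple to reproduce the measured 30-40% load; it only shows the shape of the problem.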

I simply kicked the new animation denoising for now, but I think it would also help vanilla Blender to have it work properly as the bug is also in buildbots.