Adding render passes to render samples for multi-pass rendering


#1

This one goes out to Brecht and other Cycles devs.

I've been playing with adding the NVIDIA AI denoiser to Cycles as a fun little learning project, but it seems the AI denoiser wants a full frame of rendered samples to denoise (maybe tiles with neighbours would work in the long run, but one thing at a time).

Now, one thing that's always annoyed me about Cycles rendering is having to wait for each tile to finish its full sample count before moving on to the next tile. Say you're rendering at 10,000 samples: it could take hours to complete even a few tiles, which means you have no idea how the final image will look until the very last minute. If a material doesn't look right or something in the scene is amiss, you can only tell after the whole render has finished, which means restarting it all over again.

What I would like to do is add a render sample passes control. The idea is that you could set the sample count to 128 and the sample passes to 10. Each tile would render 128 samples before moving to the next tile; when the last tile has rendered, the sample pass moves to 2 and we start rendering another 128 samples from the very first tile again. This continues until all 10 passes have completed, giving you your final sample count of 1280.
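A minimal sketch of the loop structure described above, in Python pseudocode rather than actual Cycles code. `render_tile`, `denoise_frame`, and the tile dictionaries are all stand-ins I've invented for illustration; the real implementation would live in Cycles' C++ session/tile manager.

```python
# Hypothetical sketch: render tiles in multiple passes instead of
# finishing each tile's full sample budget before moving on.
# render_tile() and denoise_frame() are placeholders, not real Cycles API.

def render_tile(tile, samples):
    """Placeholder: accumulate `samples` more samples into `tile`."""
    tile["samples_done"] += samples

def denoise_frame(tiles):
    """Placeholder: run a full-frame denoiser over the accumulated result."""
    pass

def render_multipass(tiles, samples_per_pass, num_passes):
    for pass_index in range(1, num_passes + 1):
        for tile in tiles:
            render_tile(tile, samples_per_pass)
        # At this point the whole frame has pass_index * samples_per_pass
        # samples, so a full-frame AI denoiser can run on the partial result.
        denoise_frame(tiles)

# 128 samples per pass, 10 passes -> 1280 samples per tile total.
tiles = [{"id": i, "samples_done": 0} for i in range(4)]
render_multipass(tiles, samples_per_pass=128, num_passes=10)
assert all(t["samples_done"] == 1280 for t in tiles)
```

The key point is that the denoise call sits at the end of each pass, exactly where a complete (if noisy) frame first becomes available.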

The good thing about this is that a complete first render at the lowest sample count would be quick, letting us see the whole scene and check everything is OK. It also means that once pass 1 finishes I could call the AI denoiser on the whole frame, and again whenever each subsequent pass completes, e.g. end of pass 1 call AI denoise, end of pass 2 call AI denoise.

Adding the passes idea could also be handy in the future for adaptive sampling, and another benefit would come from mixing CPU and GPU rendering.

At present I have to use larger tile sizes with GPU rendering to get its most efficient performance, but the CPU wants tiny 32 or 16 pixel tiles for best performance. This mismatch in best tile size between CPU and GPU means that a lot of the time at the end of a frame you're waiting on a CPU tile to finish, but if each tile had a much smaller sample count per pass, there would be next to no slowdown waiting for the larger CPU tiles. (Even better would be a divider for the CPU tile size based on the GPU tile size, e.g. 400x400 for the GPU with a CPU divider of 10 = CPU tile size of 40x40.)
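The suggested divider is simple arithmetic; a sketch, with the function name and rounding behaviour being my own assumptions rather than anything proposed in Cycles:

```python
# Hypothetical CPU tile-size divider: derive the CPU tile size from the
# GPU tile size so a straggling CPU tile can't hold up the frame for long.

def cpu_tile_size(gpu_tile_w, gpu_tile_h, divider):
    # Integer division, clamped so the tile never shrinks below 1 pixel.
    return (max(1, gpu_tile_w // divider), max(1, gpu_tile_h // divider))

# The example from the post: 400x400 GPU tiles, divider 10 -> 40x40 CPU tiles.
assert cpu_tile_size(400, 400, 10) == (40, 40)
```

A real implementation would probably also round to a multiple the CPU path likes (e.g. 16 or 32), but that detail isn't in the proposal.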

At the end of the day, if people want to keep rendering exactly as it is now, all you would need to do is set your final sample count as you do today and set passes to 0; that would mean no interference with the built-in denoiser setup.

Would the devs consider adding this?
Cheers, James


#2

I've thought about the same thing a couple of times :)

The only problem I see is that you would need to keep the entire render in memory. But what do I know; maybe it doesn't necessarily need to be kept on the GPU (I think it doesn't if Save Buffers is enabled) and can be optimized somehow in this way.

It also seems to me there could be some noise level detection that lowers a tile's sample count according to the amount of noise it has over the course of its rendering. So some tiles finish at 147 samples while others, with lots of complex lighting, geometry and materials, run to the chosen maximum of, say, 500. But I am sure some kind of trickery would be needed to keep the denoiser from showing a visible grid of tiles in places.
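A toy sketch of that stop-when-converged idea. The noise estimate here is just the standard error of the per-sample mean, which is one common convergence proxy; Cycles' eventual adaptive sampling may measure noise quite differently, and every name below is illustrative.

```python
# Hypothetical per-tile adaptive stopping: keep adding sample batches
# until the estimated noise falls below a target or we hit the cap.
import random
import statistics

def render_tile_adaptive(sample_fn, batch=16, max_samples=500, noise_target=0.01):
    values = []
    while len(values) < max_samples:
        n = min(batch, max_samples - len(values))
        values.extend(sample_fn() for _ in range(n))
        # Standard error of the mean as a crude noise estimate.
        stderr = statistics.stdev(values) / len(values) ** 0.5
        if stderr < noise_target:
            break
    return len(values)

random.seed(0)
# A near-flat tile converges after a few batches; a very noisy tile
# (harsh lighting, glossy materials) runs all the way to the cap.
flat = render_tile_adaptive(lambda: 0.5 + random.gauss(0, 0.01))
noisy = render_tile_adaptive(lambda: random.gauss(0.5, 1.0))
assert flat < noisy == 500
```

The tile-grid artifact the post worries about comes from neighbouring tiles stopping at different noise levels; dithering the threshold or denoising across tile borders are the usual mitigations.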


#3

We have the Progressive Refine option for this purpose, but it could be improved a lot to avoid performance loss and reduce memory usage.

I don’t know if it really needs specific user control; perhaps it can be automated. Progressive refine currently always renders 1 sample at a time. Instead it could, for example, increase the number of samples it renders incrementally, since higher sample counts give less visual difference anyway. For example 1, 1, 2, 4, 8, 16, 16, 16, … .
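That doubling-then-capped schedule is easy to express; a sketch, with the generator name and cap parameter being my own choices:

```python
# Sketch of the suggested progressive-refine schedule: batch sizes
# double from 1 up to a cap, with the initial 1 repeated once,
# producing 1, 1, 2, 4, 8, 16, 16, 16, ...

def sample_schedule(total_samples, max_batch=16):
    """Yield batch sizes summing to total_samples."""
    done = 0
    batch = 1
    first = True
    while done < total_samples:
        step = min(batch, total_samples - done)
        yield step
        done += step
        if first:
            first = False                    # repeat the initial 1 once
        else:
            batch = min(batch * 2, max_batch)

schedule = list(sample_schedule(64))
assert schedule == [1, 1, 2, 4, 8, 16, 16, 16]
assert sum(schedule) == 64
```

Early batches stay small so the viewport updates quickly, while later batches amortise per-update overhead once extra samples barely change the image.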

Regarding tile sizes, I think it’s best if we have a small fixed tile size, but then let GPUs render multiple small tiles at the same time, depending on the number of cores they have.
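One way to read that suggestion is a core-count heuristic for how many small tiles to keep in flight per device. The formula and the `work_per_core` factor below are entirely my guess at a plausible heuristic, not anything from Cycles:

```python
# Hypothetical heuristic: with a small fixed tile size, dispatch enough
# tiles concurrently that the GPU's cores all have pixels to chew on.

def concurrent_tiles(cuda_cores, tile_w=32, tile_h=32, work_per_core=4):
    # Aim for a few pixels of work per core so the device stays saturated.
    target_pixels = cuda_cores * work_per_core
    return max(1, target_pixels // (tile_w * tile_h))

# e.g. a 2560-core GPU with 32x32 tiles -> 10 tiles in flight
assert concurrent_tiles(2560) == 10
```

This would let CPU and GPU share one small tile size while still keeping a big GPU busy, which addresses the tile-size mismatch from post #1.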


#4

Yeah, the reason I thought this would be better than progressive refine is exactly that: progressive refine is slowwwww and kills memory (Blender just crashes for me most times I use it). With a multi-pass tile render setup I'd still get the full render image but keep the speed and memory advantages of tiled rendering, allowing people with small-memory cards to still use full-frame denoising.

I like what you're thinking about tile size and assigning multiple tiles to one device, but how difficult would it be to add, and do you have plans to maybe add this anytime soon?

Part of why I wanted to add OptiX to Blender is its ability to use CNNs like the NVIDIA AI denoiser, but I've also been looking at how two CNNs could be used on a complete render frame to do adaptive sampling. I came across a cool paper on that: http://nkhademi.com/Data/EGSR18_Sampling.pdf

This was more aimed at low sample rates, but the more I looked at it, the more it seemed it would also work well for higher sample rates. Using GANs here could also be helpful for training an AI to recognise, from past learnt experience with a scene's models and lighting setups, where extra sampling would be needed.

But I like the idea of creating many AIs that we could then just run through the OptiX post-processing system: super resolution and AA, a denoiser you can train (like the OptiX AI denoiser) and feed custom training files back into, an AI denoiser for NPR rendering, an arch-viz AI denoiser, etc. Lots of cool things could be done.


#5

Would be cool if Cycles would render each tile at 16 or 32 samples and move on to the next, until it hits the noise level or the max sample count. I assume that denoising between each pass would slow things down a lot.