This one goes out to Brecht and other Cycles devs.
I've been playing with adding the NVIDIA AI denoiser to Cycles as a fun little learning project, but it seems the AI denoiser wants a full frame of rendered samples to denoise. (Maybe tiles with their neighbours would work in the long run, but one thing at a time.)
Now, one thing that's always annoyed me about Cycles rendering is having to wait for each render tile to finish its full sample count before moving on to the next tile. Say you're rendering at 10,000 samples: it could take hours to complete even a few tiles. That means you have no idea how the final image is going to look until the very last minute. If a material doesn't look right or something in the scene is amiss, you can only tell after the whole rendering process, which means restarting the render all over again.
What I would like to do is add a render sample passes control. The idea is you could set the sample count to 128 and the sample passes to 10. Each tile would render 128 samples before moving on to the next tile. When the last tile has rendered, the pass counter moves to 2 and we start rendering another 128 samples from the very first tile again. This would continue until all 10 passes have completed, giving you your final sample count of 1280.
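To make the ordering concrete, here is a rough sketch in plain Python of the pass scheme described above. It is not Cycles code: `TileJob` and `schedule_passes` are names made up for illustration, and treating a passes value of 0 or 1 as a single trip over the tiles is an assumption on my part.

```python
from typing import Iterator, NamedTuple


class TileJob(NamedTuple):
    pass_index: int    # 1-based pass number
    tile_index: int    # 0-based tile number
    start_sample: int  # samples this tile already has when the job starts
    num_samples: int   # samples to add in this job


def schedule_passes(num_tiles: int, samples: int, passes: int) -> Iterator[TileJob]:
    """Yield tile jobs pass by pass: every tile gets `samples` samples per pass."""
    # Assumption: passes <= 1 behaves like today, one trip over the tiles.
    for pass_index in range(1, max(passes, 1) + 1):
        start = (pass_index - 1) * samples
        for tile_index in range(num_tiles):
            yield TileJob(pass_index, tile_index, start, samples)


jobs = list(schedule_passes(num_tiles=4, samples=128, passes=10))
total_for_tile_0 = sum(j.num_samples for j in jobs if j.tile_index == 0)
print(total_for_tile_0)  # 1280
```

The `start_sample` field is there because each pass has to continue a tile's sample sequence where the previous pass left off rather than repeat the same samples.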
The good thing about this is that getting a complete frame at the lowest sample count would be quick, letting us see the whole scene in the render and check everything is OK. It also means that once pass 1 reaches its end I could call the AI denoiser on the whole frame, and again whenever a later pass completes, e.g. end of pass 1 call AI denoise, end of pass 2 call AI denoise.
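As a sketch of where the denoiser call would sit, assuming the renderer and the denoiser are just callbacks: `render_tile` and `denoise_frame` below are hypothetical stand-ins, not Cycles or NVIDIA API calls.

```python
from typing import Callable, List


def render_with_passes(
    num_tiles: int,
    samples: int,
    passes: int,
    render_tile: Callable[[int, int], None],   # hypothetical: (tile, samples to add)
    denoise_frame: Callable[[int], None],      # hypothetical: (samples per pixel so far)
) -> None:
    for pass_index in range(1, passes + 1):
        # Finish this pass on every tile first...
        for tile in range(num_tiles):
            render_tile(tile, samples)
        # ...so the denoiser always sees a complete frame.
        denoise_frame(pass_index * samples)


log: List[str] = []
render_with_passes(
    num_tiles=2, samples=128, passes=2,
    render_tile=lambda tile, n: log.append(f"tile {tile} +{n}"),
    denoise_frame=lambda total: log.append(f"denoise @ {total}"),
)
print(log)
```

The point is only the ordering: the denoise call comes after the last tile of each pass, never in between tiles.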
Adding the passes idea could also be handy in the future for doing adaptive sampling. Another benefit would come from mixing CPU and GPU rendering.
At present I have to use larger tile sizes with GPU rendering to get its most efficient performance, but the CPU wants tiny 32x32 or 16x16 tiles for best performance. This mismatch in best tile size between CPU and GPU means that a lot of the time, at the end of the frame render, you're waiting on a CPU tile to finish. If the tiles had much smaller sample counts per pass, there would be next to no slowdown waiting for the larger CPU tiles to finish. (Even better would be to add a divider for the CPU tile size based on the GPU tile size, e.g. 400x400 for GPU with a CPU divider of 10 = CPU tile size of 40x40.)
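The divider arithmetic is simple enough to sketch. Both function names below are made up for illustration, and rounding down with a floor of 1 pixel is my assumption about how odd sizes would be handled.

```python
def cpu_tile_size(gpu_tile_size: int, divider: int) -> int:
    """Edge length of a CPU tile derived from the GPU tile edge length."""
    if divider < 1:
        raise ValueError("divider must be at least 1")
    # Assumption: round down, but never below a 1 pixel tile.
    return max(1, gpu_tile_size // divider)


def cpu_tiles_per_gpu_tile(gpu_tile_size: int, divider: int) -> int:
    """How many CPU tiles cover the area of one square GPU tile."""
    cpu = cpu_tile_size(gpu_tile_size, divider)
    per_edge = -(-gpu_tile_size // cpu)  # ceiling division
    return per_edge * per_edge


print(cpu_tile_size(400, 10))           # 40
print(cpu_tiles_per_gpu_tile(400, 10))  # 100
```

So with a 400x400 GPU tile and a divider of 10, the CPU would work through 40x40 tiles, 100 of them per GPU-sized area.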
At the end of the day, if people wanted to keep rendering the same as it is now, all they would need to do is enter the final sample count as they do now and set passes to 0. That would mean no interference with the built-in denoiser setup.
Would the devs consider adding this?