Cycles OptiX

MetinSeven · August 4, 2019, 6:55pm

Hi,

I’m very content with the addition of NVIDIA OptiX to Cycles, my thanks to the devs.

Currently using a GraphicAll build featuring OptiX.

I’m wondering about two things:

When can we expect OptiX to be integrated in the daily Master builds?
Will there be a CPU + GPU option in the OptiX section of the Preferences, like in the CUDA section?

Thank you.

BD3D · August 4, 2019, 10:12pm

MetinSeven · August 5, 2019, 7:03am

Thanks! Somehow I didn’t get notified of that reply.

MetinSeven · August 5, 2019, 7:10am

This makes me wonder again: one of the GSoC projects is the implementation of the Embree BVH in Cycles. Could this pave the way for a Cycles CPU + GPU option in the OptiX section of the Blender Preferences?

JuanGea · August 5, 2019, 8:53am

The problem seems to be the type of BVH. OptiX uses a specific BVH system, the OptiX BVH, while with CUDA we have to use BVH2, and with CPU alone we can use BVH2, BVH4 or even BVH8.

So I think the fundamental problem here is how the rendering is implemented.

@brecht, now that you are more focused on Cycles, could you consider changing the way Cycles renders?

I mean, right now each device renders a tile, so the devices need to keep the BVH in sync and use the same BVH. Would it be possible to let each device render the whole scene on its own and sum up the results afterwards?

Let’s say we have a render with 12 tiles, and we want to render with CPU and GPU at a quality of 20 samples (this is an oversimplification for the sake of explanation). Could we start rendering all 12 tiles on the two devices separately, with each device rendering, say, 5 samples per tile on each pass? Once both devices have rendered the whole picture once, we would have 10 full samples. Of course the GPU could be 3 times faster than the CPU, so in the time the CPU renders 5 samples the GPU could render 15. After 1 CPU pass we would then have 3 GPU passes and the whole 20-sample image.

This could make it easier to mix devices and systems (like OptiX with its completely different BVH), and it would also allow mixing BVH types (BVH8 for a Threadripper, for example, BVH2 for CUDA or OpenCL, and the OptiX BVH for RTX). It could also make other features easier, like pass-based noise analysis for adaptive sampling, etc.

Right now this can be done by hand: we can render in three separate Blender instances and mix the results afterwards. We may not end up with an exact number of passes (we could get 855 passes when we explicitly asked for 850 or 860), but that’s a minor issue compared to the benefits of having true GPU + true CPU + true OptiX or hardware RT.
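As a rough illustration of the "mix them afterwards" step, here is a minimal Python sketch, not how Cycles works internally. It assumes each device hands back a per-pixel average plus the number of samples behind it; the function name `merge_renders`, the flat-list image layout, and the pixel values are all hypothetical. The merge is just a sample-weighted average.

```python
def merge_renders(renders):
    """Combine per-device images, weighting each one by its sample count.

    `renders` is a list of (image, samples) pairs. `image` is a flat list of
    per-pixel averages and `samples` is how many samples produced it.
    (Hypothetical layout, chosen only for this illustration.)
    """
    total_samples = sum(samples for _, samples in renders)
    if total_samples == 0:
        raise ValueError("no samples to merge")
    num_pixels = len(renders[0][0])
    merged = [0.0] * num_pixels
    for image, samples in renders:
        if len(image) != num_pixels:
            raise ValueError("images must have the same size")
        for i, value in enumerate(image):
            # Turn each average back into a sum so the weights are correct.
            merged[i] += value * samples
    return [value / total_samples for value in merged], total_samples


# Numbers from the example above: the CPU finishes 1 pass of 5 samples
# while a 3x faster GPU finishes 3 passes of 5 samples (15 in total).
cpu_image, cpu_samples = [0.2, 0.4], 5    # made-up pixel values
gpu_image, gpu_samples = [0.4, 0.8], 15   # made-up pixel values
merged_image, merged_samples = merge_renders(
    [(cpu_image, cpu_samples), (gpu_image, gpu_samples)]
)
print(merged_samples, merged_image)
```

Note that this only adds information if every device (or Blender instance) uses a different random seed; averaging renders made from identical samples does not reduce noise.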

What do you think, @brecht?

brecht · August 5, 2019, 8:53am

It’s not a big deal to make CPU + GPU work for Optix, it doesn’t affect the core Cycles architecture at all. Just something to finish along with the Optix patch. We’ll work on getting Optix into master in the coming months.

JuanGea · August 5, 2019, 9:55am

Maybe, but they must share the same BVH, right? And this affects performance: when we currently render with CPU + GPU, both devices have to share the same BVH, right? So we lose some performance on the CPU, am I right? (If we should continue this conversation in a different thread, just tell me.)
Also, right now we can’t analyze the noise level of the whole scene, because tasks are singular and complete. I mean we cannot programmatically stop the render in the middle with all the tiles rendered, analyze the noise, and decide where we should apply more sampling and where less. So having this kind of “tile passes” could possibly ease a lot of things.
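To make the "analyze the noise between passes" idea concrete, here is a small Python sketch under stated assumptions. It is not an existing Cycles feature or API. It assumes each tile has been rendered into two independent half buffers (for example odd and even passes) and uses their mean absolute difference as a crude noise estimate; the names `tile_noise` and `plan_extra_samples`, the tile names, and the pixel values are all hypothetical.

```python
def tile_noise(half_a, half_b):
    """Mean absolute difference between two independent renders of one tile."""
    return sum(abs(a - b) for a, b in zip(half_a, half_b)) / len(half_a)


def plan_extra_samples(tiles, budget):
    """Share `budget` extra samples across tiles in proportion to their noise.

    `tiles` maps a tile name to a (half_a, half_b) pair of pixel lists.
    """
    noise = {name: tile_noise(a, b) for name, (a, b) in tiles.items()}
    total = sum(noise.values())
    if total == 0:
        # Every tile already agrees with itself; nothing more to do.
        return {name: 0 for name in tiles}
    return {name: round(budget * n / total) for name, n in noise.items()}


# Made-up data: one converged tile and one noisy tile.
tiles = {
    "flat_wall": ([0.50, 0.50], [0.50, 0.50]),
    "glossy_floor": ([0.25, 0.75], [0.75, 0.25]),
}
plan = plan_extra_samples(tiles, budget=100)
print(plan)
```

Because of rounding, the planned samples may not add up exactly to the budget when several tiles are noisy; a real scheduler would need to handle that remainder.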

brecht · August 5, 2019, 10:01am

No, they don’t have to share the BVH.

JuanGea · August 5, 2019, 10:23am

Hmm… there is something wrong then in the Embree GSoC project, because he is converting from BVH4 to BVH2, and he says that if we use CPU + GPU with Embree we would need to use BVH2 for everything.

It’s good news that you say it’s not needed

And what do you think about the possibility of analyzing the render midway for things like more advanced adaptive sampling? (I may be wrong there too.)

brecht · August 5, 2019, 10:28am

If adaptive sampling needs some synchronization between tiles it can be added, similar to what we already do for denoising. We can implement those things when needed for a particular algorithm, but it’s unrelated to BVHs.

JuanGea · August 5, 2019, 11:00am

Good to know, thanks for this!
I’ll start another conversation to ask you another thing about AS, but I don’t want to derail this thread anymore, it has been too much.

MetinSeven · August 5, 2019, 11:59am

No problem, @JuanGea. I’ve started this thread for good questions like yours, and good answers like @brecht provides.

JuanGea · August 5, 2019, 12:42pm

Thanks @MetinSeven
I’ve continued the conversation in this thread:

Just for the sake of avoiding users thinking that everything I’m saying about adaptive sampling or other things is directly related to OptiX (just in case).

MetinSeven · August 5, 2019, 6:03pm

A little in-between message:

Brecht wrote this about OptiX for Cycles on developer.blender.org:

“Note that I would like this to replace the CUDA backend eventually as the officially supported way to render on NVIDIA GPUs. There’s not much point maintaining multiple backends if we can get to feature parity and support all the same cards.”

It’d be great if OptiX could fully replace CUDA in the near future, making Cycles rendering settings for NVIDIA GPUs clear again, with only one (the best) option.

JuanGea · August 5, 2019, 7:03pm

Yep, the only thing that worries me about that is that OptiX may not be supported on the 9xx card series, just the 10xx series, and the 9xx series still works pretty well: 2x 980M with 8 GB each are a bit faster than a single 1080 with 8 GB.

But I’m not sure what OptiX supports.

MetinSeven · August 6, 2019, 11:15am

I’ve got another question for @brecht:

In this Blender Artists topic about Cycles OptiX, some posts mention that there’s no difference between rendering with a fast GPU and rendering with CPU + GPU.

I always thought when CPU + GPU is activated, Cycles would utilize the strengths of the CPU and the GPU to distribute rendering across both. E.g. GPU for floating point calculations, CPU for other calculations, both working simultaneously to speed up the rendering process. Hence I thought CPU + GPU would always be faster than only GPU, no matter how fast your GPU is.

Am I wrong about this?

JuanGea · August 6, 2019, 3:34pm

I think I can answer you, at least partially

CPU and GPU distribute the work, but not by “part” of the work; it’s divided by tiles. So if you have 8 super slow cores and 1 super fast GPU, you will get 9 tiles being rendered at the same time, but you will end up waiting for the CPU tiles to finish while the GPU does all the other work.

So CPU + GPU is what it seems to be: the CPU renders some tiles and the GPU renders some tiles. I think that’s the correct way, because doing some calculations on the fly on the CPU and sending them to the GPU would be too slow in the end. But of course, if your CPU is slow, it may not be worth it, because you may not get any speed gain.
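The waiting effect described above can be reproduced with a toy scheduler. This is a simplified Python sketch, not the Cycles tile manager: it assumes every worker simply grabs the next tile from a shared queue as soon as it is free, and all timings are invented for illustration.

```python
import heapq


def render_time(num_tiles, seconds_per_tile):
    """Simulate workers pulling tiles from a shared queue; return total time.

    `seconds_per_tile` has one entry per worker (made-up timings).
    """
    # Heap of (time the worker becomes free, worker index).
    free_at = [(0.0, worker) for worker in range(len(seconds_per_tile))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(num_tiles):
        now, worker = heapq.heappop(free_at)
        done = now + seconds_per_tile[worker]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, worker))
    return finish


# 12 large tiles: one fast GPU (1 s per tile) versus the same GPU plus
# 8 slow CPU cores (20 s per tile).
gpu_only = render_time(12, [1.0])
hybrid_big_tiles = render_time(12, [1.0] + [20.0] * 8)

# Same total work cut into 16x more tiles, each 16x cheaper.
hybrid_small_tiles = render_time(192, [0.0625] + [1.25] * 8)

print(gpu_only, hybrid_big_tiles, hybrid_small_tiles)
```

With these invented numbers, the large-tile hybrid run is slower than the GPU alone because the last CPU tiles hold up the frame, while the small-tile hybrid run is faster than the GPU alone. Real results depend on the scene and the hardware.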

BD3D · August 6, 2019, 8:17pm

Speaking of CPU + GPU, there are a lot of people on the forums with high-end GPU(s) who find hybrid rendering slower, even with a top-of-the-line Threadripper machine.

Edit: nevermind, it was already discussed above.

JuanGea · August 6, 2019, 11:17pm

I use a 1080 (we cannot consider it top of the line, but it’s not bad) and a 2990WX. You have to use a small tile size, 16x16, and in general I always get better times.

BD3D · August 7, 2019, 3:15am

With those two parts together, it seems logical that you don’t see your CPU slowing down your GPU.

I was thinking more of a reasonably priced top-of-the-line CPU, like the 2950X or 1950X. (Although we can agree that is not top of the line anymore.)

I consider my 1950X a really good CPU, but it still cannot keep up with the speed of my 2080 Ti.

Leaving my 2080 Ti alone gives me way better benchmark results; otherwise I always have to wait for those last Threadripper tiles to finish, and in comparison that just takes too much time.

Edit: nevermind, it was already discussed above.
