I love Cycles, absolutely love it.
And I really love that, in the latest version of Blender, the developers have made it possible to render with both the CPU and GPU at the same time. It has the potential to squeeze a bit more speed out of a PC by putting to work a CPU that would otherwise sit idle throughout the entire render.
When you’re rendering lengthy scenes with Cycles, you’ll take every bit of extra speed you can get!
But so far I haven’t used this feature for anything other than performance tests… because rendering with GPU + CPU is actually slower than rendering with just my GPUs.
Yes really, rendering with both my GPUs + my CPU is slower than just rendering with my GPUs.
For context, I have a pair of GTX 1080 Ti’s, and an AMD Ryzen 7 1800X.
It doesn’t matter what type of scene I’m rendering. I’ve tested this numerous times with different scenes, and every time the result is the same: if I render with only my GPUs, I get a faster render time than if I render with GPU + CPU. Tile size and sample count appear to make little or no difference.
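For anyone who wants to reproduce this kind of A/B test, here’s a minimal sketch of how you might toggle the CPU on and off between runs from the Python console. This assumes Blender 2.79’s Python API; the property paths differ in other versions:

    import bpy

    prefs = bpy.context.user_preferences.addons['cycles'].preferences
    prefs.compute_device_type = 'CUDA'
    prefs.get_devices()          # refresh the device list

    USE_CPU = False              # flip to True for the GPU + CPU run

    for dev in prefs.devices:
        if dev.type == 'CUDA':
            dev.use = True       # enable both GPUs
        elif dev.type == 'CPU':
            dev.use = USE_CPU

    bpy.context.scene.cycles.device = 'GPU'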
The reason why is pretty obvious when you watch the image rendering.
The GPUs absolutely blaze past the CPU threads when rendering an individual tile. What takes a GPU perhaps 2 seconds to render usually takes one of my CPU threads around 30 seconds to finish.
That’s pretty logical. One thread of an 8-core CPU is always going to be much slower than an entire GTX 1080 Ti on the same size task.
When the image is almost finished, there are usually several tiles still rendering, all of them CPU tiles. The GPUs can no longer help, because there are no tiles left for them to pick up, so they sit idle waiting for the CPU to finish its tiles. And sometimes many of the CPU’s own threads sit idle too, waiting on the last 3 or 4 tiles to finish rendering.
It seems to defy logic, but as a result, rendering with CPU and GPU combined comes out slower. Something that should indisputably, universally be faster is actually slower, because of an imbalance in how the workload is distributed.
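To put rough numbers on that final stretch, here’s a toy model of the last batch of tiles, using the ballpark per-tile times above. These figures are illustrative assumptions, not measurements:

    # Rough per-tile times from this post: ~2 s on a GPU, ~30 s on one CPU thread.
    GPU_TILE = 2.0
    CPU_TILE = 30.0
    GPUS, THREADS = 2, 16

    # The end-game: one tile left per device, every device grabs one.
    last_batch = GPUS + THREADS   # 18 tiles remaining

    # GPU-only: the two GPUs chew through all 18 tiles themselves.
    gpu_only = (last_batch / GPUS) * GPU_TILE

    # GPU + CPU: each device takes one tile; the batch ends when the CPU does.
    combined = max(GPU_TILE, CPU_TILE)

    print("final %d tiles, GPUs only:  %4.0f s" % (last_batch, gpu_only))
    print("final %d tiles, GPUs + CPU: %4.0f s" % (last_batch, combined))
    # -> 18 s vs 30 s, with each GPU idle for 28 of those 30 seconds

On my renders, that wasted tail is enough to wipe out whatever the CPU contributed earlier in the frame.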
Suggestions for Fixing This
The problem is an imbalance in how the workload is distributed: a single CPU thread can’t compete with a GPU, so the CPU threads need smaller workloads.
An option to have all the CPU threads work together (acting almost like a GPU) on a single tile would allow the CPU to focus on one tile instead of a dozen, finishing each tile quickly instead of taking a long time to finish many at once. That way, if the GPU(s) finish their tiles first, they spend less time sitting idle waiting for the CPU.
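As a rough sanity check on that idea, using the same illustrative per-tile numbers as above (and generously assuming ideal linear scaling across threads):

    THREADS, CPU_TILE = 16, 30.0
    # All 16 threads cooperating on one tile, with ideal scaling:
    print("pooled CPU tile: ~%.1f s" % (CPU_TILE / THREADS))   # -> ~1.9 s

With the whole CPU delivering a tile every couple of seconds, it would keep pace with a GPU tile instead of dragging 30 seconds behind it.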
Or, as an alternative solution, an option for a different tile size per compute device would help, with the GPUs rendering large tiles, say 128px or 256px, and the CPU threads rendering much smaller ones, say only 16px or 32px.
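Back-of-the-envelope again, assuming render time scales roughly with tile area (and ignoring the per-tile overhead that grows as tiles shrink):

    CPU_TILE_128 = 30.0                    # seconds per 128px tile on one thread
    for size in (128, 64, 32, 16):
        area_ratio = (size / 128.0) ** 2   # tile area relative to 128px
        print("%3dpx CPU tiles: ~%5.2f s each" % (size, CPU_TILE_128 * area_ratio))
    # 128px -> 30 s, 64px -> 7.5 s, 32px -> ~1.9 s, 16px -> ~0.5 s

At 32px, the worst-case wait for a straggling CPU tile drops from 30 seconds to about 2, which is right in line with a GPU tile.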