I asked Sergey this question privately a few months ago, and they gave some advice that I thought would be useful to publish publicly.
Encode information as pixels
If you know the area of Cycles that’s having issues, you can capture the information you’re interested in and write it to a data pass.
For example, I wrote information about curve intersections to the Normal data pass in Cycles while investigating a curve rendering bug on the GPU. I used the R, G, and B channels to store different information.
The main way I did this for the curve intersection test was:
- Add a float3 vector to the ray structure.
- Write information to the new float3 vector while in the code area I was interested in.
- Then at a higher level write the information out to the data pass.
- Then in the image editor in Blender, I can use the sample tool to get the information written to the pass, or if something is majorly off, I can just see it.
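The steps above can be sketched roughly as follows. This is a minimal host-side C++ illustration, not actual Cycles code: Float3, DebugRay, record_intersection, and write_debug_pass are hypothetical names, and the choice of what goes into each channel is only an example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the kernel's float3 type (hypothetical, for illustration).
struct Float3 {
  float x, y, z;
};

// Hypothetical, simplified ray structure with an extra debug field.
struct DebugRay {
  Float3 origin;
  Float3 direction;
  Float3 debug;  // Step 1: the extra vector that carries debug information.
};

// Step 2: inside the code being investigated, record the values of interest.
// Here: R = number of intersection tests, G = hit distance, B = hit flag.
inline void record_intersection(DebugRay &ray, int num_tests, float hit_t, bool hit)
{
  ray.debug.x = static_cast<float>(num_tests);
  ray.debug.y = hit_t;
  ray.debug.z = hit ? 1.0f : 0.0f;
}

// Step 3: at a higher level, copy the debug vector into the pass buffer
// (three floats per pixel, standing in for the Normal pass).
inline void write_debug_pass(std::vector<float> &pass, std::size_t pixel, const DebugRay &ray)
{
  assert(pixel * 3 + 2 < pass.size());
  pass[pixel * 3 + 0] = ray.debug.x;
  pass[pixel * 3 + 1] = ray.debug.y;
  pass[pixel * 3 + 2] = ray.debug.z;
}
```

Sampling a pixel of that pass in the image editor would then show the test count in R, the hit distance in G, and the hit flag in B.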
This does take a long time to work through if you only know the area (e.g. curve rendering) that has issues, not the specific functions that are likely to be at fault, because you need to keep adjusting and recompiling Cycles as you track down the issue.
Side note: Typically when there’s a minor issue that’s different between the GPU and CPU, or even GPU to GPU, it’s precision related. So using the sample tool in the image editor is important here, and you may even need to scale up the values you’re looking at to be able to see the subtle precision differences.
Here’s a code example of this approach, which I used to visualize BVH and triangle intersection tests: Alaska/blender, Alaska's fork of the Blender repository on Blender Projects.
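For the side note about scaling values up, one possible way to do it is to write the amplified deviation from a reference value instead of the raw value. This is only a sketch under my own assumptions: amplify_difference is a hypothetical helper, and subtracting a reference before scaling is one option among several.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: amplify how far `value` deviates from `reference`
// so that small precision differences become visible in a data pass.
inline float amplify_difference(float value, float reference, float scale)
{
  assert(scale > 0.0f);
  return std::fabs(value - reference) * scale;
}
```

The result can be written to a channel of the pass in place of the raw value, with the scale chosen so that the differences land in a range that is easy to read with the sample tool.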
Printing to terminal
CUDA supports printing information to the terminal through printf(). So you can add printf to the area of code you’re interested in, then render the scene with a low resolution and sample count and check what’s printed to the terminal.
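Even at a low resolution, printing from every pixel can produce a lot of output, so one option is to print only for a single pixel of interest. The sketch below is plain C++ with hypothetical names (is_debug_pixel, debug_print_hit) and is not taken from Cycles; in CUDA kernel code the printf call itself is written the same way.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical guard: true only for the one pixel we want output from.
inline bool is_debug_pixel(int x, int y, int debug_x, int debug_y)
{
  return x == debug_x && y == debug_y;
}

// Prints the hit distance for the debug pixel only.
// Returns the number of characters printed, or 0 if the pixel was skipped.
inline int debug_print_hit(int x, int y, int debug_x, int debug_y, float hit_t)
{
  assert(x >= 0 && y >= 0);
  if (!is_debug_pixel(x, y, debug_x, debug_y)) {
    return 0;
  }
  return std::printf("pixel (%d, %d): hit_t = %f\n", x, y, static_cast<double>(hit_t));
}
```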
Proper debugging
NVIDIA offers cuda-gdb for debugging CUDA kernels. I haven’t tried it, but it should hopefully work with Cycles CUDA. Although Sergey did note they had some quirks with it.
To use cuda-gdb, it is recommended to compile the CUDA kernels in a way that’s nicer for debugging software. This means adding the flags -G -g -O0 to CUDADevice::compile_kernel_get_common_cflags:
- -G is device-side debugging info
- -g is the host-side debugging info (probably not strictly needed for Cycles)
- -O0 disables optimizations to make compilation faster and to help with reading backtraces
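As a rough illustration of that change, the sketch below appends the three flags to an existing flag string. It assumes the flags are assembled as a single string; with_debug_cflags is a hypothetical helper, not the actual body of CUDADevice::compile_kernel_get_common_cflags.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: append the debugging flags to an existing set of
// compiler flags, unless they have already been added.
inline std::string with_debug_cflags(std::string cflags)
{
  const std::string debug_flags = "-G -g -O0";
  if (cflags.find(debug_flags) == std::string::npos) {
    if (!cflags.empty()) {
      cflags += ' ';
    }
    cflags += debug_flags;
  }
  // Postcondition: the debug flags are now part of the flag string.
  assert(cflags.find(debug_flags) != std::string::npos);
  return cflags;
}
```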
There are probably similar tools for AMD with HIP (ROCgdb), Intel, and Apple.
On the topic of Apple, Xcode has a Capture GPU workload option that can be used for debugging. I’ve tried it twice: one time it worked (I think I was just lucky then), the other time it gave me misleading results. So there must be more to this tool than simply activating it and then debugging the selected workload.