Are there any good guides on how to debug Cycles GPU?

Occasionally, GPU-only bugs appear in Cycles, and investigating them can be hard without proper debugging tools. From a quick search online, I haven’t been able to find any useful information on how to debug the GPU backends in Cycles, and my attempts to adapt general guides on debugging GPU backends haven’t been particularly useful.

So I’ve come here to ask if anyone has any good guides on how to debug the GPU backends for Cycles?

Note 1: I’m personally interested in debugging CUDA/OptiX on Windows, and Metal Cycles on macOS. However, guides for other platforms and/or backends would also be useful.
Note 2: When I refer to debugging, I’m mostly concerned with being able to place a breakpoint and inspect the values at that breakpoint.


I asked Sergey this question privately a few months ago, and they gave some advice that I thought would be useful to publish publicly.


Encode information as pixels

If you know the area of Cycles that’s having issues, you can capture the information you’re interested in and write it to a data pass.

For example I wrote information about curve intersections to the Normal data pass in Cycles while investigating a curve rendering bug on the GPU. I used the R, G, and B channels to store different information.

The main way I did this for the curve intersection test was:

  • Add a float3 vector to the ray structure.
  • Write information to the new float3 vector while in the code area I was interested in.
  • Then at a higher level write the information out to the data pass.
  • Then in the image editor in Blender, I can use the sample tool to get the information written to the pass, or if something is majorly off, I can just see it.
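As a rough sketch of the steps above: the snippet below is not actual Cycles code. `Float3`, `DebugRay`, `record_curve_hit`, and `write_debug_pass` are hypothetical stand-ins, and the flat RGB buffer layout is an assumption. It only illustrates the idea of carrying a debug vector on the ray and copying it into a pass later.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for Cycles' float3 type (hypothetical, for illustration only).
struct Float3 {
  float x, y, z;
};

// Hypothetical, trimmed-down ray: only the extra debug field matters here.
struct DebugRay {
  Float3 debug_value{0.0f, 0.0f, 0.0f};
};

// Called from the code area under investigation: store what you want to inspect.
// Here R = hit flag, G = hit distance, B = number of intersection steps.
inline void record_curve_hit(DebugRay &ray, bool hit, float t, int steps)
{
  ray.debug_value = Float3{hit ? 1.0f : 0.0f, t, static_cast<float>(steps)};
}

// Called at a higher level: copy the debug vector into the pass for this pixel.
// `pass` is assumed to be a flat buffer with 3 floats (R, G, B) per pixel.
inline void write_debug_pass(std::vector<float> &pass,
                             std::size_t pixel_index,
                             const DebugRay &ray)
{
  pass[pixel_index * 3 + 0] = ray.debug_value.x;
  pass[pixel_index * 3 + 1] = ray.debug_value.y;
  pass[pixel_index * 3 + 2] = ray.debug_value.z;
}
```

Once values like these end up in a data pass, the sample tool in the image editor shows the raw R, G, and B numbers for the pixel under the cursor.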

This does take a long time to work through if you only know the area (e.g. curve rendering) that has issues, not the specific functions that are likely to be at fault, because you need to keep adjusting and recompiling Cycles as you track down the issue.

Side note: Typically when there’s a minor issue that’s different between the GPU and CPU, or even GPU to GPU, it’s precision related. So using the sample tool in the image editor is important here, and you may even need to scale up the values you’re looking at to be able to see the subtle precision differences.
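One possible way to scale up such values before writing them to a pass is sketched below. `amplify_difference` is a hypothetical helper, not something from Cycles, and the choice of 0.5 as the "no difference" level is just an assumption that keeps negative deviations easy to spot.

```cpp
// Map a small deviation from an expected value into an easily visible range.
// A result of 0.5 means "no difference"; `scale` controls the amplification.
inline float amplify_difference(float value, float reference, float scale)
{
  return 0.5f + (value - reference) * scale;
}
```

With a large enough `scale`, a deviation in the last few bits of a float shows up as a clearly different number in the sample tool, which makes CPU and GPU renders easier to compare.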

Here’s a code example of doing this so I can visualize BVH and triangle intersection tests: Alaska/blender (Alaska's fork of the Blender repository, on Blender Projects)


Printing to terminal

CUDA supports printing information to the terminal through printf(). So you can add printf() calls to the area of code you’re interested in, then render the scene at a low resolution and sample count and check what’s printed to the terminal.
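Even at a low resolution, printing from every thread can flood the terminal, so it may help to print only for one pixel. The sketch below assumes you have the pixel coordinates available where you add the print. `DEBUG_DEVICE` and `debug_print_hit` are hypothetical names, not part of Cycles.

```cpp
#include <cstdio>

// Under nvcc this marks the function as callable on the GPU; with a regular
// C++ compiler the macro expands to nothing, so the same code builds for CPU.
#ifdef __CUDACC__
#  define DEBUG_DEVICE __host__ __device__
#else
#  define DEBUG_DEVICE
#endif

// Print the value of interest only for the pixel (debug_x, debug_y).
// Returns true if something was printed.
DEBUG_DEVICE inline bool debug_print_hit(int x, int y, int debug_x, int debug_y, float t)
{
  if (x != debug_x || y != debug_y) {
    return false;
  }
  printf("pixel (%d, %d): hit distance t = %f\n", x, y, t);
  return true;
}
```

Picking the pixel from the image editor first (for example one that looks wrong in the render) keeps the output down to a few lines per sample.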


Proper debugging

NVIDIA offers cuda-gdb for debugging CUDA kernels. I haven’t tried it, but it should hopefully work with Cycles CUDA. Although Sergey did note they had some quirks with it.

To use cuda-gdb, it is recommended to compile the CUDA kernels in a way that’s friendlier to debugging software. This means adding the flags -G -g -O0 to CUDADevice::compile_kernel_get_common_cflags:

  • -G adds device-side debugging info.
  • -g adds host-side debugging info (probably not strictly needed for Cycles).
  • -O0 disables optimizations, which makes compilation faster and backtraces easier to read.
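As a sketch of what the change amounts to: the helper below is hypothetical and does not reproduce the real body of CUDADevice::compile_kernel_get_common_cflags. It only assumes the flags are collected in a single string, which may not match the actual code.

```cpp
#include <string>

// Hypothetical helper: append the debugging flags to an existing flag string.
// In Cycles the equivalent edit would go inside
// CUDADevice::compile_kernel_get_common_cflags.
inline std::string with_cuda_debug_flags(std::string cflags)
{
  cflags += " -G";   // device-side debugging info
  cflags += " -g";   // host-side debugging info
  cflags += " -O0";  // disable optimizations
  return cflags;
}
```

Keeping the extra flags in one place like this makes it easy to remove them again once the bug is found, since debug kernels are much slower than optimized ones.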

There are probably similar tools for AMD with HIP (ROCgdb), Intel, and Apple.

On the topic of Apple, Xcode has a Capture GPU workload option that can be used for debugging. I’ve tried it twice: one time it worked (I think I was just lucky then), and the other time it gave me misleading results. So there must be more to this tool than simply activating it and then debugging the selected workload.
