Profile CPU and memory in debug or release?

I want to profile a few minutes of Cycles rendering a benchmark file (I already did so, but I am not sure if I did it correctly). I am particularly interested in two things: finding CPU time hotspots, and finding cache hits and misses (general cache friendliness and boundness; is boundness a word? I am not a native English speaker, but I hope you get what I mean). Now, should I profile in debug or in release configuration? Somebody suggested building the release configuration with debug symbols and profiling that; should I do that? How ‘realistic’ is the debug build? Is the memory layout somehow different? Different padding or something? Can I expect cache friendliness optimizations (and maybe also the removal of unnecessary processing) to transfer over to the release build?

I know, that’s a lot of questions, and I guess the answer to some of them is “it depends”, but still, I would like to hear your opinions on them.

Debug builds are not indicative of actual performance, since none of the compiler’s optimizer stages will run. Additionally, debug builds generally contain many sanity checks that don’t exist in release builds, to help developers spot problems.
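To make the sanity-check point concrete, here is a minimal C++ sketch. The helper names `sum_prefix` and `sanity_checks_enabled` are hypothetical and not taken from the Cycles source; the sketch only assumes the standard `assert` macro, which becomes a no-op when `NDEBUG` is defined, as release-style configurations typically do.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from Cycles): sums the first `count` elements.
// The assert is a debug-only sanity check. When NDEBUG is defined, the
// check is compiled out and costs nothing at run time.
inline float sum_prefix(const std::vector<float> &values, std::size_t count)
{
  assert(count <= values.size());
  float total = 0.0f;
  for (std::size_t i = 0; i < count; ++i) {
    total += values[i];
  }
  return total;
}

// Hypothetical helper: reports whether assert() checks are compiled
// into this translation unit.
constexpr bool sanity_checks_enabled()
{
#ifdef NDEBUG
  return false;
#else
  return true;
#endif
}
```

In a debug build the bounds check runs on every call, so a profile of that build charges time to work the release binary never performs. This is one reason hotspots measured in a debug build may not carry over.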

For those reasons you’d want to run the profiler on a release build, so you’ll have a realistic performance profile. The problem with release builds, however, is that they generally lack symbol information, so the profiler cannot figure out which line of code an instruction belongs to.

So that won’t work either.

The RelWithDebInfo build configuration gives you the best of both worlds: performance essentially identical to a release build, plus the extra information the profiler needs to figure out which line of code is responsible for which instruction.

That is the right thing to do. You need to profile the code as it will actually be run in the end; anything else can be very misleading.
