I am trying to benchmark Cycles performance without SSE, with SSE only, and with AVX instructions. I am seeing odd behavior: if I switch off the “WITH_CPU_SSE” flag in CMakeLists.txt, the program does not render the full image. I tried this with the Cosmos Laundromat demo file. With SSE off, it renders only a small portion of the sky. Why is this happening?
Further, what is the easiest way to build Blender/Cycles without SSE, with SSE only, and with AVX instructions? Has anyone benchmarked how much these options affect performance?
You can’t really turn SSE2 off anymore. Since we dropped support for 32-bit, most compilers assume SSE2 is available on all 64-bit targets (MSVC doesn’t even allow you to turn it off).
The best way to compare kernels between the various architectures is to build all of them as normal and disable the ones you don’t want at run time.
Start Blender normally.
Hit F3 and search for the Debug menu.
Enter 256 as the debug value.
In the properties window you should now have an extra panel.
There are no virtual functions. We compile the kernel for the different architectures separately with different signatures and then it’s a simple if/switch when launching the kernel.
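The run-time selection described above can be sketched roughly as follows. This is an illustration in Python, not Cycles' actual C++ code. The kernel functions, `ARCH_ORDER`, `select_kernel`, and `render_tile` are hypothetical names, and the sketch assumes the dispatcher picks the most capable variant that the CPU supports and that has not been disabled.

```python
# Hypothetical sketch of run-time kernel dispatch; not Cycles' real code.
# Each architecture has its own separately built kernel entry point.

def kernel_sse2(tile):
    return f"sse2:{tile}"

def kernel_sse3(tile):
    return f"sse3:{tile}"

def kernel_sse41(tile):
    return f"sse41:{tile}"

def kernel_avx(tile):
    return f"avx:{tile}"

def kernel_avx2(tile):
    return f"avx2:{tile}"

# Most capable variant first (assumed ordering).
ARCH_ORDER = [
    ("avx2", kernel_avx2),
    ("avx", kernel_avx),
    ("sse41", kernel_sse41),
    ("sse3", kernel_sse3),
    ("sse2", kernel_sse2),
]

def select_kernel(cpu_supports, disabled=frozenset()):
    """Return (name, function) for the first usable architecture."""
    for name, func in ARCH_ORDER:
        if name in cpu_supports and name not in disabled:
            return name, func
    raise RuntimeError("no usable kernel variant")

def render_tile(tile, cpu_supports, disabled=frozenset()):
    # One plain branch per kernel launch; no virtual call is involved.
    _, func = select_kernel(cpu_supports, disabled)
    return func(tile)
```

Disabling a variant at run time then only changes which branch is taken, which is why all kernels can be built as normal and compared from a single binary.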
Thanks. Is there any way to set the BVH arity (i.e. BVH2, BVH4, etc.) and turn off AVX from the command line?
The solution above opens a Debug menu in the GUI to set these things. How do we set these parameters when doing a command-line render?
You can use the following environment variables to control this behavior:
CYCLES_CPU_NO_AVX2 - if defined, disable AVX2 support
CYCLES_CPU_NO_AVX - if defined, disable AVX support
CYCLES_CPU_NO_SSE41 - if defined, disable SSE4.1 support
CYCLES_CPU_NO_SSE3 - if defined, disable SSE3 support
CYCLES_CPU_NO_SSE2 - if defined, disable SSE2 support
CYCLES_BVH2 - if defined, use BVH2
CYCLES_BVH4 - if defined, use BVH4
CYCLES_BVH8 - if defined, use BVH8
If more than one of the BVH variables is defined, the lowest one is used.
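For a command-line render, one way to apply these variables is to set them in the environment of the Blender process. The sketch below is a minimal illustration, assuming `blender` is on the PATH and that the build in use honors the variable names listed above. `build_env`, `effective_bvh`, `render_command`, and the file name `scene.blend` are hypothetical. It only builds the environment and the command, and leaves launching to the caller.

```python
import os

# Variable names taken from the list above.
CPU_FLAGS = {
    "avx2": "CYCLES_CPU_NO_AVX2",
    "avx": "CYCLES_CPU_NO_AVX",
    "sse41": "CYCLES_CPU_NO_SSE41",
    "sse3": "CYCLES_CPU_NO_SSE3",
    "sse2": "CYCLES_CPU_NO_SSE2",
}
BVH_VARS = {2: "CYCLES_BVH2", 4: "CYCLES_BVH4", 8: "CYCLES_BVH8"}

def build_env(disable=(), bvh=None, base=None):
    """Copy an environment and define the requested Cycles variables."""
    env = dict(os.environ if base is None else base)
    for arch in disable:
        # Only being defined matters per the description; "1" is arbitrary.
        env[CPU_FLAGS[arch]] = "1"
    if bvh is not None:
        env[BVH_VARS[bvh]] = "1"
    return env

def effective_bvh(env):
    """Mirror the stated rule: the lowest defined BVH variable wins."""
    defined = [width for width, var in BVH_VARS.items() if var in env]
    return min(defined) if defined else None

def render_command(blend_file, frame=1):
    # -b runs Blender in background mode, -f renders the given frame.
    return ["blender", "-b", blend_file, "-f", str(frame)]
```

A benchmark run could then call `subprocess.run(render_command("scene.blend"), env=build_env(disable=["avx2", "avx"], bvh=4))` once per configuration and time each call.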
Thanks much. If I can ask a quick follow-up question:
What are the environment variables to control the number of threads and tile size?
Appreciate your help!