Compiling kernel_shader_raytrace in cycles does not include the Ambient Occlusion code

My project uses kernel_shader_raytrace.ptx with OptiX 7.3.0 and Cycles 3.3.0. We’re trying to use the Ambient Occlusion node, but when we load the PTX file and resolve the nodes, the call to optixProgramGroupCreate reports that it cannot find the __direct_callable__svm_node_ao function in the PTX file. Searching the PTX file confirms that the ambient occlusion code is not in it. Ambient occlusion does work for us with CUDA.
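For anyone wanting to repeat the "search the PTX file" step in a build script, here is a minimal sketch. It only assumes that PTX is plain text and that OptiX direct callables carry the __direct_callable__ name prefix. The helper names and the sample fragment (including __direct_callable__example_node) are made up for illustration and are not real compiler output.

```python
import re


def list_direct_callables(ptx_text):
    """Return the sorted, de-duplicated __direct_callable__ symbol names found in PTX text."""
    return sorted(set(re.findall(r"\b__direct_callable__\w+", ptx_text)))


def has_symbol(ptx_text, name):
    """True if the exact symbol name appears among the direct callables."""
    return name in list_direct_callables(ptx_text)


# Hand-written PTX-like fragment for illustration only (hypothetical symbol name).
sample_ptx = """
.version 7.1
.target sm_50
.address_size 64

.visible .func __direct_callable__example_node()
{
    ret;
}
"""

print(list_direct_callables(sample_ptx))
print(has_symbol(sample_ptx, "__direct_callable__svm_node_ao"))
```

On a real build you would read the generated .ptx file into a string and pass that in instead of sample_ptx. If __direct_callable__svm_node_ao is absent from the list, optixProgramGroupCreate has nothing to resolve it against.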

Here are our flags for nvcc:
$(ProjectDir)…\CUDA11.1\win\bin\nvcc.exe -m64 --relocatable-device-code=true --device-as-default-execution-space --ptxas-options="-v" --generate-line-info --use_fast_math -DNVCC -DPOSER_CYCLES=1 -D__KERNEL_EXPERIMENTAL__ -D CCL_NAMESPACE_BEGIN= -D CCL_NAMESPACE_END= -I $(ProjectDir)…\cycles/src/kernel/… -I $(ProjectDir)…\cycles/src/kernel/device/cuda -I"$(ProjectDir)…\Optix SDK 7.3.0\include" -gencode=arch=compute_75,code=compute_75 -o $(OutDir)kernel_optix.ptx --ptx %(FullPath))

I can see it in my stock build. Did you make modifications?

Nothing that affects the ambient occlusion. We do have some small changes, which I removed as a test, with no luck. Could it be that we’re compiling for too low an architecture?

No, the OptiX target always targets sm_50 for maximum compatibility.

In the cycles project the nvcc compile uses -arch=sm_50 as the architecture.

These flags look very different from how Cycles OptiX kernels are built by default.

We don’t use --relocatable-device-code=true or --device-as-default-execution-space, and we build PTX with sm_50 as the baseline rather than specializing for a specific newer architecture like compute_75.

So there must be some pretty significant changes?

In fact, are these flags even generating PTX code correctly? I think the output might be a combination of PTX and cubin assembled for compute_75, and I would guess OptiX is expecting only PTX.
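One quick way to test that guess is to check whether the output file is plain PTX text at all. The sketch below relies on two things: a PTX module is text containing .version and .target directives, and an ELF binary (which is what a cubin is) starts with the bytes 0x7F "ELF". The function name and both sample inputs are hypothetical, and this is a rough sanity check, not a validator.

```python
def looks_like_ptx(data: bytes) -> bool:
    """Rough check: plain ASCII text containing .version and .target directives."""
    if data.startswith(b"\x7fELF"):
        # ELF magic number: a compiled binary, not PTX text.
        return False
    try:
        text = data.decode("ascii")
    except UnicodeDecodeError:
        return False
    # Collect the first token of every line that starts with a '.' directive.
    directives = [
        line.split()[0]
        for line in text.splitlines()
        if line.strip().startswith(".")
    ]
    return ".version" in directives and ".target" in directives


# Illustrative inputs only, not real compiler output.
ptx_sample = b"//\n// hand-written sample\n//\n.version 7.1\n.target sm_50\n.address_size 64\n"
elf_sample = b"\x7fELF\x02\x01\x01\x00"

print(looks_like_ptx(ptx_sample))
print(looks_like_ptx(elf_sample))
```

For a real file, open it in binary mode and pass the bytes to looks_like_ptx.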

--device-as-default-execution-space is still used, which I think we can remove now; we haven’t needed it in years.

--relocatable-device-code=true is used by the new OSL kernels

But I’m unsure how both of those flags together could end up on the regular OptiX kernel.

Best way forward is likely posting a full patch with your modifications

OK, I got it to work. Yes, it was a flags issue. I eliminated my architecture flags and put in:
nvcc.exe -m64 --ptxas-options="-v" -Wno-deprecated-gpu-targets --use_fast_math -DNVCC -DPOSER_CYCLES=1 -D__KERNEL_EXPERIMENTAL__ -D CCL_NAMESPACE_BEGIN= -D CCL_NAMESPACE_END= -I $(ProjectDir)…\cycles/src/kernel/… -I $(ProjectDir)…\cycles/src/kernel/device/cuda -I"$(ProjectDir)…\Optix SDK 7.3.0\include" -arch=sm_50 --keep-device-functions -o $(OutDir)kernel_optix.ptx --ptx %(FullPath))
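To make the change easier to see, here is a small sketch that diffs the two command lines posted in this thread. The flag lists are copied from those commands, leaving out the include paths, -D defines, -o and --ptx because they are the same in both. The helper name flag_diff is just for illustration. Note that this only shows what changed, not which individual flag was responsible for the fix.

```python
# Flags from the original (failing) command line in this thread.
before = [
    "-m64",
    "--relocatable-device-code=true",
    "--device-as-default-execution-space",
    '--ptxas-options="-v"',
    "--generate-line-info",
    "--use_fast_math",
    "-gencode=arch=compute_75,code=compute_75",
]

# Flags from the working command line in this thread.
after = [
    "-m64",
    '--ptxas-options="-v"',
    "-Wno-deprecated-gpu-targets",
    "--use_fast_math",
    "-arch=sm_50",
    "--keep-device-functions",
]


def flag_diff(old, new):
    """Return (removed, added) flags, preserving the order they appeared in."""
    removed = [flag for flag in old if flag not in new]
    added = [flag for flag in new if flag not in old]
    return removed, added


removed, added = flag_diff(before, after)
print("removed:", removed)
print("added:  ", added)
```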

I see -D__KERNEL_EXPERIMENTAL__ is deprecated, so I will remove that.

Thanks for the help. Consider this resolved unless I spoke too soon and it comes back to bite me.