Cycles raises a CUDA error while running on IBM-Power9 with NVIDIA V100

Hi,
We are compiling Blender 2.82a on an IBM Power9 cluster that has NVIDIA V100 GPUs. We managed to get it running on the CPUs, but we get a CUDA error when using the GPUs:

CUDA error: Invalid value in cuMemcpyHtoD( cuda_device_ptr(mem.device_pointer), mem.host_pointer, mem.memory_size()), line 917

Refer to the Cycles GPU rendering documentation for possible solutions:
https://docs.blender.org/manual/en/latest/render/cycles/gpu_rendering.html

Could someone shed some light on what the issue is here?

Many thanks in advance.

The first step would be to run Blender with --debug-cycles to get debugging output.
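
For a background render driven by a script, the invocation could look roughly like the sketch below. The file names scene.blend and render.py and the helper build_debug_command are placeholders for illustration, not taken from the original setup. Blender processes command-line arguments in order, so the debug flag is placed before the script.

```python
import shlex

def build_debug_command(blender_bin, blend_file, script):
    """Return the argument list for a background render with Cycles debug logging."""
    # Arguments are handled in order, so --debug-cycles comes before --python.
    return [blender_bin, "--background", blend_file, "--debug-cycles", "--python", script]

# Hypothetical file names; substitute the real .blend file and render script.
cmd = build_debug_command("blender", "scene.blend", "render.py")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run, capturing both stdout and stderr so that the log lines and the CUDA error messages end up in the same file.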

Thank you, brecht.
If I understand the output correctly, it seems to be failing to allocate 2.4 KB on the device.
Any clue what might be happening here?

I0317 12:42:46.263566 113433 blender_python.cpp:184] Debug flags initialized to:
CPU flags:
AVX2 : True
AVX : True
SSE4.1 : True
SSE3 : True
SSE2 : True
BVH layout : BVH8
Split : False
CUDA flags:
Adaptive Compile : False
OptiX flags:
CUDA streams : 1
OpenCL flags:
Device type : ALL
Debug : False
Memory limit : 0
I0317 12:42:46.296241 113433 device_cuda.cpp:2582] CUEW initialization succeeded
I0317 12:42:46.296278 113433 device_cuda.cpp:2584] Found precompiled kernels
I0317 12:42:46.324736 113433 device_cuda.cpp:2708] Device has compute preemption or is not used for display.
I0317 12:42:46.324769 113433 device_cuda.cpp:2711] Added device "Tesla V100-SXM2-16GB" with id "CUDA_Tesla V100-SXM2-16GB_0004:04:00".
I0317 12:42:46.333961 113433 device_opencl.cpp:48] CLEW initialization succeeded.
I0317 12:42:46.382089 113433 opencl_util.cpp:900] Enumerating devices for platform NVIDIA CUDA.
I0317 12:42:46.382184 113433 opencl_util.cpp:950] Ignoring device Tesla V100-SXM2-16GB, not officially supported yet.
I0317 12:42:46.394541 113433 util_task.cpp:329] Creating pool of 80 threads.
I0317 12:42:46.394769 113433 util_task.cpp:241] Detected 80 processors in active group.
I0317 12:42:46.394783 113433 util_task.cpp:251] Not setting thread group affinity.
I0317 12:42:46.396123 113433 device_cuda.cpp:700] Mapped host memory limit set to 608,065,945,600 bytes. (566.30G)
I0317 12:42:46.472574 113761 session.cpp:751] Requested features:
Experimental features: Off
Max nodes group: 2
Nodes features: 0
Use Hair: False
Use Object Motion: False
Use Camera Motion: False
Use Baking: False
Use Subsurface: True
Use Volume: False
Use Branched Integrator: False
Use Patch Evaluation: False
Use Transparent Shadows: False
Use Principled BSDF: True
Use Denoising: True
Use Displacement: False
Use Background Light: True
I0317 12:42:46.472622 113761 device_cuda.cpp:453] Testing for pre-compiled kernel /gpfs/apps/POWER9/BLENDER/2.82a/2.82/scripts/addons/cycles/lib/kernel_sm_70.cubin.
I0317 12:42:46.475236 113761 device_cuda.cpp:455] Using precompiled kernel.
I0317 12:42:46.475250 113761 device_cuda.cpp:453] Testing for pre-compiled kernel /gpfs/apps/POWER9/BLENDER/2.82a/2.82/scripts/addons/cycles/lib/filter_sm_70.cubin.
I0317 12:42:46.476374 113761 device_cuda.cpp:455] Using precompiled kernel.
I0317 12:42:46.576314 113761 device_cuda.cpp:657] Local memory reserved 3,131,047,936 bytes. (2.92G)
I0317 12:42:46.576356 113761 session.cpp:764] Total time spent loading kernels: 0.10379
I0317 12:42:46.576476 113761 svm.cpp:81] Total 10 shaders.
I0317 12:42:46.576534 113761 constant_fold.cpp:132] Discarding closure emission.
I0317 12:42:46.576562 113761 svm.cpp:66] Compilation summary:
Shader name: default_light
Number of SVM nodes: 3
Peak stack usage: 0
Time (in seconds):
Finalize: 0.000027
Surface: 0.000003
Bump: 0.000000
Volume: 0.000001
Displacement: 0.000002
Generate: 0.000006
Total: 0.000034
I0317 12:42:46.576603 113761 svm.cpp:66] Compilation summary:
Shader name: shader
Number of SVM nodes: 5
Peak stack usage: 0
Time (in seconds):
Finalize: 0.000005
Surface: 0.000003
Bump: 0.000000
Volume: 0.000002
Displacement: 0.000002
Generate: 0.000007
Total: 0.000013
I0317 12:42:46.576637 113697 svm.cpp:66] Compilation summary:
Shader name: default_empty
Number of SVM nodes: 3
Peak stack usage: 0
Time (in seconds):
Finalize: 0.000004
Surface: 0.000002
Bump: 0.000000
Volume: 0.000002
Displacement: 0.000002
Generate: 0.000006
Total: 0.000010
I0317 12:42:46.576625 113690 svm.cpp:66] Compilation summary:
Shader name: default_surface
Number of SVM nodes: 8
Peak stack usage: 4
Time (in seconds):
Finalize: 0.000014
Surface: 0.000012
Bump: 0.000000
Volume: 0.000003
Displacement: 0.000005
Generate: 0.000020
Total: 0.000035
I0317 12:42:46.576643 113692 svm.cpp:66] Compilation summary:
Shader name: default_background
Number of SVM nodes: 5
Peak stack usage: 0
Time (in seconds):
Finalize: 0.000007
Surface: 0.000004
Bump: 0.000000
Volume: 0.000002
Displacement: 0.000002
Generate: 0.000008
Total: 0.000015
I0317 12:42:46.577527 113694 svm.cpp:66] Compilation summary:
Shader name: shader
Number of SVM nodes: 5
Peak stack usage: 0
Time (in seconds):
Finalize: 0.000029
Surface: 0.000004
Bump: 0.000000
Volume: 0.000002
Displacement: 0.000001
Generate: 0.000007
Total: 0.000053
I0317 12:42:46.578485 113691 svm.cpp:66] Compilation summary:
Shader name: Material.002
Number of SVM nodes: 28
Peak stack usage: 23
Time (in seconds):
Finalize: 0.000001
Surface: 0.000015
Bump: 0.000000
Volume: 0.000002
Displacement: 0.000002
Generate: 0.000019
Total: 0.000020
I0317 12:42:46.579267 113687 svm.cpp:66] Compilation summary:
Shader name: Material.004
Number of SVM nodes: 32
Peak stack usage: 26
Time (in seconds):
Finalize: 0.000001
Surface: 0.000018
Bump: 0.000000
Volume: 0.000002
Displacement: 0.000002
Generate: 0.000022
Total: 0.000024
I0317 12:42:46.579336 113695 svm.cpp:66] Compilation summary:
Shader name: Material.003
Number of SVM nodes: 28
Peak stack usage: 23
Time (in seconds):
Finalize: 0.000001
Surface: 0.000017
Bump: 0.000000
Volume: 0.000002
Displacement: 0.000002
Generate: 0.000021
Total: 0.000022
I0317 12:42:46.579336 113688 svm.cpp:66] Compilation summary:
Shader name: Material.001
Number of SVM nodes: 28
Peak stack usage: 23
Time (in seconds):
Finalize: 0.000001
Surface: 0.000017
Bump: 0.000000
Volume: 0.000002
Displacement: 0.000002
Generate: 0.000021
Total: 0.000023
I0317 12:42:46.579744 113761 device_cuda.cpp:859] Buffer allocate: __svm_nodes, 2,480 bytes. (2.42K) in device memory
CUDA error: Invalid value in cuMemcpyHtoD( cuda_device_ptr(mem.device_pointer), mem.host_pointer, mem.memory_size()), line 917
Refer to the Cycles GPU rendering documentation for possible solutions:
https://docs.blender.org/manual/en/latest/render/cycles/gpu_rendering.html
CUDA error: Invalid value in cuMemcpyHtoD(cumem, (void *)&ptr, cubytes), line 1108
I0317 12:42:46.579828 113761 device_cuda.cpp:859] Buffer allocate: __shaders, 320 bytes. (320) in device memory
CUDA error: Invalid value in cuMemcpyHtoD( cuda_device_ptr(mem.device_pointer), mem.host_pointer, mem.memory_size()), line 917
CUDA error: Invalid value in cuMemcpyHtoD(cumem, (void *)&ptr, cubytes), line 1108
I0317 12:42:46.633671 113761 svm.cpp:161] Shader manager updated 10 shaders in 0.0571902 seconds.
I0317 12:42:46.633913 113433 blender_session.cpp:587] Total render time: 0.135591
I0317 12:42:46.633929 113433 blender_session.cpp:588] Render time (without synchronization): -0.10362
I0317 12:42:46.670428 113433 util_task.cpp:347] De-initializing thread pool of task scheduler.
Traceback (most recent call last):
  File "/home/bsc32/bsc32870/TestBlenderInPower9/render.py", line 7, in <module>
    bpy.ops.render.render( write_still=True )
  File "/gpfs/apps/POWER9/BLENDER/2.82a/2.82/scripts/modules/bpy/ops.py", line 201, in __call__
    ret = op_call(self.idname_py(), None, kw)
RuntimeError: Error: CUDA error: Invalid value in cuMemcpyHtoD( cuda_device_ptr(mem.device_pointer), mem.host_pointer, mem.memory_size()), line 917
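
To narrow down which buffers are affected, it can help to pair each "Buffer allocate" line in the log with the first CUDA error that follows it. The sketch below is only a log-reading aid, not part of Cycles. The regular expressions are assumptions based on the lines shown above, and failed_buffers is a hypothetical helper name. In this log the failing call is cuMemcpyHtoD, which suggests the host-to-device copy rather than the allocation itself, although the log alone does not prove that the allocation succeeded.

```python
import re

# Patterns assumed from the log lines quoted in this thread.
ALLOC_RE = re.compile(r"Buffer allocate: (\S+), ([\d,]+) bytes")
ERROR_RE = re.compile(r"CUDA error: (.+?) in (\w+)\(")

def failed_buffers(log_text):
    """Pair each 'Buffer allocate' line with the first CUDA error that follows it."""
    results = []
    current = None  # (name, size) of the latest allocation not yet paired with an error
    for line in log_text.splitlines():
        alloc = ALLOC_RE.search(line)
        if alloc:
            current = (alloc.group(1), int(alloc.group(2).replace(",", "")))
            continue
        err = ERROR_RE.search(line)
        if err and current is not None:
            results.append((current[0], current[1], err.group(2), err.group(1)))
            current = None  # ignore follow-up errors for the same buffer
    return results

sample = """\
I0317 12:42:46.579744 113761 device_cuda.cpp:859] Buffer allocate: __svm_nodes, 2,480 bytes. (2.42K) in device memory
CUDA error: Invalid value in cuMemcpyHtoD( cuda_device_ptr(mem.device_pointer), mem.host_pointer, mem.memory_size()), line 917
CUDA error: Invalid value in cuMemcpyHtoD(cumem, (void *)&ptr, cubytes), line 1108
I0317 12:42:46.579828 113761 device_cuda.cpp:859] Buffer allocate: __shaders, 320 bytes. (320) in device memory
CUDA error: Invalid value in cuMemcpyHtoD( cuda_device_ptr(mem.device_pointer), mem.host_pointer, mem.memory_size()), line 917
"""

for name, size, call, message in failed_buffers(sample):
    print(f"{name}: {size} bytes, {call} -> {message}")
```

Run against the full log, this lists every buffer whose upload was followed by an error, which makes it easier to see whether all transfers fail or only some of them.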