I have developed a remote rendering system that runs on cloud infrastructure. At its heart is Blender, driven through its command-line rendering capability.
I have a multi-GPU system with 4 NVIDIA A10 graphics cards, CUDA 11.7 fully set up, and the latest drivers. I perform the rendering in a Docker image where Blender 3.3.1 is installed at /usr/local/blender. The script running inside it calls the following command line:
/usr/local/blender/blender -b blendFile.blend -o imgFilename# -f frame -- --cycles-device=CUDA
If I run the Docker container with only a single GPU passed in, it runs perfectly and outputs a rendered image. If I pass in multiple GPUs, though, the rendering appears to run, but when Blender goes to save the output file I get an error:
pure virtual method called
terminate called without an active exception
Aborted (core dumped)
I’ve verified that other CUDA workloads work with multiple GPUs, both on the base machine and inside the Docker image. This is the only thing that’s breaking.
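For anyone trying to reproduce this, here is a minimal sketch of how a wrapper around that command might look. It is not the exact script: the helper names are hypothetical, and restricting GPUs through the CUDA_VISIBLE_DEVICES environment variable is an assumption standing in for however the container is actually given its devices.

```python
import os
import subprocess

BLENDER = "/usr/local/blender/blender"  # install path mentioned in the post


def build_render_command(blend_file, output_pattern, frame, device="CUDA"):
    """Return the argv list for a background (headless) Cycles render."""
    return [
        BLENDER,
        "-b", blend_file,        # run without a UI
        "-o", output_pattern,    # output path; '#' is replaced by the frame number
        "-f", str(frame),        # render a single frame
        "--",                    # everything after this goes to the Cycles add-on
        f"--cycles-device={device}",
    ]


def build_env(gpu_indices=None, base_env=None):
    """Copy the environment, optionally restricting the GPUs CUDA can see."""
    env = dict(os.environ if base_env is None else base_env)
    if gpu_indices is not None:
        # Assumption: GPUs are limited via CUDA_VISIBLE_DEVICES rather than
        # through the container runtime's own device selection.
        env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_indices)
    return env


def render(blend_file, output_pattern, frame, gpu_indices=None):
    """Run Blender and return its exit code (hypothetical helper)."""
    cmd = build_render_command(blend_file, output_pattern, frame)
    return subprocess.run(cmd, env=build_env(gpu_indices)).returncode
```

Calling render("blendFile.blend", "imgFilename#", 1, gpu_indices=[0]) would limit CUDA to the first GPU. An abort like the one above should show up as a negative return code, since subprocess reports termination by a signal that way.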
I ran the command with --debug-cycles enabled and the full output is below:
I1011 14:58:34.150018 62 device.cpp:32] CUEW initialization succeeded
I1011 14:58:34.150056 62 device.cpp:34] Found precompiled kernels
I1011 14:58:43.730614 62 device.cpp:182] Device has compute preemption or is not used for display.
I1011 14:58:43.730638 62 device.cpp:185] Added device "NVIDIA A10" with id "CUDA_NVIDIA A10_0000:17:00".
I1011 14:58:43.730715 62 device.cpp:182] Device has compute preemption or is not used for display.
I1011 14:58:43.730723 62 device.cpp:185] Added device "NVIDIA A10" with id "CUDA_NVIDIA A10_0000:31:00".
I1011 14:58:43.730799 62 device.cpp:182] Device has compute preemption or is not used for display.
I1011 14:58:43.730806 62 device.cpp:185] Added device "NVIDIA A10" with id "CUDA_NVIDIA A10_0000:b1:00".
I1011 14:58:43.730883 62 device.cpp:182] Device has compute preemption or is not used for display.
I1011 14:58:43.730890 62 device.cpp:185] Added device "NVIDIA A10" with id "CUDA_NVIDIA A10_0000:ca:00".
I1011 14:58:43.730973 62 task.cpp:73] Overriding number of TBB threads to 4.
I1011 14:58:43.730994 62 device_impl.cpp:530] Mapped host memory limit set to 1,076,927,864,832 bytes. (1002.97G)
I1011 14:58:43.814178 62 device_impl.cpp:530] Mapped host memory limit set to 1,076,927,864,832 bytes. (1002.97G)
I1011 14:58:43.897723 62 device_impl.cpp:530] Mapped host memory limit set to 1,076,927,864,832 bytes. (1002.97G)
I1011 14:58:43.980720 62 device_impl.cpp:530] Mapped host memory limit set to 1,076,927,864,832 bytes. (1002.97G)
I1011 14:58:44.064992 62 device_impl.cpp:58] Using AVX2 CPU kernels.
I1011 14:58:44.098506 62 sync.cpp:288] Total time spent synchronizing data: 0.032656
I1011 14:58:44.253284 70 colorspace.cpp:140] Colorspace sRGB is sRGB
I1011 14:58:44.336920 62 colorspace.cpp:135] Colorspace Non-Color is no-op
I1011 14:58:44.476052 80 film.cpp:584] Effective scene passes:
I1011 14:58:44.476081 80 film.cpp:586] - type: combined, name: "Combined", mode: DENOISED, is_written: True
I1011 14:58:44.476094 80 film.cpp:586] - type: combined, name: "Noisy Image", mode: NOISY, is_written: True
I1011 14:58:44.476100 80 film.cpp:586] - type: adaptive_aux_buffer, name: "", mode: NOISY, is_written: True
I1011 14:58:44.476109 80 film.cpp:586] - type: denoising_normal, name: "Denoising Normal", mode: NOISY, is_written: True
I1011 14:58:44.476114 80 film.cpp:586] - type: denoising_albedo, name: "Denoising Albedo", mode: NOISY, is_written: True
I1011 14:58:44.476120 80 film.cpp:586] - type: depth, name: "Depth", mode: NOISY, is_written: True
I1011 14:58:44.476127 80 film.cpp:586] - type: sample_count, name: "", mode: NOISY, is_written: True
I1011 14:58:44.476135 80 film.cpp:586] - type: denoising_depth, name: "Denoising Depth", mode: NOISY, is_written: True
I1011 14:58:44.476217 80 device.cpp:39] OptiX initialization failed with error code 7804
I1011 14:58:44.476245 80 scene.cpp:591] Requested features:
I1011 14:58:44.476253 80 scene.cpp:592] Use BSDF True
I1011 14:58:44.476259 80 scene.cpp:593] Use Principled BSDF True
I1011 14:58:44.476266 80 scene.cpp:595] Use Emission True
I1011 14:58:44.476274 80 scene.cpp:597] Use Volume False
I1011 14:58:44.476279 80 scene.cpp:598] Use Bump True
I1011 14:58:44.476286 80 scene.cpp:599] Use Voronoi False
I1011 14:58:44.476294 80 scene.cpp:601] Use Shader Raytrace False
I1011 14:58:44.476300 80 scene.cpp:603] Use MNEEFalse
I1011 14:58:44.476306 80 scene.cpp:604] Use Transparent False
I1011 14:58:44.476313 80 scene.cpp:606] Use Denoising True
I1011 14:58:44.476320 80 scene.cpp:607] Use Path Tracing True
I1011 14:58:44.476326 80 scene.cpp:609] Use Hair False
I1011 14:58:44.476333 80 scene.cpp:610] Use Pointclouds False
I1011 14:58:44.476339 80 scene.cpp:612] Use Object Motion False
I1011 14:58:44.476346 80 scene.cpp:614] Use Camera Motion False
I1011 14:58:44.476353 80 scene.cpp:616] Use Baking False
I1011 14:58:44.476359 80 scene.cpp:617] Use Subsurface False
I1011 14:58:44.476366 80 scene.cpp:618] Use Volume False
I1011 14:58:44.476373 80 scene.cpp:619] Use Patch Evaluation False
I1011 14:58:44.476380 80 scene.cpp:621] Use Shadow Catcher False
I1011 14:58:44.476394 80 device_impl.cpp:249] Testing for pre-compiled kernel /usr/local/blender/3.3/scripts/addons/cycles/lib/kernel_sm_86.cubin.
I1011 14:58:44.476410 80 device_impl.cpp:251] Using precompiled kernel.
I1011 14:58:44.512446 80 device_impl.cpp:487] Local memory reserved 708,837,376 bytes. (676.00M)
I1011 14:58:44.512593 80 device_impl.cpp:249] Testing for pre-compiled kernel /usr/local/blender/3.3/scripts/addons/cycles/lib/kernel_sm_86.cubin.
I1011 14:58:44.512616 80 device_impl.cpp:251] Using precompiled kernel.
I1011 14:58:44.546254 80 device_impl.cpp:487] Local memory reserved 708,837,376 bytes. (676.00M)
I1011 14:58:44.546386 80 device_impl.cpp:249] Testing for pre-compiled kernel /usr/local/blender/3.3/scripts/addons/cycles/lib/kernel_sm_86.cubin.
I1011 14:58:44.546406 80 device_impl.cpp:251] Using precompiled kernel.
I1011 14:58:44.577816 80 device_impl.cpp:487] Local memory reserved 708,837,376 bytes. (676.00M)
I1011 14:58:44.577971 80 device_impl.cpp:249] Testing for pre-compiled kernel /usr/local/blender/3.3/scripts/addons/cycles/lib/kernel_sm_86.cubin.
I1011 14:58:44.577993 80 device_impl.cpp:251] Using precompiled kernel.
I1011 14:58:44.608696 80 device_impl.cpp:487] Local memory reserved 708,837,376 bytes. (676.00M)
I1011 14:58:44.608881 80 svm.cpp:73] Total 12 shaders.
I1011 14:58:44.643329 80 svm.cpp:149] Shader manager updated 12 shaders in 0.034452 seconds.
I1011 14:58:44.643402 80 object.cpp:691] Total 1 objects.
I1011 14:58:44.643559 80 particles.cpp:108] Total 0 particle systems.
I1011 14:58:44.643643 80 geometry.cpp:1805] Total 1 meshes.
I1011 14:58:44.645526 80 geometry.cpp:1294] Using BVH2 layout.
I1011 14:58:44.653132 80 tables.cpp:37] Total 1 lookup tables.
I1011 14:58:44.653322 80 light.cpp:982] Total 6 lights.
I1011 14:58:44.653332 80 light.cpp:219] Background MIS has been disabled.
I1011 14:58:44.653342 80 light.cpp:961] Number of lights sent to the device: 5
I1011 14:58:44.653349 80 light.cpp:963] Number of lights without contribution: 1
I1011 14:58:44.653429 80 light.cpp:314] Total 5 of light distribution primitives.
I1011 14:58:44.653873 80 tables.cpp:37] Total 2 lookup tables.
I1011 14:58:44.654079 80 scene.cpp:378] System memory statistics after full device sync:
Usage: 346,245,288 (330.21M)
Peak: 360,568,064 (343.86M)
I1011 14:58:44.654101 80 device_impl.cpp:58] Using AVX2 CPU kernels.
I1011 14:58:44.678746 80 path_trace.cpp:387] Rendered 1 samples in 0.002285 seconds (0.002285 seconds per sample), occupancy: 0.00560104
I1011 14:58:44.678809 70 path_trace.cpp:387] Rendered 1 samples in 0.00233698 seconds (0.00233698 seconds per sample), occupancy: 0.00563153
I1011 14:58:44.680454 69 path_trace.cpp:387] Rendered 1 samples in 0.00399113 seconds (0.00399113 seconds per sample), occupancy: 0.0138888
I1011 14:58:44.680626 71 path_trace.cpp:387] Rendered 1 samples in 0.00416303 seconds (0.00416303 seconds per sample), occupancy: 0.0134319
I1011 14:58:44.680694 80 device_impl.cpp:58] Using AVX2 CPU kernels.
I1011 14:58:44.719704 80 path_trace.cpp:387] Rendered 31 samples in 0.015553 seconds (0.00050171 seconds per sample), occupancy: 0.137327
I1011 14:58:44.720160 71 path_trace.cpp:387] Rendered 31 samples in 0.0160031 seconds (0.00051623 seconds per sample), occupancy: 0.124251
I1011 14:58:44.758633 69 path_trace.cpp:387] Rendered 31 samples in 0.0544789 seconds (0.00175738 seconds per sample), occupancy: 0.191306
I1011 14:58:44.758858 70 path_trace.cpp:387] Rendered 31 samples in 0.054704 seconds (0.00176464 seconds per sample), occupancy: 0.200582
I1011 14:58:44.760077 80 device_impl.cpp:58] Using AVX2 CPU kernels.
I1011 14:58:44.787374 70 path_trace.cpp:387] Rendered 16 samples in 0.00730705 seconds (0.000456691 seconds per sample), occupancy: 0.0249943
I1011 14:58:44.787712 80 path_trace.cpp:387] Rendered 16 samples in 0.00778008 seconds (0.000486255 seconds per sample), occupancy: 0.0234999
I1011 14:58:44.807122 69 path_trace.cpp:387] Rendered 16 samples in 0.0271831 seconds (0.00169894 seconds per sample), occupancy: 0.0951967
I1011 14:58:44.807525 71 path_trace.cpp:387] Rendered 16 samples in 0.02759 seconds (0.00172438 seconds per sample), occupancy: 0.0943572
I1011 14:58:44.808609 80 device_impl.cpp:58] Using AVX2 CPU kernels.
I1011 14:58:44.838322 80 path_trace.cpp:387] Rendered 16 samples in 0.0100031 seconds (0.000625193 seconds per sample), occupancy: 0.0383611
I1011 14:58:44.839077 70 path_trace.cpp:387] Rendered 16 samples in 0.0107491 seconds (0.000671819 seconds per sample), occupancy: 0.0349436
I1011 14:58:44.852921 71 path_trace.cpp:387] Rendered 16 samples in 0.024596 seconds (0.00153725 seconds per sample), occupancy: 0.0874385
I1011 14:58:44.853015 69 path_trace.cpp:387] Rendered 16 samples in 0.0246031 seconds (0.0015377 seconds per sample), occupancy: 0.102138
I1011 14:58:44.854089 80 device_impl.cpp:58] Using AVX2 CPU kernels.
I1011 14:58:44.887321 71 path_trace.cpp:387] Rendered 16 samples in 0.013072 seconds (0.000817001 seconds per sample), occupancy: 0.0375985
I1011 14:58:44.888067 80 path_trace.cpp:387] Rendered 16 samples in 0.0140178 seconds (0.000876114 seconds per sample), occupancy: 0.0532356
I1011 14:58:44.895463 70 path_trace.cpp:387] Rendered 16 samples in 0.0213101 seconds (0.00133188 seconds per sample), occupancy: 0.0898178
I1011 14:58:44.896032 69 path_trace.cpp:387] Rendered 16 samples in 0.021879 seconds (0.00136743 seconds per sample), occupancy: 0.0855809
I1011 14:58:44.897125 80 device_impl.cpp:58] Using AVX2 CPU kernels.
I1011 14:58:44.932638 80 path_trace.cpp:387] Rendered 16 samples in 0.0149961 seconds (0.000937253 seconds per sample), occupancy: 0.0635098
I1011 14:58:44.933192 69 path_trace.cpp:387] Rendered 16 samples in 0.0153632 seconds (0.000960201 seconds per sample), occupancy: 0.0456572
I1011 14:58:44.937207 70 path_trace.cpp:387] Rendered 16 samples in 0.0194621 seconds (0.00121638 seconds per sample), occupancy: 0.0760837
I1011 14:58:44.937533 71 path_trace.cpp:387] Rendered 16 samples in 0.0196879 seconds (0.00123049 seconds per sample), occupancy: 0.0796039
I1011 14:58:44.938577 80 device_impl.cpp:58] Using AVX2 CPU kernels.
I1011 14:58:44.976842 80 path_trace.cpp:387] Rendered 16 samples in 0.0161262 seconds (0.00100788 seconds per sample), occupancy: 0.0684921
I1011 14:58:44.977064 71 path_trace.cpp:387] Rendered 16 samples in 0.0162189 seconds (0.00101368 seconds per sample), occupancy: 0.0509055
I1011 14:58:44.979477 70 path_trace.cpp:387] Rendered 16 samples in 0.0186379 seconds (0.00116487 seconds per sample), occupancy: 0.0722774
I1011 14:58:44.979732 69 path_trace.cpp:387] Rendered 16 samples in 0.0188699 seconds (0.00117937 seconds per sample), occupancy: 0.0747307
I1011 14:58:44.980801 80 device_impl.cpp:58] Using AVX2 CPU kernels.
I1011 14:58:45.019239 80 path_trace.cpp:387] Rendered 16 samples in 0.0169952 seconds (0.0010622 seconds per sample), occupancy: 0.052993
I1011 14:58:45.019341 69 path_trace.cpp:387] Rendered 16 samples in 0.0169919 seconds (0.00106199 seconds per sample), occupancy: 0.0536725
I1011 14:58:45.020012 71 path_trace.cpp:387] Rendered 16 samples in 0.01776 seconds (0.00111 seconds per sample), occupancy: 0.0696154
I1011 14:58:45.020270 70 path_trace.cpp:387] Rendered 16 samples in 0.0180159 seconds (0.00112599 seconds per sample), occupancy: 0.0719018
I1011 14:58:46.091145 80 session.cpp:152] Rendering in main loop is done in 1.41452 seconds.
I1011 14:58:46.091173 80 session.cpp:153]
Full path tracing report
Path tracing on: NVIDIA A10 (CUDA) [CUDA_NVIDIA A10_0000:ca:00]
NVIDIA A10 (CUDA) [CUDA_NVIDIA A10_0000:b1:00]
NVIDIA A10 (CUDA) [CUDA_NVIDIA A10_0000:31:00]
NVIDIA A10 (CUDA) [CUDA_NVIDIA A10_0000:17:00]
Render Scheduler Summary
Mode: Headless
Resolution: 1080x720
Adaptive sampling:
Use: True
Step: 16
Min Samples: 15
Threshold: 0.050000
Denoiser:
Use: True
Type: OptiX
Start Sample: 0
Passes: Color, Albedo, Normal
Rebalancer:
Number of requested rebalances: 7
Number of performed rebalances: 7
Time (in seconds):
Wall Average
Path Tracing 0.190430 0.001488
Adaptive Filter 0.007104 0.000056
Denoiser 1.031665 1.031665
Display Update 0.000002 0.000002
Rebalance 0.144004 0.020572
Total: 1.229201
Rendered 128 samples in 1.415434 seconds
I1011 14:58:46.091351 62 session.cpp:461] Total render time: 2.3604
I1011 14:58:46.091375 62 session.cpp:462] Render time (without synchronization): 1.41473
pure virtual method called
terminate called without an active exception
Aborted (core dumped)
This looks like it’s a problem with Blender reconstructing the tiles and not the rendering itself, but I’m not sure.
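To narrow down where to look, the debug output can be scanned for lines that mention a failure. The following is a minimal sketch, assuming the log lines use the glog-style prefix seen above (severity letter and date, time, thread id, source file and line). The regular expression and helper names are hypothetical and not part of Blender.

```python
import re

# glog-style prefix: severity letter + MMDD, time, thread id, "file:line]"
GLOG_LINE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>[\d:.]+)\s+(?P<tid>\d+) "
    r"(?P<src>[\w.]+):(?P<line>\d+)\] ?(?P<msg>.*)$"
)


def parse_line(line):
    """Split one log line into its fields, or return None if it has no prefix."""
    match = GLOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def find_suspicious(lines, keywords=("fail", "error")):
    """Return parsed records that are warnings/errors or mention a keyword."""
    hits = []
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # continuation lines and the final abort message
        message = record["msg"].lower()
        if record["sev"] in "WEF" or any(k in message for k in keywords):
            hits.append(record)
    return hits
```

Run over the output above, this should flag the "OptiX initialization failed with error code 7804" line from device.cpp:39. That may or may not be related, but the report does list the denoiser type as OptiX.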
Any suggestions on how to fix this, or ideas on where to look further, would be helpful.