Multi-GPU Render crashing

I have developed a remote rendering system that runs renders on cloud infrastructure. The heart of the system is Blender, taking advantage of its command-line rendering capability.
I have a multi-GPU system with 4 NVIDIA A10 graphics cards, CUDA 11.7 fully set up, and the latest drivers. I perform the rendering in a Docker container built from an image in which Blender 3.3.1 is installed at /usr/local/blender. The script running inside the container invokes Blender from the command line:

/usr/local/blender/blender -b blendFile.blend -o imgFilename# -f frame -- --cycles-device=CUDA 
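The calling script itself isn't shown. As a minimal sketch, assuming it is Python and uses subprocess, the invocation above can be built as an argument list and its exit status decoded. "Aborted (core dumped)" is a SIGABRT, which subprocess reports as -6 and a shell as 134. The helper names below are hypothetical.

```python
import signal
import subprocess

# Install path from the post; adjust if Blender lives elsewhere.
BLENDER = "/usr/local/blender/blender"


def build_blender_cmd(blend_file, output_pattern, frame, device="CUDA"):
    """Return argv for a single-frame background render.

    Blender ignores everything after "--"; the Cycles add-on reads
    --cycles-device from that part of the command line.
    """
    return [
        BLENDER,
        "-b", blend_file,        # run in background with this .blend
        "-o", output_pattern,    # '#' characters become the zero-padded frame number
        "-f", str(frame),        # render one frame; must come after -o
        "--",
        f"--cycles-device={device}",
    ]


def describe_exit(returncode):
    """Translate a process exit status into a readable string."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # subprocess reports death by signal N as -N on POSIX.
        return f"killed by {signal.Signals(-returncode).name}"
    if returncode > 128:
        # A shell reports death by signal N as 128 + N.
        try:
            return f"killed by {signal.Signals(returncode - 128).name} (via shell)"
        except ValueError:
            pass  # not a valid signal number; treat as a plain exit code
    return f"exited with status {returncode}"


def render(blend_file, output_pattern, frame):
    """Run the render and return a human-readable status."""
    result = subprocess.run(build_blender_cmd(blend_file, output_pattern, frame))
    return describe_exit(result.returncode)
```

Passing a list rather than a shell string keeps the `#` in the output pattern from ever being interpreted by a shell.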

If I run this container with only a single GPU passed in, it runs perfectly and outputs a rendered image. If I pass it multiple GPUs, though, the render appears to complete, but when Blender goes to save the output file I get an error:

pure virtual method called
terminate called without an active exception
Aborted (core dumped)
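The post doesn't show how the GPUs are handed to the container. Assuming `docker run --gpus` with the NVIDIA Container Toolkit, a small helper like this (hypothetical name) makes it easy to repeat the same render with one, two and four GPUs to see where the crash starts.

```python
def docker_run_cmd(image, gpu_indices, inner_cmd):
    """Build a `docker run` argv that exposes only the listed GPUs.

    Docker parses the --gpus value as CSV, so the device list is wrapped
    in literal double quotes; otherwise the comma in "0,1" would split
    the field. In a shell this is written as --gpus '"device=0,1"'.
    """
    devices = ",".join(str(i) for i in gpu_indices)
    return ["docker", "run", "--rm", "--gpus", f'"device={devices}"', image, *inner_cmd]
```

For example, calling it with `[0]`, `[0, 1]` and `[0, 1, 2, 3]` against the same image and entrypoint (both placeholders here) gives three otherwise identical runs that differ only in GPU count.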

I’ve verified that other CUDA workloads work with multiple GPUs, both on the base machine and inside the Docker container. Blender is the only thing that breaks.

I ran the command with --debug-cycles enabled and the full output is below:

I1011 14:58:34.150018    62 device.cpp:32] CUEW initialization succeeded
I1011 14:58:34.150056    62 device.cpp:34] Found precompiled kernels
I1011 14:58:43.730614    62 device.cpp:182] Device has compute preemption or is not used for display.
I1011 14:58:43.730638    62 device.cpp:185] Added device "NVIDIA A10" with id "CUDA_NVIDIA A10_0000:17:00".
I1011 14:58:43.730715    62 device.cpp:182] Device has compute preemption or is not used for display.
I1011 14:58:43.730723    62 device.cpp:185] Added device "NVIDIA A10" with id "CUDA_NVIDIA A10_0000:31:00".
I1011 14:58:43.730799    62 device.cpp:182] Device has compute preemption or is not used for display.
I1011 14:58:43.730806    62 device.cpp:185] Added device "NVIDIA A10" with id "CUDA_NVIDIA A10_0000:b1:00".
I1011 14:58:43.730883    62 device.cpp:182] Device has compute preemption or is not used for display.
I1011 14:58:43.730890    62 device.cpp:185] Added device "NVIDIA A10" with id "CUDA_NVIDIA A10_0000:ca:00".
I1011 14:58:43.730973    62 task.cpp:73] Overriding number of TBB threads to 4.
I1011 14:58:43.730994    62 device_impl.cpp:530] Mapped host memory limit set to 1,076,927,864,832 bytes. (1002.97G)
I1011 14:58:43.814178    62 device_impl.cpp:530] Mapped host memory limit set to 1,076,927,864,832 bytes. (1002.97G)
I1011 14:58:43.897723    62 device_impl.cpp:530] Mapped host memory limit set to 1,076,927,864,832 bytes. (1002.97G)
I1011 14:58:43.980720    62 device_impl.cpp:530] Mapped host memory limit set to 1,076,927,864,832 bytes. (1002.97G)
I1011 14:58:44.064992    62 device_impl.cpp:58] Using AVX2 CPU kernels.
I1011 14:58:44.098506    62 sync.cpp:288] Total time spent synchronizing data: 0.032656
I1011 14:58:44.253284    70 colorspace.cpp:140] Colorspace sRGB is sRGB
I1011 14:58:44.336920    62 colorspace.cpp:135] Colorspace Non-Color is no-op
I1011 14:58:44.476052    80 film.cpp:584] Effective scene passes:
I1011 14:58:44.476081    80 film.cpp:586] - type: combined, name: "Combined", mode: DENOISED, is_written: True
I1011 14:58:44.476094    80 film.cpp:586] - type: combined, name: "Noisy Image", mode: NOISY, is_written: True
I1011 14:58:44.476100    80 film.cpp:586] - type: adaptive_aux_buffer, name: "", mode: NOISY, is_written: True
I1011 14:58:44.476109    80 film.cpp:586] - type: denoising_normal, name: "Denoising Normal", mode: NOISY, is_written: True
I1011 14:58:44.476114    80 film.cpp:586] - type: denoising_albedo, name: "Denoising Albedo", mode: NOISY, is_written: True
I1011 14:58:44.476120    80 film.cpp:586] - type: depth, name: "Depth", mode: NOISY, is_written: True
I1011 14:58:44.476127    80 film.cpp:586] - type: sample_count, name: "", mode: NOISY, is_written: True
I1011 14:58:44.476135    80 film.cpp:586] - type: denoising_depth, name: "Denoising Depth", mode: NOISY, is_written: True
I1011 14:58:44.476217    80 device.cpp:39] OptiX initialization failed with error code 7804
I1011 14:58:44.476245    80 scene.cpp:591] Requested features:
I1011 14:58:44.476253    80 scene.cpp:592] Use BSDF True
I1011 14:58:44.476259    80 scene.cpp:593] Use Principled BSDF True
I1011 14:58:44.476266    80 scene.cpp:595] Use Emission True
I1011 14:58:44.476274    80 scene.cpp:597] Use Volume False
I1011 14:58:44.476279    80 scene.cpp:598] Use Bump True
I1011 14:58:44.476286    80 scene.cpp:599] Use Voronoi False
I1011 14:58:44.476294    80 scene.cpp:601] Use Shader Raytrace False
I1011 14:58:44.476300    80 scene.cpp:603] Use MNEEFalse
I1011 14:58:44.476306    80 scene.cpp:604] Use Transparent False
I1011 14:58:44.476313    80 scene.cpp:606] Use Denoising True
I1011 14:58:44.476320    80 scene.cpp:607] Use Path Tracing True
I1011 14:58:44.476326    80 scene.cpp:609] Use Hair False
I1011 14:58:44.476333    80 scene.cpp:610] Use Pointclouds False
I1011 14:58:44.476339    80 scene.cpp:612] Use Object Motion False
I1011 14:58:44.476346    80 scene.cpp:614] Use Camera Motion False
I1011 14:58:44.476353    80 scene.cpp:616] Use Baking False
I1011 14:58:44.476359    80 scene.cpp:617] Use Subsurface False
I1011 14:58:44.476366    80 scene.cpp:618] Use Volume False
I1011 14:58:44.476373    80 scene.cpp:619] Use Patch Evaluation False
I1011 14:58:44.476380    80 scene.cpp:621] Use Shadow Catcher False
I1011 14:58:44.476394    80 device_impl.cpp:249] Testing for pre-compiled kernel /usr/local/blender/3.3/scripts/addons/cycles/lib/kernel_sm_86.cubin.
I1011 14:58:44.476410    80 device_impl.cpp:251] Using precompiled kernel.
I1011 14:58:44.512446    80 device_impl.cpp:487] Local memory reserved 708,837,376 bytes. (676.00M)
I1011 14:58:44.512593    80 device_impl.cpp:249] Testing for pre-compiled kernel /usr/local/blender/3.3/scripts/addons/cycles/lib/kernel_sm_86.cubin.
I1011 14:58:44.512616    80 device_impl.cpp:251] Using precompiled kernel.
I1011 14:58:44.546254    80 device_impl.cpp:487] Local memory reserved 708,837,376 bytes. (676.00M)
I1011 14:58:44.546386    80 device_impl.cpp:249] Testing for pre-compiled kernel /usr/local/blender/3.3/scripts/addons/cycles/lib/kernel_sm_86.cubin.
I1011 14:58:44.546406    80 device_impl.cpp:251] Using precompiled kernel.
I1011 14:58:44.577816    80 device_impl.cpp:487] Local memory reserved 708,837,376 bytes. (676.00M)
I1011 14:58:44.577971    80 device_impl.cpp:249] Testing for pre-compiled kernel /usr/local/blender/3.3/scripts/addons/cycles/lib/kernel_sm_86.cubin.
I1011 14:58:44.577993    80 device_impl.cpp:251] Using precompiled kernel.
I1011 14:58:44.608696    80 device_impl.cpp:487] Local memory reserved 708,837,376 bytes. (676.00M)
I1011 14:58:44.608881    80 svm.cpp:73] Total 12 shaders.
I1011 14:58:44.643329    80 svm.cpp:149] Shader manager updated 12 shaders in 0.034452 seconds.
I1011 14:58:44.643402    80 object.cpp:691] Total 1 objects.
I1011 14:58:44.643559    80 particles.cpp:108] Total 0 particle systems.
I1011 14:58:44.643643    80 geometry.cpp:1805] Total 1 meshes.
I1011 14:58:44.645526    80 geometry.cpp:1294] Using BVH2 layout.
I1011 14:58:44.653132    80 tables.cpp:37] Total 1 lookup tables.
I1011 14:58:44.653322    80 light.cpp:982] Total 6 lights.
I1011 14:58:44.653332    80 light.cpp:219] Background MIS has been disabled.
I1011 14:58:44.653342    80 light.cpp:961] Number of lights sent to the device: 5
I1011 14:58:44.653349    80 light.cpp:963] Number of lights without contribution: 1
I1011 14:58:44.653429    80 light.cpp:314] Total 5 of light distribution primitives.
I1011 14:58:44.653873    80 tables.cpp:37] Total 2 lookup tables.
I1011 14:58:44.654079    80 scene.cpp:378] System memory statistics after full device sync:
  Usage: 346,245,288 (330.21M)
  Peak: 360,568,064 (343.86M)
I1011 14:58:44.654101    80 device_impl.cpp:58] Using AVX2 CPU kernels.
I1011 14:58:44.678746    80 path_trace.cpp:387] Rendered 1 samples in 0.002285 seconds (0.002285 seconds per sample), occupancy: 0.00560104
I1011 14:58:44.678809    70 path_trace.cpp:387] Rendered 1 samples in 0.00233698 seconds (0.00233698 seconds per sample), occupancy: 0.00563153
I1011 14:58:44.680454    69 path_trace.cpp:387] Rendered 1 samples in 0.00399113 seconds (0.00399113 seconds per sample), occupancy: 0.0138888
I1011 14:58:44.680626    71 path_trace.cpp:387] Rendered 1 samples in 0.00416303 seconds (0.00416303 seconds per sample), occupancy: 0.0134319
I1011 14:58:44.680694    80 device_impl.cpp:58] Using AVX2 CPU kernels.
I1011 14:58:44.719704    80 path_trace.cpp:387] Rendered 31 samples in 0.015553 seconds (0.00050171 seconds per sample), occupancy: 0.137327
I1011 14:58:44.720160    71 path_trace.cpp:387] Rendered 31 samples in 0.0160031 seconds (0.00051623 seconds per sample), occupancy: 0.124251
I1011 14:58:44.758633    69 path_trace.cpp:387] Rendered 31 samples in 0.0544789 seconds (0.00175738 seconds per sample), occupancy: 0.191306
I1011 14:58:44.758858    70 path_trace.cpp:387] Rendered 31 samples in 0.054704 seconds (0.00176464 seconds per sample), occupancy: 0.200582
I1011 14:58:44.760077    80 device_impl.cpp:58] Using AVX2 CPU kernels.
I1011 14:58:44.787374    70 path_trace.cpp:387] Rendered 16 samples in 0.00730705 seconds (0.000456691 seconds per sample), occupancy: 0.0249943
I1011 14:58:44.787712    80 path_trace.cpp:387] Rendered 16 samples in 0.00778008 seconds (0.000486255 seconds per sample), occupancy: 0.0234999
I1011 14:58:44.807122    69 path_trace.cpp:387] Rendered 16 samples in 0.0271831 seconds (0.00169894 seconds per sample), occupancy: 0.0951967
I1011 14:58:44.807525    71 path_trace.cpp:387] Rendered 16 samples in 0.02759 seconds (0.00172438 seconds per sample), occupancy: 0.0943572
I1011 14:58:44.808609    80 device_impl.cpp:58] Using AVX2 CPU kernels.
I1011 14:58:44.838322    80 path_trace.cpp:387] Rendered 16 samples in 0.0100031 seconds (0.000625193 seconds per sample), occupancy: 0.0383611
I1011 14:58:44.839077    70 path_trace.cpp:387] Rendered 16 samples in 0.0107491 seconds (0.000671819 seconds per sample), occupancy: 0.0349436
I1011 14:58:44.852921    71 path_trace.cpp:387] Rendered 16 samples in 0.024596 seconds (0.00153725 seconds per sample), occupancy: 0.0874385
I1011 14:58:44.853015    69 path_trace.cpp:387] Rendered 16 samples in 0.0246031 seconds (0.0015377 seconds per sample), occupancy: 0.102138
I1011 14:58:44.854089    80 device_impl.cpp:58] Using AVX2 CPU kernels.
I1011 14:58:44.887321    71 path_trace.cpp:387] Rendered 16 samples in 0.013072 seconds (0.000817001 seconds per sample), occupancy: 0.0375985
I1011 14:58:44.888067    80 path_trace.cpp:387] Rendered 16 samples in 0.0140178 seconds (0.000876114 seconds per sample), occupancy: 0.0532356
I1011 14:58:44.895463    70 path_trace.cpp:387] Rendered 16 samples in 0.0213101 seconds (0.00133188 seconds per sample), occupancy: 0.0898178
I1011 14:58:44.896032    69 path_trace.cpp:387] Rendered 16 samples in 0.021879 seconds (0.00136743 seconds per sample), occupancy: 0.0855809
I1011 14:58:44.897125    80 device_impl.cpp:58] Using AVX2 CPU kernels.
I1011 14:58:44.932638    80 path_trace.cpp:387] Rendered 16 samples in 0.0149961 seconds (0.000937253 seconds per sample), occupancy: 0.0635098
I1011 14:58:44.933192    69 path_trace.cpp:387] Rendered 16 samples in 0.0153632 seconds (0.000960201 seconds per sample), occupancy: 0.0456572
I1011 14:58:44.937207    70 path_trace.cpp:387] Rendered 16 samples in 0.0194621 seconds (0.00121638 seconds per sample), occupancy: 0.0760837
I1011 14:58:44.937533    71 path_trace.cpp:387] Rendered 16 samples in 0.0196879 seconds (0.00123049 seconds per sample), occupancy: 0.0796039
I1011 14:58:44.938577    80 device_impl.cpp:58] Using AVX2 CPU kernels.
I1011 14:58:44.976842    80 path_trace.cpp:387] Rendered 16 samples in 0.0161262 seconds (0.00100788 seconds per sample), occupancy: 0.0684921
I1011 14:58:44.977064    71 path_trace.cpp:387] Rendered 16 samples in 0.0162189 seconds (0.00101368 seconds per sample), occupancy: 0.0509055
I1011 14:58:44.979477    70 path_trace.cpp:387] Rendered 16 samples in 0.0186379 seconds (0.00116487 seconds per sample), occupancy: 0.0722774
I1011 14:58:44.979732    69 path_trace.cpp:387] Rendered 16 samples in 0.0188699 seconds (0.00117937 seconds per sample), occupancy: 0.0747307
I1011 14:58:44.980801    80 device_impl.cpp:58] Using AVX2 CPU kernels.
I1011 14:58:45.019239    80 path_trace.cpp:387] Rendered 16 samples in 0.0169952 seconds (0.0010622 seconds per sample), occupancy: 0.052993
I1011 14:58:45.019341    69 path_trace.cpp:387] Rendered 16 samples in 0.0169919 seconds (0.00106199 seconds per sample), occupancy: 0.0536725
I1011 14:58:45.020012    71 path_trace.cpp:387] Rendered 16 samples in 0.01776 seconds (0.00111 seconds per sample), occupancy: 0.0696154
I1011 14:58:45.020270    70 path_trace.cpp:387] Rendered 16 samples in 0.0180159 seconds (0.00112599 seconds per sample), occupancy: 0.0719018
I1011 14:58:46.091145    80 session.cpp:152] Rendering in main loop is done in 1.41452 seconds.
I1011 14:58:46.091173    80 session.cpp:153] 
Full path tracing report

Path tracing on: NVIDIA A10 (CUDA) [CUDA_NVIDIA A10_0000:ca:00]
                 NVIDIA A10 (CUDA) [CUDA_NVIDIA A10_0000:b1:00]
                 NVIDIA A10 (CUDA) [CUDA_NVIDIA A10_0000:31:00]
                 NVIDIA A10 (CUDA) [CUDA_NVIDIA A10_0000:17:00]

Render Scheduler Summary

Mode: Headless
Resolution: 1080x720

Adaptive sampling:
  Use: True
  Step: 16
  Min Samples: 15
  Threshold: 0.050000

Denoiser:
  Use: True
  Type: OptiX
  Start Sample: 0
  Passes: Color, Albedo, Normal

Rebalancer:
  Number of requested rebalances: 7
  Number of performed rebalances: 7

Time (in seconds):
                                       Wall              Average
          Path Tracing             0.190430             0.001488
       Adaptive Filter             0.007104             0.000056
              Denoiser             1.031665             1.031665
        Display Update             0.000002             0.000002
             Rebalance             0.144004             0.020572

  Total: 1.229201

Rendered 128 samples in 1.415434 seconds
I1011 14:58:46.091351    62 session.cpp:461] Total render time: 2.3604
I1011 14:58:46.091375    62 session.cpp:462] Render time (without synchronization): 1.41473
pure virtual method called
terminate called without an active exception
Aborted (core dumped)
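When comparing single- and multi-GPU runs, it can help to pull the device lists out of the --debug-cycles output instead of reading them by eye. A minimal sketch, assuming the log has been captured to a file; the regular expressions only target the two line formats visible above (the "Added device" lines and the "Path tracing on:" report) and may need adjusting for other Blender versions.

```python
import re

# Matches lines like:
#   ... device.cpp:185] Added device "NVIDIA A10" with id "CUDA_NVIDIA A10_0000:17:00".
ADDED_DEVICE = re.compile(r'Added device "(?P<name>[^"]+)" with id "(?P<id>[^"]+)"')

# Matches entries in the "Path tracing on:" report, e.g.
#   NVIDIA A10 (CUDA) [CUDA_NVIDIA A10_0000:ca:00]
PATH_TRACING_DEVICE = re.compile(r'\((?P<backend>[A-Z]+)\) \[(?P<id>[^\]]+)\]')


def added_devices(log_text):
    """Return (name, id) pairs for every device Cycles registered, in log order."""
    return [(m["name"], m["id"]) for m in ADDED_DEVICE.finditer(log_text)]


def path_tracing_devices(log_text):
    """Return the device ids listed in the path tracing report, in log order."""
    return [m["id"] for m in PATH_TRACING_DEVICE.finditer(log_text)]
```

Running both on the log above should list the same four A10s, which confirms Cycles found and used every GPU before the abort.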

This looks like a problem with Blender reconstructing the tiles rather than with the rendering itself, but I’m not sure.

Any suggestions on how to fix this, or ideas on where to look further, would be helpful.

Please file a bug on developer.blender.org