Scheduling Blender renders across multiple GPUs when per-frame render times are low


I’m running thousands of renders for synthetic data generation across 4 GPUs (RTX A6000s). I spawn 4 Blender processes, each of which gets one fourth of the frames in a queue and uses only one of the 4 GPUs in the system. For each frame in its queue, a process uses the Blender API to set up the scene (models, materials, and so on), then renders and saves the image. After that, all the scene components are deleted and the next frame starts from a (presumably) empty scene. By “frame” I don’t mean a frame of an animation: each frame in the synthetic data is a random arrangement of a random number of 3D models, so the image varies a lot from frame to frame (models, materials, and scene parameters such as lights).
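To make the setup concrete, here is a minimal sketch of this kind of launcher, not my exact code. The script name `render_frames.py` and the helper names are hypothetical, and pinning a process to a GPU through the `CUDA_VISIBLE_DEVICES` environment variable is an assumption; the device could just as well be selected inside the Blender script.

```python
import os
import subprocess


def split_frames(frame_ids, num_workers):
    """Deal frame ids round-robin so each worker gets about 1/num_workers of them."""
    return [frame_ids[i::num_workers] for i in range(num_workers)]


def build_command(frame_ids, blender_bin="blender", script="render_frames.py"):
    """Build a headless Blender command line.

    Everything after "--" is ignored by Blender itself and left for the
    (hypothetical) render_frames.py script to parse.
    """
    return [blender_bin, "--background", "--python", script, "--",
            *map(str, frame_ids)]


def launch_workers(frame_ids, num_gpus=4):
    """Start one Blender process per GPU and wait for all of them to exit."""
    procs = []
    for gpu, chunk in enumerate(split_frames(frame_ids, num_gpus)):
        # Assumption: hiding the other GPUs via CUDA_VISIBLE_DEVICES is how
        # each process is restricted to a single device.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
        procs.append(subprocess.Popen(build_command(chunk), env=env))
    return [p.wait() for p in procs]
```

Calling `launch_workers(list(range(4000)))` would start 4 processes with 1000 frames each and return their exit codes once all of them have finished.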

The main bottleneck seems to be either CPU or I/O. With 4 GPUs you’d ideally expect a 4x speedup over a single GPU, but I’m getting closer to 1.5-2x. Task Manager shows all CPU cores running at high usage most of the time. I’m definitely not writing enough data to saturate the write speed of the SSD.

One thing I noticed: say I spawn 4 Blender processes across 4 GPUs, each with 1000 frames in its queue. Normally process 1 finishes first, process 2 finishes second, and so on, with a significant gap between each of these queues being completed. I’d expect every process to finish at about the same time: the frames are distributed uniformly at random and the scenes don’t vary that much, so each GPU should be doing about the same amount of rendering work.

One thing I’ve tried is to create a Blender process pool of sorts. My thinking was that when each process renders 1000 frames, maybe the processes go stale and lose priority in the OS. So instead I divide the data into chunks of, say, 100 frames, spawn processes that render those 100 frames, then kill the processes, and repeat until all the chunks are completed. This resulted in about a 9% improvement in rendering speed when rendering 2000 images; I haven’t yet figured out whether it scales beyond 2000 images.
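A rough sketch of that chunked approach, under some assumptions: each chunk goes to one short-lived Blender process, one process runs per GPU at a time, and the next round starts only once the current round has exited. The names `render_frames.py`, `render_chunk`, and `run_in_rounds` are hypothetical, and `CUDA_VISIBLE_DEVICES` is again an assumed way of pinning a process to a GPU.

```python
import os
import subprocess


def make_chunks(frame_ids, chunk_size=100):
    """Split the frame list into consecutive chunks of at most chunk_size frames."""
    return [frame_ids[i:i + chunk_size]
            for i in range(0, len(frame_ids), chunk_size)]


def render_chunk(gpu, chunk, blender_bin="blender", script="render_frames.py"):
    """Start one short-lived Blender process for a single chunk on one GPU."""
    # Assumption: the process is pinned to a GPU via CUDA_VISIBLE_DEVICES.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    cmd = [blender_bin, "--background", "--python", script, "--",
           *map(str, chunk)]
    return subprocess.Popen(cmd, env=env)


def run_in_rounds(frame_ids, num_gpus=4, chunk_size=100, start=render_chunk):
    """Render all chunks in rounds of up to num_gpus fresh processes each."""
    chunks = make_chunks(frame_ids, chunk_size)
    exit_codes = []
    for i in range(0, len(chunks), num_gpus):
        # One new process per GPU for this round.
        procs = [start(gpu, chunk)
                 for gpu, chunk in enumerate(chunks[i:i + num_gpus])]
        # Wait for the whole round to exit before starting the next one.
        exit_codes.extend(p.wait() for p in procs)
    return exit_codes
```

With 2000 frames, 4 GPUs, and chunks of 100, this runs 5 rounds of 4 processes. Note that in this sketch every round waits for its slowest process, so a GPU that finishes early sits idle until the round ends.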

Are there any other things I could try? I have another, similar thread where I got some solutions, but those solutions don’t seem like they would be effective for my use case.