Hello,
I am running thousands of renders for synthetic data generation on 4 GPUs. Each image takes about 1.2 s to render if I use all 4 GPUs, and only about 1 s if I use a single GPU (I guess because of read/write overhead across the 4 GPUs). To speed things up, my idea was to open 4 Blender processes, assign each one its own GPU, and split the workload into 4 parts so that each GPU gets a unique list of images to render.
To do this I spawn 4 Blender processes with subprocess.Popen, then call communicate() on each Popen object to wait for it to finish rendering. Each Blender process gets a separate GPU to work with: I set the use flag of every device to 0, and then set it to 1 for the one GPU I want that process to use.
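A minimal sketch of the launcher side of this setup, for reference. It is not the actual script: the file names, the `--gpu` and `--images` arguments, and the function names are all placeholders, and it assumes a worker script that parses everything after `--` and enables only the given GPU.

```python
import subprocess


def split_workload(items, n_workers):
    """Deal items round-robin into n_workers disjoint lists."""
    return [items[i::n_workers] for i in range(n_workers)]


def build_command(blend_file, worker_script, gpu_index, images, blender="blender"):
    # -b runs Blender without a UI and -P runs a Python script.
    # Blender ignores everything after "--", so the (hypothetical)
    # worker script can parse --gpu and --images itself.
    return [
        blender, "-b", blend_file, "-P", worker_script, "--",
        "--gpu", str(gpu_index), "--images", *images,
    ]


def render_in_parallel(blend_file, worker_script, images, n_gpus=4):
    chunks = split_workload(images, n_gpus)
    # Start every process before waiting on any of them, otherwise
    # the renders would run one after another instead of in parallel.
    procs = [
        subprocess.Popen(build_command(blend_file, worker_script, gpu, chunk))
        for gpu, chunk in enumerate(chunks)
        if chunk  # skip GPUs that received no images
    ]
    for proc in procs:
        proc.communicate()  # blocks until this process exits
    return [proc.returncode for proc in procs]
```

Calling `render_in_parallel("scene.blend", "render_worker.py", image_names)` would start up to 4 processes and return their exit codes once all of them have finished.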
Theoretically this should give a 4x speedup, since each process runs on a separate GPU. In practice, though, I’m getting something like 2.7 s to render each image. Overall it’s still faster than before, but the speedup is only about 70% rather than 4x. Any ideas on how to get the full 4x speedup?
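To make the arithmetic explicit, here is a small sketch. It assumes the 2.7 s figure is the time each of the 4 processes takes per image while all of them run at once; if it was measured differently, the numbers below do not apply. The function names are just for illustration.

```python
def aggregate_seconds_per_image(per_process_seconds, n_processes):
    """Effective time per image when n_processes render in parallel."""
    return per_process_seconds / n_processes


def speedup(baseline_seconds, per_process_seconds, n_processes):
    """How many times faster the parallel setup is than the baseline."""
    return baseline_seconds / aggregate_seconds_per_image(
        per_process_seconds, n_processes
    )


# Ideal case: each process keeps the single-GPU time of 1.0 s per image.
print(speedup(1.0, 1.0, 4))            # 4.0

# Observed case: each process slows down to about 2.7 s per image.
print(round(speedup(1.0, 2.7, 4), 2))  # 1.48, versus one GPU at 1.0 s
print(round(speedup(1.2, 2.7, 4), 2))  # 1.78, versus 4 GPUs in one process at 1.2 s
```

Under that assumption the effective time is about 0.68 s per image, so the gap to 4x comes entirely from each process running about 2.7x slower than a single process on its own.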