here in Cycles4D (using Cycles Standalone), I got a bottleneck when resetting the session.
the whole problem is: session->reset is quite slow (blocking) while it is just a call to resize/clear old memory buffers.
so I wanted to make this call in a thread till I modify scene data (direct modification for session->scene guarded by session->scene->mutex).
the problem:
I hit a deadlock!, I tried splitting the calls inside reset_cpu (just removing notify_all() and call it at the end of scene mutex unlock), but this didn’t work either.
any ideas how to do this? (reset the session buffers stuff and wait for a signal to resume rendering when it is safe?)
note: everything is for CPU (although I use GPU, but I force no OpenGL, and force CPU loop (run cpu, reset cpu, …)
The reason reset_cpu() blocks is because it calls device->task_cancel(), which waits for all rendering tasks to be cancelled. We can’t resize the buffers before that has happened, because the rendering threads are writing to them.
I think we can make this cancel non-blocking since the session thread will wait on rendering threads to be cancelled/finished before changing the buffer. This seems to work in Blender but I haven’t tested it fully yet: https://developer.blender.org/P631
thanks!
do I even need device->task_cancel() when running with GPU? as I’m forcing GPU to run as cpu (run_cpu, reset_cpu, no OpenGL).
I tried the same reset_cpu without the task_cancel and it works fine, but I’m not sure if it may crash or have any limitations, you may tell if you know. (though I don’t find any big performance difference).
also I got a question about delayed reset, why?
when I tried a direct reset_() call inside reset_cpu it gave me CUDA memory error assert, but with delayed reset it worked fine., no idea why.
If you’re progressively rendering one sample at a time you don’t strictly need device->task_cancel(), since you can wait until each sample is completed. But being able to cancel earlier can help keep things more responsive I think.
The delayed reset is needed so you don’t resize the buffer while the rendering threads are still working on it.
I posted the diff so you can test if it helps for you.
so at the end it will call task_cancel() internally using delayed reset? I’m not sure yet what does task_cancel() do, is it just early exit? or clears the device data too? etc…
I think it may be slower to call task_cancel()? 1 sample for GPU is the same as exiting early for 1 pixel I guess.
device->task_cancel() cancels any tasks that the device is doing, which in this case means it stops in the middle of rendering when not all pixels have completed.
GPU devices don’t always render all samples in one go, but it may often be the case in progressive viewport render. “Slower” is vague here, there’s a balance between faster response to user changes and faster to show a render at higher resolution or with more samples.
I see 2 things:
1- no deadlocks (to be honest I didn’t test this one yet), but it exits faster
2- it is more responsive that it slows down the software due to re-updating! (not exactly sure of what happens, but frame rate drops from 25 fps to 15 fps and a bit more laggy, but I see more updates in the render buffer).
which makes me think about how to make the viewport smoother by increasing the refresh interval from 0.1 to 0.2 second or something.
Ok. I don’t know what exactly you are doing in your code, and so really all I can do is guess what may be happening there. It’s not useful for me to spend more time on that, it’s better if you analyze what is happening in your code.
In Blender we’ve always done the reset right after the scene update, and this is still what I suggest to do. If there is a problem with that, it may be better to analyze why exactly there is a problem rather than doing something different to work around it.