Cycles X - resumable chunk rendering

Hi @brecht, I’ve been planning to experiment with resumable chunk rendering and to look at how our distributed/network rendering addon could use it. However, I’ve noticed that in both the cycles-x branch and master it looks like it’s been removed (referring to blender/intern/cycles/blender/blender_python.cpp)?

Are there plans to implement anything similar? I think sample splitting would be an excellent method for some use cases of distributed rendering, and I’d love to offer that ability to our users. But if it’s gone in cycles-x, I’m concerned that implementing it on top of resumable sample rendering would all be for nothing.

Thanks!

On the GPU, the fewer tiles the better, and it helps to turn off the CPU (when the CPU and GPU run at the same time they compete for threads); GPU rendering is fastest with a single tile at full resolution. On the CPU, more tiles used to be better, but Cycles X solves the CPU tiling problem (a whole frame and 32x32 tiles render at almost the same speed). So splitting the render into chunks is not very meaningful any more, and it makes rendering slower.

We plan to add back support for resumable rendering, but strictly as a way to stop and resume renders, not as a way to render different samples on different computers in parallel.

Thanks Brecht, good to know that. Curious to know, would such a scheme for distributing samples across multiple nodes be both possible, and if done in the right way, acceptable as a patch for master?

It’s not going to work well with adaptive sampling, but I can see some use cases. I wouldn’t mind a patch for a Cycles scene setting for sample offset.

Hi Brecht, cool :slight_smile: we’ve talked amongst us crowdrender devs, and we’re keen to make this happen. Just want to check our understanding at a high level though.

A patch to enable sample offsets would allow for giving separate machines a different starting sample to start rendering from. Each machine would render until it converged or hit its max sample limit for each pixel.
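As a rough illustration of the idea above, here is a minimal sketch of how a total sample budget could be split into one contiguous range per machine. The function name `split_samples` and the (offset, count) layout are hypothetical and only for illustration; they are not part of Cycles or of the patch.

```python
def split_samples(total_samples, num_workers):
    """Split a sample budget into contiguous (offset, count) ranges.

    Hypothetical helper: each worker would render `count` samples
    starting at sample number `offset`.
    """
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    base, extra = divmod(total_samples, num_workers)
    ranges = []
    offset = 0
    for worker in range(num_workers):
        # The first `extra` workers take one additional sample each,
        # so the counts always add up to total_samples.
        count = base + (1 if worker < extra else 0)
        ranges.append((offset, count))
        offset += count
    return ranges


for offset, count in split_samples(1000, 3):
    print(f"start at sample {offset}, render {count} samples")
```

Each pair would then map to the sample offset and the sample count configured on one render node.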

This would give us, I assume, an OpenEXR multilayer image on each machine that contains samples from different offsets.

What we’d then need to do is to combine them somehow. Cycles prior to cycles-x had an operator for this, but this seems to also have been removed. We’re happy to either build our own, or maybe better yet include an appropriate technique for this in the patch that would allow the pixel data from each machine to be combined.

Does the above sound roughly correct and acceptable? Am I missing or oversimplifying anything?

James

Yes, that would work perfectly: give each machine a starting sample offset. To combine the results you can just sum the images and divide by the number of workers. To work with adaptive sampling you will need to consider how the adaptive sampling selects the sample sets. In that case, giving each machine a set of contiguous sample blocks would work better, since it lets the adaptive sampling do its job. The block sizes would have to be pretty big (> 4096 samples per block), but you could add code to move to another block of samples if you run out. Both the PMJ and Sobol samplers have a huge (2^32) sample size, so this could work.
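The sum-and-divide merge described above can be sketched as follows. This is only an illustration under the assumption that every worker rendered the same number of samples per pixel; the function name `average_renders` and the flat list-of-RGB-tuples image layout are hypothetical, not how Cycles stores its buffers.

```python
def average_renders(renders):
    """Average several renders pixel by pixel.

    `renders` is a list of images, each a flat list of (r, g, b) tuples.
    Assumes all workers used the same sample count, so a plain mean
    is the correct weighting.
    """
    if not renders:
        raise ValueError("need at least one render")
    num_pixels = len(renders[0])
    if any(len(image) != num_pixels for image in renders):
        raise ValueError("all renders must have the same resolution")
    merged = []
    for pixel_values in zip(*renders):
        # pixel_values holds the same pixel from every worker;
        # zip(*pixel_values) regroups it per colour channel.
        merged.append(tuple(sum(channel) / len(renders)
                            for channel in zip(*pixel_values)))
    return merged


worker_a = [(0.25, 0.5, 0.75), (1.0, 1.0, 1.0)]
worker_b = [(0.75, 0.5, 0.25), (0.0, 0.0, 0.0)]
merged = average_renders([worker_a, worker_b])
```

If the workers render different sample counts, a plain mean is no longer correct and the pixels need to be weighted by their sample counts instead.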

Great, thanks for jumping in man! Ok, we’re going to prototype this and then submit a patch and I guess go through the process you guys have for vetting it :blush:

Just so you know, this will be the first time I’ve done this, so any advice is appreciated!

No problem, you can include me as one of the reviewers :slight_smile: and feel free to ask any questions you might have.

Awesome sauce! Looking forward to it, and yeah, I expect I’ll have questions a’plenty :smiley:

@leesonw and @brecht Ok, just an update here. The patch is in testing. It’s got the sample offset setting implemented and we’re testing it now to see how effective sample splitting can be in cycles-x.

Needless to say we’re pretty excited about this. @leesonw what exactly do I need to do next once we think we’re done testing and want to submit the patch? Is that just done via the submit link on developer.blender.org?

Yeah, just create the patch on developer.blender.org and add us as reviewers.

@leesonw Will do.

On another note, I’m currently testing merging the rendered results of two sample ranges. I used the new offset setting to render a contiguous sample sequence in two separate renders. The render results were saved as OpenEXR files, since I had configured Blender to do that via the render properties panel in Blender’s UI, as any user would.

However, the operator bpy.ops.cycles.merge_images no longer seems to work.

I got this error

>>> bpy.ops.cycles.merge_images(input_filepath1=fp1, input_filepath2=fp2, output_filepath=fpo)
Error: No sample number specified in the file for layer Composite or on the command line

Traceback (most recent call last):
  File "<blender_console>", line 1, in <module>
  File "C:\Users\CrowdRender\git\build_windows_x64_vc16_Release\bin\Release\3.0\scripts\modules\bpy\ops.py", line 132, in __call__
    ret = _op_call(self.idname_py(), None, kw)
RuntimeError: Error: No sample number specified in the file for layer Composite or on the command line

I traced the error back to here in the branch we’re working on (props to Pembem22 by the way, who is immensely helpful and has been doing the programming for the patch). From inspecting this line of code, I think that there is an OIIO (OpenImageIO) attribute not being set.

The attribute in question is queried in the above code like this:

in_spec.get_string_attribute("cycles." + name + ".samples", "");

So far, I’ve not been able to find a call anywhere to OIIO’s attribute method that sets this.

Might this have been removed? If so, we’ll put it back :slight_smile:

Maybe, or renamed. I am not too familiar with this part of the code, but I do remember something about changing the naming to use underscores or something like that. You could try replacing the dots with underscores to see if that works. However, if you find it does not exist then add it :slight_smile:

Thanks @leesonw :slight_smile:

Might be me, I was just chatting with the man doing the dev work. Seems that I might have got myself confused. Wouldn’t be the first time either.

So, the EXR file apparently has the right attribute in there; I just saw a dump of the render result’s metadata. But the issue is that there is an extra view layer in there, the one called “Composite”. This is the layer that triggered the error, because apparently it has no sample count given to it.

I think maybe this was my fault. I just did an F12 render with two different sample offsets, and then tried to use the resulting image files (I set Blender to output to Multilayer EXR). Perhaps that was not how Blender used to output its image files for resumable chunk rendering? If so, it would be awesome to know, as maybe we don’t have to put anything back. We could just make sure Blender puts the sample count into the “Composite” view layer?

This is trying to merge a render with a compositing layer, which can’t work. We should probably display a warning message and continue merging the other layers.

However for correct results, you need to merge the raw view layers, and then run compositing on the merged EXR.

Hi Brecht, thanks for your reply :slight_smile: ok, I’d be happy to include what you suggested about warning and continuing in our patch. Seems like a good solution.

Also had a question, when it comes to merging multiple results, the current operator only allows two input files, but the _cycles.merge function and C++ code suggests that this can actually handle a list of file paths. So would the merge function be able to take a list of more than two Image results and merge them all at once?

Question #2 - denoising - I am not sure how denoising is implemented internally, but watching it happen, it seems that both OptiX and NLM apply the denoising to the image as it is rendered. Does this mean it can/can’t be used in conjunction with sample offset/merging? Could it with a modification?

Question #3 - Designing the patch properly. I have found the code where the error is being thrown right now. It seems fairly easy to change it to not throw on encountering the “composite” layer. But this would rely on the name not changing to something else. Is this the best way to filter out layers not intended for merging? Just want to make sure we’re coding for the right criteria. Also an example of how to issue a warning would be good :slight_smile: the merge.cpp file doesn’t seem to do warnings, just errors, so curious to know how that is done.

Question #4 - merging images with a variable # of samples per pixel - I’ve come across the merge_pixels method in merge.cpp. Seems like this is the code that takes the data from the input image files and averages the pixel values. It seems to assume that each pixel has the same number of samples though. Would this need modifying to work properly with pixel data which has been rendered using adaptive sampling? Happy to look at doing that if so.

Thank you!

Hi @brecht Ok, follow up to Question #4 here.

It looks like when using the sample count debug pass, the necessary weights for each pixel are actually contained in this pass.

So it appears that I could multiply the sample count pass with the pixel colour data of the raw passes, and then sum up this result for each render result I have rendered using the sample offset setting.
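A minimal sketch of that weighting, assuming the sample count pass holds the raw per-pixel sample count for each render. Dividing by the summed sample counts at the end is the normalisation step this sketch assumes; the function name `merge_weighted` and the data layout are hypothetical and are not taken from merge.cpp.

```python
def merge_weighted(renders):
    """Merge renders using per-pixel sample counts as weights.

    `renders` is a list of (colors, sample_counts) pairs, where
    `colors` is a flat list of (r, g, b) tuples and `sample_counts`
    is a flat list with one count per pixel.
    """
    num_pixels = len(renders[0][0])
    merged = []
    for i in range(num_pixels):
        total_samples = sum(counts[i] for _, counts in renders)
        if total_samples == 0:
            # No worker sampled this pixel; leave it black.
            merged.append((0.0, 0.0, 0.0))
            continue
        weighted = [0.0, 0.0, 0.0]
        for colors, counts in renders:
            for c in range(3):
                # Multiply the colour by its sample count, then sum.
                weighted[c] += colors[i][c] * counts[i]
        merged.append(tuple(value / total_samples for value in weighted))
    return merged


render_a = ([(1.0, 0.0, 0.0)], [3])  # 3 samples of pure red
render_b = ([(0.0, 0.0, 1.0)], [1])  # 1 sample of pure blue
result = merge_weighted([render_a, render_b])
```

When every pixel has the same count in every render, this reduces to the plain average of the inputs.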

Am I barking mad suggesting this or is it viable?

If it is viable, would like to include it in our patch.

James

The _cycles module function can actually take multiple inputs; it’s only the operator that is limited to two inputs, due to limitations in the operator API.

NLM is gone. For OptiX there is D11442 (Cycles X: Add OptiX temporal denoising support), which is adding support for OptiX denoising after rendering.

Hardcoding the Composite name is not that great, but to me it seems reasonable in this case. I’d rather do that than skip arbitrary layers without samples info, since then it’s more likely there is actually something wrong.

To display a message you could change the merge function to return a tuple with both an error and info string. available_devices_func is an example of a function returning a tuple.
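To make the tuple idea concrete, here is a small Python sketch of the control flow only. The real merge code is C++ in merge.cpp; the function `merge_layers`, its dictionary input and the message wording are all hypothetical. It skips a sample-less “Composite” layer with an info message and still fails on any other layer that lacks a sample count.

```python
def merge_layers(layer_samples):
    """Return an (error, info) tuple for a set of layers.

    `layer_samples` maps a layer name to its sample count, or to None
    when the file carries no sample count for that layer.
    An empty error string means success.
    """
    info = []
    mergeable = []
    for name, samples in layer_samples.items():
        if samples is None:
            if name == "Composite":
                # Hardcoded name: composite output cannot be merged.
                info.append(f"Skipping layer {name}: no sample count")
                continue
            # Any other layer without samples is treated as a real error.
            return (f"No sample number specified in the file for layer {name}", "")
        mergeable.append(name)
    if not mergeable:
        return ("No layers with sample counts found to merge", "\n".join(info))
    return ("", "\n".join(info))


error, info = merge_layers({"ViewLayer": 128, "Composite": None})
if error:
    print("Error:", error)
elif info:
    print("Info:", info)
```

The caller can then report the error string as a failure and the info string as a warning, without having to parse a single combined message.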

Making it work with different per-pixel number of samples would be a good improvement.
