GSoC 2024: Improve Distributed Rendering & Task Execution

Hi everyone!

My name is David Zhang, and I’ll be contributing to Flamenco over the summer. My planned improvements include allowing jobs to be paused and introducing sample-based distributed rendering of single images, which should bring better resource management and more flexible job scheduling.

For more implementation details, see my original proposal.

Synopsis

The objective of this project is to enhance the distributed rendering and task execution capabilities within Blender through several key improvements.

Firstly, we introduce the ability to pause jobs and submit them in a paused state, providing users with increased control over their rendering workflow and resource allocation. This feature will be particularly advantageous during peak usage periods or when prioritizing specific tasks.

Furthermore, we address the challenge of distributed rendering of single images by adopting a sample-based rendering approach. This method ensures more efficient utilization of computational resources across nodes, minimizing memory usage and avoiding artifacts caused by boundary dependencies.
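To make the sample-based idea more concrete, here is a minimal Python sketch of how a total sample budget might be divided among tasks. The helper name `split_samples`, the dictionary keys, and the seed scheme are assumptions made for illustration and are not taken from Flamenco. The only point is that each task renders the whole frame with a share of the samples and its own seed, instead of rendering one tile of the image.

```python
def split_samples(total_samples, num_tasks, base_seed=0):
    """Divide a sample budget into per-task chunks (hypothetical helper).

    Every chunk covers the full image; only the sample count and the
    random seed differ, so there are no tile boundaries to stitch.
    """
    if total_samples < 1 or num_tasks < 1:
        raise ValueError("total_samples and num_tasks must be positive")

    # Never create more tasks than there are samples to hand out.
    num_tasks = min(num_tasks, total_samples)
    base, extra = divmod(total_samples, num_tasks)

    chunks = []
    for index in range(num_tasks):
        # The first `extra` tasks take one additional sample each.
        samples = base + (1 if index < extra else 0)
        chunks.append({"seed": base_seed + index, "samples": samples})
    return chunks
```

Giving every chunk a distinct seed matters here: two chunks rendered with the same seed would produce identical noise and add no information when combined.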

Benefits

The benefits of these improvements to Blender and its community of artists are manifold. Artists will experience enhanced rendering efficiency and flexibility, enabling them to focus more on creativity and less on managing technical constraints. The introduction of job pausing and the ability to submit jobs in a paused state will allow for better resource management, reducing wait times and optimizing the use of available computational resources. The distributed rendering improvements will directly benefit artists working on complex scenes by reducing rendering times and improving image quality, without the need for extensive technical adjustments. These developments will also support future Blender enhancements by providing a more robust and flexible framework for distributed task execution and rendering.

Deliverables

The final deliverables of this project include:

  1. New buttons and options for pausing jobs and submitting them in a paused state, in both the Manager web interface and the Flamenco Blender add-on. Pausing tasks is supported at the same time.

  2. Improvements that allow distributed rendering of single images with minimized memory usage, encapsulated in custom JOB_TYPE definitions and a Python merge script for efficient image processing.
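As a rough illustration of what a merge step for sample-based renders could look like, the sketch below combines several full-frame renders with a sample-weighted average. This is an assumption about the approach, not the actual merge script: the function name `merge_renders` and the flat-list pixel representation are hypothetical, and a real script would read and write image files rather than Python lists.

```python
def merge_renders(renders):
    """Combine full-frame renders into one image (hypothetical helper).

    `renders` is a list of (pixels, samples) pairs, where `pixels` is a
    flat list of float channel values and `samples` is the number of
    samples that render used. Each render is weighted by its share of
    the total sample count.
    """
    if not renders:
        raise ValueError("at least one render is required")

    total_samples = sum(samples for _, samples in renders)
    if total_samples <= 0:
        raise ValueError("total sample count must be positive")

    size = len(renders[0][0])
    merged = [0.0] * size
    for pixels, samples in renders:
        if len(pixels) != size:
            raise ValueError("all renders must have the same dimensions")
        weight = samples / total_samples
        for i, value in enumerate(pixels):
            merged[i] += value * weight
    return merged
```

Weighting by sample count means a render with three times as many samples contributes three times as much to the result, which matches what a single render with the combined sample count would estimate.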


Week 1
May 27 - May 31

During the first week, I

  • Had a weekly meeting with my mentor, agreed on details (such as how the job transition logic should be modified after the introduction of a new state, and why an intermediate state would be important), and learned about code contribution etiquette (the idea behind making smaller commits and how commits involving OpenAPI should be structured)
  • Created a pull request for my first deliverable, which is support for pausing jobs
  • Introduced paused state and implemented relevant status transition logic
  • Added basic test cases for unit testing

I had dental surgery during the week, so I wasn’t in the best shape for working.

In the following week, I will:

  • Collect more feedback from the community and polish the status transition logic implementation
  • Complete the frontend part to allow users to actually pause a job from the interface
  • Add more test cases and rigorously test everything implemented so far

Week 2
June 3 - June 7

During the second week, I made some significant progress on the project, including:

  • A working Deliverable #1. When the user clicks on the Pause Job button, Flamenco sends the job to pause-requested status and, depending on the specific situation, either waits for active jobs to complete, fails the entire job, or sets the job status to paused. See the following demo:


  • Had a weekly meeting and specifically talked about interactive rebase, which I can use to keep building on top of the existing codebase while that codebase continues to be updated

  • Added test cases and made sure the original test cases still work after the introduction of a new job status

  • Considered more minor edge cases and created a design issue
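The pause behaviour described in the first bullet above can be sketched as a small decision function. This is only an illustration in Python: the status names `pause-requested` and `paused` come from the report, while the task status names `active` and `failed`, the function name, and the exact condition for each branch are assumptions rather than Flamenco's actual logic.

```python
def resolve_pause_request(task_statuses):
    """Decide the job status after a pause was requested (hypothetical).

    `task_statuses` is an iterable with the status of each task in the job.
    """
    statuses = set(task_statuses)

    # Assumption: work that is still running is allowed to finish first,
    # so the job stays in the intermediate state.
    if "active" in statuses:
        return "pause-requested"

    # Assumption: a failed task fails the job as a whole.
    if "failed" in statuses:
        return "failed"

    # Nothing is running and nothing failed: the job can be paused.
    return "paused"
```

The intermediate `pause-requested` state is what lets the job wait for running work instead of interrupting it; the function would be re-evaluated each time a task changes status.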

In the following week, I have some personal stuff planned, but I will:

  • Build on top of what I’ve accomplished and brainstorm ways to implement submitting jobs with paused status (maybe this involves upgrading the job compiler? The add-on? A custom job type? API changes?)
  • Review the first PR and write more test cases

Week 3
June 10 - June 14

I spent a great deal of time traveling and moving to another city this week, so I wasn’t as productive as in the previous two weeks. I mainly:

  • Added a few more minor fixes to my Deliverable #1, and fixed a few edge cases
  • Played around with the job_compiler and the addon component of Flamenco. They will be useful for implementing my Deliverable #2
  • Submitted my first PR to my mentor for review

I was on a plane during our regular weekly meeting time with my mentor, so we didn’t meet this week.

For the next week, I plan to spend a lot of time on the project, and here are some of my (ambitious) goals:

  • Address any feedback from the PR review
  • Add a few more test cases as I play around and as they come to my mind
  • Have a minimally usable product that is able to submit paused jobs

Week 4
June 17 - June 21

This was quite a productive week - I completed most of the tasks planned for this week, namely:

  • Fixed a few more minor issues and code quality problems in Deliverable #1 and resubmitted it for review.
  • Made original test cases work with the code changes and added a few more.
  • Implemented the backend for submitting jobs with an initial status, which involves OpenAPI and job_compiler changes. Previously, a submitted job could not be assigned an initial status; Flamenco would put everything into a boring queued status.
  • Added a little checkbox in the add-on frontend that says “Submit As Paused”. Sadly it’s not quite functioning for now, but it soon will be.

  • Had a weekly meeting that went over implementation details of my second deliverable
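The "initial status" idea from the list above can be illustrated with a short Python sketch. The field name `initial_status`, the function name, and the set of allowed values are all hypothetical; they only show the general shape of defaulting to queued while optionally accepting paused, not the actual OpenAPI schema or job_compiler code.

```python
# Assumption: only these two statuses make sense for a freshly submitted job.
ALLOWED_INITIAL_STATUSES = {"queued", "paused"}


def initial_job_status(submitted_job):
    """Pick the status a newly submitted job starts in (hypothetical).

    `submitted_job` is a dict representing the submission payload.
    """
    requested = submitted_job.get("initial_status")

    # Keep the old behaviour when the client does not ask for anything.
    if requested is None:
        return "queued"

    if requested not in ALLOWED_INITIAL_STATUSES:
        raise ValueError(f"unsupported initial status: {requested!r}")
    return requested
```

With something like this, a “Submit As Paused” checkbox would only need to set the extra field in the payload, and clients that never send it would see no change in behaviour.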

Most of the work for the first two deliverables should be completed in just a few more days. I might be out of town next Thursday as it is a national holiday. Perhaps it’s time to look into my next deliverable, distributed rendering! (Hooray! That’s the fun part)
