2024-09-05 Flamenco Dependency Packing / BAT

Agenda

  • What exactly is the problem?
  • What is the preferable solution?
  • How long would this take?
  • Who can work on this?
  • When can this be prioritized?

Attending

  • Bastien Montagne
  • Sebastian Parborg
  • Sergey Sharybin
  • Simon Thommes
  • Sybren Stüvel

Problem

  • Flamenco uses BAT for packing all dependencies of a job, which analyzes the .blend file separately from Blender and does not properly take nested dependencies into account.
  • So far this has not been a problem, since BAT still caught all dependencies, because Blender referenced nested dependencies directly
  • With recent refactors this is becoming an issue: some dependencies end up missing from the pack
  • More problems will continue to surface as long as BAT analyzes dependencies without loading the .blend files

Proposed Solution Options

  • Use Blender internal functionality for packing in the Flamenco Addon instead of BAT ( → long term)
  • Make BAT use Blender’s internal dependency information for packing, rather than relying on independent analysis ( → short term)
  • Implement recursive dependency analysis in BAT
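To illustrate the third option, here is a minimal sketch of what recursive dependency collection could look like. This is not BAT code; `direct_deps` is a stand-in for BAT's per-file analysis, used here only to show why recursion is needed once nested libraries reference further files.

```python
def collect_dependencies(blendfile, direct_deps, seen=None):
    """Recursively gather all nested dependencies of a .blend file.

    `direct_deps` is a hypothetical map from a file to the dependencies
    found by analyzing that file alone (what BAT currently produces).
    """
    if seen is None:
        seen = set()
    for dep in direct_deps.get(blendfile, ()):
        if dep not in seen:
            seen.add(dep)
            # Linked libraries can reference further files themselves,
            # so each discovered dependency is analyzed in turn.
            collect_dependencies(dep, direct_deps, seen)
    return seen

# Usage: a toy dependency graph with one nested library.
deps = {
    "shot.blend": ["chars.blend", "tex/wall.png"],
    "chars.blend": ["tex/skin.png"],
}
print(sorted(collect_dependencies("shot.blend", deps)))
# → ['chars.blend', 'tex/skin.png', 'tex/wall.png']
```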

Meeting Outcome

The preferable long-term solution is to stop relying on BAT, since it continuously puts the burden of keeping up with Blender development on BAT in order to find all necessary dependencies.
Since implementing this solution properly would require a large core-development effort across multiple big topics, it is currently not a feasible target to prioritize.
There is a sufficient short-term solution: keep using BAT, but replace its dependency collection by exposing existing Blender functionality to the Python API and using the active Blender instance when submitting a job with Flamenco.

Short Term Solution
(within a few months)

  • We keep using BAT for remapping and editing the paths without loading the files
    • Bastien exposes the existing filepath iterator system to the Python API (a few days of work, either before the holidays or right after)
    • Sybren implements Flamenco (Python) functionality to pass information about dependencies and filepath remapping from Blender’s already-loaded instance to BAT when submitting a job
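The handoff described above could look roughly like the following sketch. `iter_file_paths` is a placeholder for the filepath iterator Bastien would expose; the real API name and signature are not decided here. The point is the division of labor: Blender (already having the file loaded) supplies the paths, and BAT only remaps and packs.

```python
# Hypothetical sketch of the short-term handoff; none of these names
# are real Blender or BAT API.

def iter_file_paths(loaded_blend):
    # Placeholder: in Blender this would walk all ID datablocks of the
    # already-loaded file and yield their external file paths.
    yield from loaded_blend["paths"]

def build_bat_pack_info(loaded_blend, pack_root):
    """Collect dependencies and a remap table to hand to BAT on submit."""
    remap = {}
    for path in iter_file_paths(loaded_blend):
        # Flatten everything into the job's pack directory.
        remap[path] = f"{pack_root}/{path.rsplit('/', 1)[-1]}"
    return {"dependencies": sorted(remap), "remap": remap}

info = build_bat_pack_info({"paths": ["//tex/wall.png", "//chars.blend"]},
                           "//pack")
print(info["remap"]["//tex/wall.png"])
# → //pack/wall.png
```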

Long Term Solution
(big breaking changes for upcoming series or beyond)

  • Avoid the necessity of using BAT as a separate system that needs to be kept up-to-date with any Blender changes
  • Add required functionality to Blender itself
  • Avoid building viewport depsgraph on file-load with command line execution (for performance when recursively going through libraries)
  • Consistent pointers for hashable blend files (for shaman)

What’s meant by this? I think I get the big picture but not how we’d actually be able to avoid reallocating data on file read for example.

One thing I liked about Clarisse was that you could ask it to list all the dependencies of a given project.
Something like blender --list-dependencies myproject.blend; of course this could also be a Python script.

  • No need to load the full .blend file; it can reuse logic Blender already contains.
  • The output could also mention the ID blocks it uses from each dependency.
  • BAT could be adapted to use this information to be more aware.

I’m not super firm on the technical details here, but from what I understood this is about moving away from a full memory dump as the .blend file format towards a serialized format.
This is obviously a major compatibility break and probably not to be done any time soon. But it would be part of the ‘full and proper’ solution to this issue.


This is a bit more nuanced I think.

Ultimately, we likely want to move away from the full memory dump that we are currently doing. But this is also tied to topics like the future of DNA & RNA, and is not likely to be addressed any time soon (as in, most likely not in the upcoming year at the very least). This requires a lot of design work as it will affect a huge part of our code-base, including several very low-level ones.

The more immediate improvement that might be more doable short(-ish) term is replacing our current usage of raw memory addresses as ‘uid’ for blocks of data in blendfiles with more stable generated numbers. E.g. simply by incrementing a counter for each ID block of data, and another counter for each sub-ID block of data within each ID, the ‘binary diffing noise’ between consecutive blendfile saves when only minor (or even no) changes are done should be drastically reduced.
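A toy illustration of that idea: writing an incrementing counter per block instead of the raw memory address makes two saves of the same data serialize identically, even though the objects live at different addresses each time. The names here are illustrative, not Blender code.

```python
def stable_uids(blocks):
    """Assign an incrementing counter to each block, replacing id()/pointers."""
    uids = {}
    for i, block in enumerate(blocks, start=1):
        uids[id(block)] = i  # counter stands in for the raw pointer value
    return lambda block: uids[id(block)]

def write_blocks(blocks, uid_of_block):
    """Serialize blocks as (uid, name) pairs using the stable uids."""
    return [(uid_of_block(b), b["name"]) for b in blocks]

data = [{"name": "Cube"}, {"name": "Material"}]
first = write_blocks(data, stable_uids(data))

# Re-create the same content at different memory addresses:
data2 = [{"name": "Cube"}, {"name": "Material"}]
second = write_blocks(data2, stable_uids(data2))

print(first == second)
# → True  (output based on id() would differ between the two "saves")
```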

So one should then be able to edit a property of an ID (like a filepath) while keeping the remaining parts of the blendfile actually unchanged. And if the exact same edit is done again, one should be able to get the exact same blendfile, binary-wise. This is what matters for hashes to remain unchanged, and Shaman file management system to be able to do its work accurately, without having to rely on manual editing of the blendfile itself like what BAT does currently.

With that idea, for Blender blendfile I/O, reading would remain the same, but writing would require mappings between actual pointer value and a ‘stable’ index, and some code to ensure that all written pointers are remapped through these mappings (with added benefit that unwritten data would be ‘automatically’ remapped to nullptr, instead of keeping dangling invalid pointer values).
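The write-side mapping described above can be sketched in a few lines: a first pass assigns a stable index to each block that will actually be written, and a second pass remaps every stored pointer through that table, with references to unwritten blocks falling back to 0 (nullptr). This is a simplified model under the assumptions stated in the paragraph, not actual blendfile I/O code.

```python
def write_file(written_blocks, pointers):
    """Remap raw pointer values to stable indices at write time.

    `written_blocks` are the addresses of blocks being written;
    `pointers` are raw pointer values stored inside the data.
    """
    # Pass 1: build the pointer -> stable-index mapping.
    mapping = {addr: i for i, addr in enumerate(written_blocks, start=1)}
    # Pass 2: write all pointers through the mapping; pointers to
    # unwritten data are 'automatically' remapped to 0 (nullptr)
    # instead of keeping a dangling invalid value.
    return [mapping.get(p, 0) for p in pointers]

# Block at address 0x30 is not written, so its pointer becomes nullptr.
out = write_file([0x10, 0x20], [0x20, 0x30, 0x10])
print(out)
# → [2, 0, 1]
```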

This will likely require a double-pass write process, and some further code path splitting between regular file write and undo steps write, among other things to think about!

Created a design task for this BTW: #127706 - Blendfile: Improve Stability of Pointer Values - blender - Blender Projects
