Asset Embedding [Proposal]

The fundamental problem is that no existing import method works well for asset libraries. Each method might work for a few use-cases, but definitely not for all the kinds of asset libraries we care about. The most pressing case for us is the essentials asset library that we integrate into Blender.

Existing Import Methods

Linking generally works well for the use-cases that it has been designed for, but there are problems when using it with third-party asset libraries or assets shipped with Blender.

  • It makes .blend files dependent on the asset libraries being available and saved at specific file paths. This is fine for a bigger production where these things can be tightly controlled, but is bad for many users who want to create standalone files using assets. It shouldn’t be necessary for them to explicitly pack assets. They might not even be aware that they are using linked assets if those assets are deeply integrated with Blender or add-ons.
  • We don’t want people to link to assets that ship with Blender, because that makes it impossible for us to replace them in future versions without breaking people’s files.

Appending works well for adding a data-block that is supposed to be modified locally. For assets this is problematic though, because appending creates a copy of the data, and in many cases those copies are never modified locally. The issue with these copies becomes apparent when appending the same asset more than once, or when linking from different files which contain the same appended assets. In those cases we’ll end up with multiple identical data-blocks. This is annoying and confusing, because it’s expected that the asset data-block only exists once.

Appending with reusing data was added to improve the standard append mechanism a little bit for simple use-cases. It generally avoids appending an asset a second time if it has been appended before. This usually works, but there are still problems:

  • The previously mentioned case of duplicate data-blocks after linking from multiple files which have the same appended assets still exists.
  • There is no mechanism that checks if the appended asset has been modified locally. This makes it difficult to know if a newly imported asset will be the original asset or the locally modified one.

Previous Work

While it’s generally agreed upon that these methods don’t work well for assets in all the cases that are important to us, there is no consensus on the solution yet.

The result after the initial discussion can be found in this task.

A previous proposal was discussed, but the additional overhead required even for simple local assets was too much (among other issues).

Another discussed potential solution is to not change the existing functionality much, but to hide it more from users. For example, Blender could try to detect duplicates and simply not show them, even though they are still there. This might make things less annoying for users, but it also has downsides. Users become more detached from the data-blocks in their scenes, which can be a real surprise when they find out that there are actually many more data-blocks under the hood. Furthermore, there can be a lot of unnecessary overhead just from detecting and processing duplicate data. Hiding some data-blocks from the user can make sense in some places, but it is not a solution to the fundamental problem in my opinion.

Proposal

At a high level, the idea is to have an automatically generated hash for each data-block which changes whenever the data-block or any of its dependencies changes. When importing an asset, it is embedded together with its dependencies. Unlike with appending, the imported data-blocks remain in an uneditable state, like linked data. In fact, internally they are still linked, but via a new kind of virtual library that uses the data-block hash instead of a file path as its identifier.

The following sections explain the proposal in more detail.

Data-Block Hash

The main requirement for these hashes is that two data-blocks with the same hash can be used interchangeably by Blender. It’s not guaranteed that all data-blocks that could be used interchangeably will have the same hash though.

I’m using the term “interchangeable” instead of “identical” intentionally. Some data-blocks can be interchangeable even if they are not identical. For example, two node trees that differ only in the positions of their nodes are interchangeable but not identical.

One has to differentiate between two kinds of data-block hashes:

  • Shallow Hash: Hash of the data-block itself only, ignoring any other referenced data.
  • Deep Hash: Hash of the data-block itself and all other referenced data.

In the end, we’ll always need the deep hash to actually find interchangeable data. Unfortunately, it’s not generally possible to store the deep hash in the asset file, because the asset may link data-blocks from other files. Changing those linked files has to automatically change the deep hash of the dependent assets, otherwise the main requirement cannot be met reliably.

The shallow hash can be computed and stored in the asset files though. A good approach seems to be to always write the shallow hashes into the asset files, but to compute the deep hash only when importing an asset; it can be derived from the individual shallow hashes of the data-block and its dependencies.

While we could actually implement data-block hashing, the main requirement is also met if we just generate a random shallow hash whenever a data-block is changed. So I’d start with that for now.
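
As a minimal sketch of how the two hashes could relate (all names below are hypothetical and only illustrate the idea, not any existing or planned Blender API):

```python
import hashlib
import uuid

# Hypothetical illustration of the shallow/deep hash relationship described
# above. The shallow hash is simply regenerated whenever the data-block
# itself changes (the "random hash" starting point mentioned above).
class DataBlock:
    def __init__(self, name, dependencies=()):
        self.name = name
        self.dependencies = list(dependencies)
        self.shallow_hash = uuid.uuid4().hex  # stored in the asset file

    def mark_changed(self):
        # Any change to the data-block itself invalidates its shallow hash.
        self.shallow_hash = uuid.uuid4().hex

def deep_hash(block):
    """Combine the block's shallow hash with the deep hashes of all its
    dependencies (assuming an acyclic dependency graph for simplicity), so a
    change anywhere in the dependency tree changes the result. Computed on
    import rather than stored in the asset file."""
    h = hashlib.sha256(block.shallow_hash.encode())
    for dep in sorted(block.dependencies, key=lambda d: d.name):
        h.update(deep_hash(dep).encode())
    return h.hexdigest()
```

With this, editing a linked dependency automatically changes the deep hash of every asset that uses it, while the shallow hashes written into the asset files stay valid.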

Embedding and Virtual Library

Imported assets will be embedded while also having ID.lib set to a new kind of Library: a virtual library. It is virtual in the sense that it does not reference any file-path. Instead, its identifier is just the deep hash of the asset. Since the deep hash is different for an asset and its dependencies, a separate virtual library is created for each imported data-block.

Embedded data-blocks have features of appended and linked data-blocks:

  • They are stored within the .blend file that uses them, so the asset’s original .blend file can be removed without breaking files that use it.
  • They are still part of a library reference which has a few consequences:
    • They can’t be modified in a way that makes them non-interchangeable with the original data.
    • Their name can stay as is and does not have to be made unique across all data-blocks.

If the user wants to edit the asset locally, it has to be made local. By doing that, it’s not automatically deduplicated anymore. Additionally, we could also support creating local overrides for embedded assets.
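
To make the deduplication aspect concrete, here is a rough sketch (reusing the hypothetical structures from the hashing sketch above) of how an import could reuse already embedded data via its virtual library:

```python
# Hypothetical sketch only: a "virtual library" is identified by a deep hash
# rather than a file path, so interchangeable data-blocks collapse into one.
virtual_libraries = {}  # deep hash -> already embedded data-block

def embed_asset(block):
    # Embed dependencies first; each one gets its own virtual library.
    block.dependencies = [embed_asset(dep) for dep in block.dependencies]
    key = deep_hash(block)
    if key in virtual_libraries:
        # An interchangeable data-block is already embedded; reuse it
        # instead of creating a duplicate.
        return virtual_libraries[key]
    virtual_libraries[key] = block  # embed it, keeping it uneditable
    return block
```

Importing the same asset twice, or linking two files that embedded the same asset, would then resolve to the same virtual library entry instead of producing duplicates.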

Discussion

This section contains some extended information that may be important to consider in the context of this proposal.

Versions and Variations

The core data-block management code does not have to know about variations of a data-block with this proposal, because the core problem that leads to duplicate data-blocks is solved at a lower level. Nevertheless, I briefly want to touch on that topic because it’s quite related anyway.

Changing to a different variation (or version) of a data-block comes down to first finding the available variations and then using the generic replace data-block operator. The unsolved problem is how to find the set of available variations. Currently, I imagine that a combination of two approaches would solve the majority of use-cases:

  • A virtual library could still store the original file path and data-block name that it comes from. Blender can then detect whether there is a new data-block variation at that location that the user might want to update to. This works out of the box without any extra work by the user.
  • Asset authors can optionally assign globally unique data-block identifiers to their assets. Blender could then scan all available assets to find the ones with the same identifier, and the user can choose which one to change to (see the sketch below the list). The variations could have a label and a version number. Those don’t affect low-level data-block management code, but just help the user or higher level scripts decide which variation to switch to.
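
Purely as an illustration of the second approach (the identifier, label, and version fields and the library structure are all invented for this sketch):

```python
# Hypothetical sketch: find all variations of an asset across the available
# asset libraries via a globally unique identifier assigned by the author.
def find_variations(asset_uid, asset_libraries):
    variations = []
    for library in asset_libraries:
        for asset in library.assets:
            if asset.metadata.get("uid") == asset_uid:
                variations.append({
                    "label": asset.metadata.get("label", asset.name),
                    "version": asset.metadata.get("version", 0),
                    "source": library.path,
                })
    # The user (or a higher level script) picks one of these and switches to
    # it with the generic replace data-block operator.
    return sorted(variations, key=lambda v: v["version"])
```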

Appending Data-Blocks that use Assets

Generally, embedded assets should stay embedded data-blocks unless they are explicitly made local. So, by default Blender would just copy the used assets to the current .blend file, but keep their virtual library intact.

Linking Assets (without embedding)

Ideally, we’d only allow linking to asset libraries if they are part of the project that the user works on. This is important to keep projects self-contained. Unfortunately, while planned, Blender does not know what a “project” is yet. Until then, we probably need the ability to let the user choose between linking and embedding for each asset library.

Embedding should certainly be the default and is enforced for the essentials library.

Assets Referencing Other Assets

Within a growing asset ecosystem, it will become more and more common that assets use other assets from the same or another asset library (including the essentials asset library). This will probably be most common with node group assets.

Each asset library should be self-contained, so it should not link to assets outside of that asset library; it can embed assets from other libraries though. Individual asset files within a library do not have to be self-contained, so they can link to assets within the same asset library.

For the asset author, an asset library is a project. The rules for projects in general seem to apply here too.

13 Likes

Hashing with respect to dependencies as well sounds like a great idea :+1: I’m glad this proposal is being thought about and put forward :).

What in the datablock actually gets deep hashed? Does deep hashing a datablock mean one could end up hashing a large embedded media piece (say a large video), and wouldn’t this make for a rather slow process just to get the hash, especially with a fairly large scene containing 100s/1000s of integrated assets? I’m guessing the hashing happens on some specific properties of the datablock?

Apologies if my understanding of datablocks is limited; I utilise them, but mainly at a high level for pipeline dev purposes, and don’t get into the technical weeds as much, hence also my interest in this proposal :slight_smile:

What in the datablock actually gets deep hashed?

That’s still up for debate. It’s not even clear yet whether we’ll do actual hashing or just generate a random “hash” when something changes. In theory, hashing could include large embedded media data. I’m not sure performance would really be an issue here, because the hashing of the media data could be done when saving the asset file, and not when just using it in another file.

1 Like

What happened to the “every object will become an Empty with an implicit geonode modifier” proposal? And the checkpoint node which is supposed to allow users to freeze geometry so they can edit procedural geometry?

As somebody who needs to do a lot of shot-sculpting on linked objects, I’m not really interested in “you need to make it local”.

I’d rather see something like the Geonodes Shape Key addon, but expanded into a core feature. BMesh objects become a geonode “modifier”. When they’re linked, they’re always immutable (so there’s only one of them), and if you need to edit them you stack a “Mesh Edit” (or a “Shape Key” or a “Material”) modifier atop them, which stores just the changes, based on the “checkpoint” system. Make it so that data can be dynamically added anywhere in the stack, rather than continue to rely on “local” vs. “non-local” data. And since it’s based on a modifier stack, if there’s a problem you just hit a button on the BMesh modifier to “find missing data” and point it at the right place. Since all the changes are stored inside their own modifier, once you point it at the right place, it will regenerate itself correctly.

I thought giving users the tools to edit “immutable” data was supposed to be the core of geometry nodes? Node tools, checkpoint nodes, etc.? Has that been abandoned?

6 Likes

This is an asset embedding proposal. The above proposal is low level and huge for solid anim and vfx pipelines. What you are suggesting sounds a bit more high level, like it’s an application layer. That is to say, I don’t see why it can’t be built on top of this. There’s more than just geometry that gets linked in a production pipeline.

Having a “map” of sorts (which is what the hash is attempting to fulfil) achieves the ability to grab the “gist” of what encapsulates the relevant data. Of course, what goes in that map is very important. But there’s no reason why, once that foundation is laid, you can’t build a built-in geo node specific thing on top, as you mentioned.

For what it’s worth, I build anim and vfx pipelines for a living in film and tv, and one of the major factors in convincing me to move from M*** :sweat_smile: to Blender was the “local” vs “non-local” approach coupled with the datablock paradigm. Pipeline dev in Blender is so satisfying in comparison to the pain I’ve had to face in the past :rofl:

4 Likes

We were informed two years ago that what I suggested was on the planning table.

This proposal sounds like a Bandaid for the current, hardcoded system, which I assume will eventually get junked as Everything Nodes progresses. I’d rather see the new system worked on, since it solves the same problems in a more modular way. You could just add an Empty, and then add an “Embed Asset” geonode to it. Duplicate the Empty a hundred times, the geonodes all point to the same linked asset. You need to edit one, you stack an “Edit Mesh” node on top of it which allows you to edit the “checkpointed” geometry. And then if you link that node setup into a third file, the linked Embed Asset node will still point back to the original blend file, avoiding the problem of dealing with whether an asset has been modified.

This could also implement Dynamic Overrides in a local file too.

It just occurred to me that what I’m proposing is basically “Baklava, but geometry”. Animation layers, sculpt layers, why not geometry layers?

I really recommend watching this video about New World Building Features | Inside Unreal.
They talk about the old actor handling system in Unreal Engine 4 and the new “1 actor per file” implementation in Unreal Engine 5. It really seems to bear a resemblance to the Blender asset system.

Overall this proposal seems fine to me.

@MidnightArrow This proposal is aimed at deduplicating node groups. You’re referring to local editing of geometry generated by nodes or brought in through linking, but that’s a different problem with a different solution.

For that the edited geometry would not be an embedded asset since it’s not coming from an asset library and there’s no need to deduplicate it. It would be data local to the blend file.

2 Likes

The proposal really wasn’t very clear about that then, because the constant references to “asset” made me think of something like tree assets in a nature library. Node groups are only mentioned once, right at the end. If the main use case is allowing users to update their hair nodes to a better version, that could’ve been addressed upfront to reduce confusion.

But now this is starting to sound like technical debt from other areas. Like the lack of ability to inspect all of a file’s datablocks on a global level. I’d rather have that than a button that automatically updates an asset based on a hash value under the hood. I still think this sounds like a Bandaid on a wound that requires surgery.

But that’s just my personal opinion.

This was posted in the Technical Feedback section, which is mostly for other developers to give feedback. The proposal is not to automatically update assets to a new version, but to avoid having duplicates when we know the assets are the same. Upgrading to a new version is mentioned as something that could be done at some later point, and that would need further design.

Obviously everyone has their own priorities for what they would like to see progress on, but it’s not really on topic here.

2 Likes

Super keen to see the thoughts and developments around what constitutes “same/similar” between assets :).

I’m sure you guys have already done a bunch of groundwork. An initial thought from my side is a hierarchical approach opened up to the user end, an extension of what @jacqueslucke mentioned.
In a way, the shallow and deep hashes already form a hierarchy of sorts, with deeply nested being the full hierarchy/structure. What if we either made the depth dynamically settable, or had some presets based on standard 3d asset design?

  • Transform
    • Mesh
      • Mesh
        • Material
          • … etc

Or perhaps have conceptual layered presets that can be defined by users e.g.:

  • Mesh Changes
  • Material Changes
  • Texture Changes
  • Animation Data Changes
  • …
  • UI Changes

In this way the user/production gets to decide what constitutes “same”. An asset that has layers 1-3 as the definition of itself is considered a duplicate of any asset that also matches in layers 1-3, even if layer 4+ differs.

At full “strict”ness (maybe layer 20 or something), even a node merely moved to a different x/y position in the graph would make an asset count as unique rather than a dup.
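
To make that a little more concrete, a rough sketch of the layered idea (entirely hypothetical, not based on anything existing in Blender):

```python
import hashlib

# Hypothetical layer presets; a production could define its own.
LAYERS = ["mesh", "material", "texture", "animation", "ui"]

def layered_hash(asset_data, depth):
    """Hash only the first `depth` layers of an asset description."""
    h = hashlib.sha256()
    for layer in LAYERS[:depth]:
        h.update(repr(asset_data.get(layer)).encode())
    return h.hexdigest()

def is_duplicate(a, b, depth=3):
    # Two assets count as "the same" if their first `depth` layers match,
    # even when deeper layers (e.g. node positions in the UI) differ.
    return layered_hash(a, depth) == layered_hash(b, depth)
```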

Sorry, it’s a bit of an abstract discussion at this point. Hope that makes sense.

The downside is that this is in danger of becoming over-engineered, maybe…

1 Like

I’m pretty excited about a proposal with this focus, as it could help stabilize larger scale distributed animation productions.
One of the most important features that I can see is allowing an artist to embed delicate assets such as rigs and check/confirm a rig’s stability in that scene/file per rig update, while leaving other linked libraries intact.

I would also like to see if the user could specify specific datablocks (instead of specific libraries) to embed, as it would allow many more use cases.

One other thing that I would like to mention as an impacted project is continued USD/USDZ support development: hashing each item, hashing its changes, and tracking the difference between previous/current versions would be required for proper USDZ reading/writing. This could work via shallow collections created from USDZ object imports that are hashed from their contents’ hashes (and auto/manually imported/exported).

The combination of these two could allow for a way to read/write/diff from a USDZ, making Blender much more compatible with the direction the film industry is leaning.

As an addon dev, afaik Blender currently does not expose a simple way to check for changes to allow diffing of individual files, and something like that would allow scripters to version control individual assets/datablocks with other programs such as git/perforce/svn.

Overall I’m really interested in this proposal!

2 Likes

I have found you can do minor checks for diffs on individual files using bpy.data.libraries.load, but it’s all string based and doesn’t allow great depth in the checking.
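
For reference, a minimal example of the kind of string-based check described above (the file path is just a placeholder):

```python
import bpy

# List the names of data-blocks in another .blend file without importing
# them. This only exposes names, not contents, so it can't detect changes
# inside a data-block.
path = "/path/to/asset_library.blend"  # placeholder path
with bpy.data.libraries.load(path, link=False) as (data_from, _data_to):
    print("Objects:", list(data_from.objects))
    print("Node groups:", list(data_from.node_groups))
    print("Materials:", list(data_from.materials))
```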

USD is an interesting one, because in some ways USD already has an entire system for handling asset embedding, dups, etc. It could be super useful to study USD under the hood to see if anything can be learnt from it, good and bad.
I know they have their AssetInfo, which is like a dictionary representation of the asset (similar to the hash above), and then they have an asset resolution process which helps resolve (find) the asset at actual paths.
As you say, stabilizing larger scale distributed anim pipelines is a pretty big deal, and USD has a great caching system for this! (caching + multithreading may be outside the scope of this…)

1 Like