Computing `sizeof` a datablock

Background

Hi, I wanted to work on a tool for Blender to find out why a .blend file is growing in size. One wrong append, and suddenly you have packed textures that add 100MB, and it’s hard to notice this at a glance.

I started a prototype here, it looks like this:

However, for a proper implementation (either as an add-on or as part of Blender itself), I need some way to compute the “size that a datablock takes in memory/on disk” [1].

Technical Design Needed

As you can see in my prototype code, this computation of datablock size on disk is currently estimated and done manually. It has lots of pointers and nested structures to follow, much of it not yet implemented.

Ideally, Blender could provide some generic function that sums up the size of all “owned” data of a datablock in memory. So any references to other datablocks which are also in a “main” would NOT count (only consider an object’s size, but not its mesh data, or the materials, etc.) [2], but any datablocks embedded directly inside other datablocks would have to be counted.
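
To make the ownership rule above concrete, here is a minimal sketch in plain Python. The `DataBlock` class and its fields are hypothetical stand-ins, not real Blender types: embedded datablocks are counted, referenced datablocks living in the same main are not.

```python
from dataclasses import dataclass, field


@dataclass
class DataBlock:
    """Hypothetical stand-in for an ID; not a real Blender type."""
    name: str
    own_bytes: int  # size of the struct plus the buffers it owns
    # Datablocks embedded directly inside this one (owned, so counted).
    embedded: list["DataBlock"] = field(default_factory=list)
    # Datablocks in the same main that are only referenced (not counted).
    references: list["DataBlock"] = field(default_factory=list)


def owned_size(block: DataBlock) -> int:
    """Sum the block's own bytes plus its embedded blocks; skip references."""
    return block.own_bytes + sum(owned_size(child) for child in block.embedded)


mesh = DataBlock("Cube mesh", own_bytes=4096)
node_tree = DataBlock("Embedded node tree", own_bytes=512)
material = DataBlock("Material", own_bytes=256, embedded=[node_tree])
obj = DataBlock("Cube", own_bytes=128, references=[mesh, material])

print(owned_size(obj))       # 128: the mesh and material are only referenced
print(owned_size(material))  # 768: the embedded node tree is counted
```

The numbers are made up; the point is only which edges the traversal follows.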

Once implemented in a generic way, every datablock’s size could automatically be determined, and any non-datablock (like packed images) could be treated separately case-by-case.

I’d like to ask any core developers for feedback on how this implementation could be approached. Could this somehow be integrated into the .blend file writer logic? Or are there some DNA APIs which could help with this?
Or maybe what I’m looking for already exists and I just don’t know its name :smiley:

Thanks!


  1. So ideally this would not count runtime data, if that gets removed on save. The compressed flag of blend files should not be considered either.

  2. In the UI, you could run the function recursively to sum up the total size of a datablock, if you want to view it in a tree structure for example.


It’s a work in progress but the following PR may be relevant to you:


Ah nice, thanks for that. I was aware of the PR, but didn’t realize it did some DNA refactoring as well.

RichSDNA in particular seems to have a “size_in_bytes” field, which might be exactly what I need (although I’ll have to check if it’s only computed on a “superficial” level, or if it includes data behind pointers, strings, etc. as well)
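
To illustrate the “superficial” versus deep distinction, here is a small sketch using Python’s ctypes. `ToyMesh` is a made-up struct, not Blender’s real DNA, and this says nothing about what RichSDNA actually computes. The struct size only covers the pointer itself, not the array behind it.

```python
import ctypes


class ToyMesh(ctypes.Structure):
    """Hypothetical struct for illustration; not Blender's real Mesh DNA."""
    _fields_ = [
        ("totvert", ctypes.c_int),
        # Pointer to a heap array holding 3 floats per vertex.
        ("verts", ctypes.POINTER(ctypes.c_float)),
    ]


def shallow_size(mesh: ToyMesh) -> int:
    """Size of the struct alone: the pointer counts, the array does not."""
    return ctypes.sizeof(mesh)


def deep_size(mesh: ToyMesh) -> int:
    """Struct size plus the vertex array that the pointer refers to."""
    return ctypes.sizeof(mesh) + mesh.totvert * 3 * ctypes.sizeof(ctypes.c_float)


coords = (ctypes.c_float * 3000)()  # backing storage for 1000 vertices
mesh = ToyMesh(totvert=1000, verts=ctypes.cast(coords, ctypes.POINTER(ctypes.c_float)))

print(shallow_size(mesh))  # a handful of bytes, platform dependent
print(deep_size(mesh))     # shallow size plus the 3000 floats
```

A deep count needs to know the array length for every pointer, which a plain struct size cannot give by itself.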

I think this would be useful, but there are some things to consider:

  • Data-blocks own some data and share other data. This can also include data that is not in an ID (see e.g. BLI_implicit_sharing.hh). For example, when duplicating an object that is a mesh, the mesh is duplicated, but all the attribute data is shared (until it gets written to).
  • Data-blocks have data that is saved to the file and runtime data. Which one should be included in or excluded from the count? Should users know about this difference?
  • Data-blocks can be original data or evaluated data. Evaluated data can have a completely different size (smaller, larger, same) since it includes, e.g., the results of the modifier stack.

Yeah, I assume there are many valid sizes you could compute, which all vary depending on what you include and exclude in the count. Not sure if all cases can somehow be combined into one single API, or if the caller of the API would have to manually sum up multiple results and make sure there are no duplicates themselves.

At least for the purpose of “user wants to know why their .blend file is big”, I assume

  • the sum of all sizes should not be too far off from the size of the uncompressed .blend file (excluding compression, headers, etc.)
  • it should count data that is de-duplicated or shared only once
  • only original data and non-runtime data matters to the user

But of course there may be other use-cases that are looking for different methods of computing size (e.g. analyzing RAM usage).
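
As a rough sketch of the “count shared data only once” rule from the list above: the caller keeps a set of buffers already seen while summing. Everything here is a hypothetical plain-Python model (bytearrays standing in for attribute arrays), not Blender’s implicit-sharing API.

```python
def total_unique_size(blocks):
    """Sum buffer sizes across blocks, counting each shared buffer once.

    `blocks` is an iterable of lists of buffers. Identity (not equality)
    decides whether two entries are the same shared buffer.
    """
    seen = set()
    total = 0
    for buffers in blocks:
        for buf in buffers:
            if id(buf) in seen:
                continue  # already counted through another owner
            seen.add(id(buf))
            total += len(buf)
    return total


positions = bytearray(1200)               # attribute array shared by both meshes
mesh_a = [positions, bytearray(100)]      # plus data unique to mesh A
mesh_b = [positions, bytearray(100)]      # plus data unique to mesh B

naive = sum(len(buf) for mesh in (mesh_a, mesh_b) for buf in mesh)
print(naive)                                 # 2600: shared array counted twice
print(total_unique_size([mesh_a, mesh_b]))   # 1400: shared array counted once
```

With this approach the per-datablock numbers no longer add up to the total, so a UI would have to decide which owner a shared buffer is attributed to.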

If designing a general-purpose sizeof would be too difficult architecturally (due to the many possible options), I’d also settle for a purpose-built implementation for this feature, which could later be extracted if it’s needed in other places. But for that I’ll still need a high-level approach for traversing the ID data, determining ownership of the data encountered, etc., which I’ll need to investigate.

I’m not sure such an API should exist; it’s too hard to strictly define what needs to be counted and what should not be. You can always just try deleting things and see the impact of that.

You can’t separate memory usage of original and evaluated data.

You should not be able to know how data is stored in memory (array of values, single value + size or some sparse storage).

Alignment? Overallocation? Stack usage vs stack size?

For me it’s not about memory, it’s only about size on disk.
Which I could in theory compute by saving a .blend file with just 1 datablock (and its dependencies), but I want it to be fast and cheap to compute and not run the whole serializing logic…
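
For reference, the slow “save just this datablock and look at the file size” approach could be sketched like this. The `write_library` parameter is a hypothetical callable supplied by the caller; the demo uses a fake writer so the snippet runs outside Blender.

```python
import os
import tempfile


def size_on_disk(datablocks, write_library):
    """Write `datablocks` to a temporary file and return its size in bytes.

    `write_library(filepath, datablocks)` is supplied by the caller
    (an assumption of this sketch) and does the actual serializing.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "probe.blend")
        write_library(path, datablocks)
        return os.path.getsize(path)


def fake_writer(filepath, datablocks):
    """Stand-in writer for the demo: emits 10 bytes per datablock."""
    with open(filepath, "wb") as f:
        f.write(b"\0" * 10 * len(datablocks))


print(size_on_disk({"Cube", "Material"}, fake_writer))  # 20
```

Inside Blender, `bpy.data.libraries.write` looks like a candidate for `write_library`, since as far as I know it takes a file path and a set of datablocks, but I haven’t verified that here. It would also pull in dependencies and run the full serializing logic, which is exactly the cost I’d like to avoid.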
