DNA: Decentralization and C++

jacqueslucke · July 25, 2023, 8:10am

C++ Containers in DNA Structs

Using C++ containers that are specifically designed for DNA can significantly simplify code. In many cases, they could completely eliminate the code needed to construct, copy, assign and destruct DNA structs. The .blend IO code could also become much simpler using functions like BLO_write_dna_vector. For many non-trivial structs, the IO code could even be generated entirely.

There are also some new difficulties:

The struct we use at run-time does not exactly match the struct stored in .blend files.
Some containers like DNAVector have multiple data members. We have to be able to control the names of the data members for file compatibility.

namespace blender::nodes {

struct NodeRepeatItem {
  DNAString name;
  short socket_type; /* Or eNodeSocketDatatype, if DNA can store enums directly. */
  char pad_[2];
  int identifier;
};

struct NodeGeometryRepeatOutput {
  DNA_VECTOR_DEF("items", "items_num", "items_capacity")
  DNAVector<NodeRepeatItem> items;
  int active_index;
  int next_identifier;
};

}

This could result in the following structs stored in .blend files.

struct NodeRepeatItem {
  char *name;
  int16_t socket_type;
  char pad[2];
  int identifier;
};

struct NodeGeometryRepeatOutput {
  NodeRepeatItem *items;
  int64_t items_num;
  int64_t items_capacity;
  int active_index;
  int next_identifier;
};
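The sketch below shows how such a container could take over the construct/copy/destruct code while keeping the file layout shown above. `DNAVector` here is hypothetical and simplified: it assumes the members are exactly a data pointer and two `int64_t` counters, in that order, so that they line up with `items`, `items_num` and `items_capacity`. It uses `malloc`/`free` instead of Blender's allocator and only accepts trivially copyable item types. `ItemSketch`, `RepeatOutputSketch` and `RepeatOutputFileSketch` are trimmed-down placeholders for the structs above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <new>
#include <type_traits>

/* Hypothetical DNA-aware vector. Its three members are meant to map 1:1 onto the
 * file members named by DNA_VECTOR_DEF (data pointer, size, capacity). */
template<typename T> class DNAVector {
  static_assert(std::is_trivially_copyable_v<T>, "sketch only supports trivially copyable items");

 private:
  T *data_ = nullptr;
  int64_t size_ = 0;
  int64_t capacity_ = 0;

 public:
  DNAVector() = default;
  DNAVector(const DNAVector &other) { this->copy_from(other); }
  DNAVector &operator=(const DNAVector &other)
  {
    if (this != &other) {
      this->clear_and_free();
      this->copy_from(other);
    }
    return *this;
  }
  ~DNAVector() { std::free(data_); }

  void append(const T &value)
  {
    if (size_ == capacity_) {
      /* Geometric growth keeps append amortized O(1); this is why a capacity is stored. */
      const int64_t new_capacity = (capacity_ == 0) ? 4 : capacity_ * 2;
      void *new_data = std::realloc(data_, sizeof(T) * size_t(new_capacity));
      if (new_data == nullptr) {
        throw std::bad_alloc();
      }
      data_ = static_cast<T *>(new_data);
      capacity_ = new_capacity;
    }
    data_[size_++] = value;
  }

  int64_t size() const { return size_; }
  int64_t capacity() const { return capacity_; }
  const T *data() const { return data_; }
  const T &operator[](int64_t i) const { return data_[i]; }

 private:
  void clear_and_free()
  {
    std::free(data_);
    data_ = nullptr;
    size_ = capacity_ = 0;
  }
  /* Deep copy; the copy's capacity is trimmed to its size. */
  void copy_from(const DNAVector &other)
  {
    if (other.size_ == 0) {
      return;
    }
    data_ = static_cast<T *>(std::malloc(sizeof(T) * size_t(other.size_)));
    if (data_ == nullptr) {
      throw std::bad_alloc();
    }
    std::memcpy(data_, other.data_, sizeof(T) * size_t(other.size_));
    size_ = capacity_ = other.size_;
  }
};

/* Trimmed-down placeholders for the run-time and file structs above. */
struct ItemSketch {
  int identifier;
};
struct RepeatOutputSketch {
  DNAVector<ItemSketch> items;
  int active_index;
  int next_identifier;
};
struct RepeatOutputFileSketch {
  ItemSketch *items;
  int64_t items_num;
  int64_t items_capacity;
  int active_index;
  int next_identifier;
};
/* The run-time struct occupies exactly as many bytes as the file struct. */
static_assert(sizeof(RepeatOutputSketch) == sizeof(RepeatOutputFileSketch));
```

With this, copying a `RepeatOutputSketch` needs no hand-written code: the implicitly generated copy constructor deep-copies `items`, and the destructor frees it.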

Questions:

Could we use enums in DNA directly now? If I remember correctly, we were not able to use them because the size of an enum is generally implementation defined. However, in C++ code we can give enums an explicit underlying integer type (enum class MyEnum : int {};). makesdna.cc would also have to parse enum types, but that should be possible.
Should we strive for a solution that does not need explicit padding? In practice it seems reasonable that makesdna.cc could compute the padding itself. Static or dynamic asserts could be generated to ensure correctness.
If we don’t have a separate ::DNA struct, we should be able to differentiate between private and public data members. Private members should use the trailing underscore. Does it sound reasonable that we just strip away the trailing underscore in .blend files?
The memory layout of structs can change if we use types like DNAVector<>. That’s because the container might have internal padding or use a larger integer type for the size than we used before. That shouldn’t affect compatibility, right?
Currently, we often use pointer+size in DNA where the data is actually a dynamically sized vector (like in NodeGeometryRepeatOutput). When we call something a Vector, it should probably have a size and capacity to support more efficient append, while an Array does not need the capacity. We could say that we only have DNAVector and not DNAArray. That will probably need extra versioning to initialize the DNAVector::capacity from the array size in some cases.
For compatibility, we need something like DNA_VECTOR_DEF that sets the DNA names used by the DNAVector that follows. Maybe we can also use a similar mechanism to decentralize dna_rename_defs.h?
We might need two versions of DNAString. One that is written as char * and one that is written as char[N]. This is required for compatibility. I already started working on automatic versioning between those types, but that’s not finished yet. Maybe DNAString and DNAInlineString<N> can work?
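The enum and padding questions can be checked with standard C++. In the sketch below, `eSocketTypeSketch` and `NodeRepeatItemSketch` are hypothetical placeholders (not Blender's `eNodeSocketDatatype` or `NodeRepeatItem`), and `padding_before` only stands in for the kind of computation makesdna.cc might do. The `static_assert`s illustrate the generated checks mentioned above; the asserted values assume the usual 32- and 64-bit ABIs where `int` is 4 bytes with 4-byte alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

/* Hypothetical stand-in for a DNA enum with an explicit underlying type. */
enum class eSocketTypeSketch : int16_t {
  Float = 0,
  Int = 1,
  Geometry = 2,
};

/* With a fixed underlying type, the size no longer depends on the compiler. */
static_assert(sizeof(eSocketTypeSketch) == sizeof(int16_t));

/* Bytes needed to move `offset` up to the next multiple of `alignment`. */
constexpr size_t padding_before(size_t offset, size_t alignment)
{
  return (alignment - offset % alignment) % alignment;
}

/* Same members as the file version of NodeRepeatItem, but with the enum used
 * directly and without a hand-written `pad` member. */
struct NodeRepeatItemSketch {
  char *name;
  eSocketTypeSketch socket_type;
  int identifier;
};

/* Checks of the kind a generator could emit: the compiler inserts 2 bytes of
 * implicit padding after socket_type, where `char pad[2]` sits today. */
static_assert(offsetof(NodeRepeatItemSketch, socket_type) == sizeof(char *));
static_assert(padding_before(offsetof(NodeRepeatItemSketch, socket_type) +
                                 sizeof(eSocketTypeSketch),
                             alignof(int)) == 2);
static_assert(offsetof(NodeRepeatItemSketch, identifier) == sizeof(char *) + 4);
```

One caveat for the no-explicit-padding route: implicit padding bytes are not guaranteed to hold any particular value (for example after a member-by-member copy), so a plain memory dump may write arbitrary bytes there.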

Run-time Data

Many structs have run-time data, i.e. data that is not stored in files but is only used at run-time. Currently, this usually works by having a void *runtime (or similar) data member embedded in the DNA structs. This works but has a few downsides:

Requires an extra allocation for many structs (I haven’t seen that being a problem in practice yet).
Requires extra manual memory management.
The void * is written to .blend files which essentially just bloats it (I haven’t seen that being a problem in practice yet).

The nice thing about embedding the void * directly in the DNA struct that is written to files is that a simple memory dump can be used to save any struct.

Below I show a few alternatives with different trade-offs.

Managed Run-Time Pointer

This approach is very similar to the existing solution. A void *runtime is still written to .blend files. However, the manual memory management at run-time is reduced.

namespace blender::bke {

struct bNodeSocketRuntime {
  /* ... */
};

struct bNodeSocket {
  /* ... */
  DNARuntime<bNodeSocketRuntime> runtime;
};

}
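A possible shape for DNARuntime<T> is sketched below. It assumes that the type is a single owning pointer (so it has the same size as the `void *runtime` it replaces), that copying a DNA struct creates fresh run-time data instead of sharing it, and that read-file code calls a hypothetical `reset_after_read()` because the pointer value stored in the file is meaningless after loading. It uses `new`/`delete` rather than Blender's allocator, and `SocketSketch` is a placeholder for bNodeSocket.

```cpp
#include <cassert>
#include <utility>

/* Hypothetical owning pointer to run-time data. It holds a single pointer, so
 * it can replace a `void *runtime` member without changing the struct size. */
template<typename T> class DNARuntime {
 public:
  DNARuntime() : ptr_(new T()) {}
  /* Run-time data is derived state: a copy of the DNA struct starts with fresh
   * run-time data instead of sharing or deep-copying it. */
  DNARuntime(const DNARuntime & /*other*/) : ptr_(new T()) {}
  /* A moved-from object holds no run-time data. */
  DNARuntime(DNARuntime &&other) noexcept : ptr_(std::exchange(other.ptr_, nullptr)) {}
  DNARuntime &operator=(const DNARuntime &other)
  {
    if (this != &other) {
      T *new_ptr = new T();
      delete ptr_;
      ptr_ = new_ptr;
    }
    return *this;
  }
  DNARuntime &operator=(DNARuntime &&other) noexcept
  {
    if (this != &other) {
      delete ptr_;
      ptr_ = std::exchange(other.ptr_, nullptr);
    }
    return *this;
  }
  ~DNARuntime() { delete ptr_; }

  /* To be called by read-file code: the pointer value that was read from the
   * file is meaningless, so overwrite it without freeing it. */
  void reset_after_read() { ptr_ = new T(); }

  T *operator->() { return ptr_; }
  const T *operator->() const { return ptr_; }
  T *get() { return ptr_; }
  const T *get() const { return ptr_; }

 private:
  T *ptr_;
};

static_assert(sizeof(DNARuntime<int>) == sizeof(void *));

/* Usage, mirroring the bNodeSocket example (placeholder names). */
struct SocketRuntimeSketch {
  int change_counter = 0;
};
struct SocketSketch {
  int flag;
  DNARuntime<SocketRuntimeSketch> runtime;
};
```

Because copy, move and destruction are handled by the member, `SocketSketch` itself needs no hand-written constructor or destructor.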

Embedded Run-Time Data

The goal here is to remove the need for an extra allocation for run-time data and to avoid storing the unnecessary void *runtime in .blend files. The main downside is that file reading/writing becomes a bit harder, depending on the case.

namespace blender::bke {

/* 1. Separate DNA struct. */
struct bNodeSocket {
  struct DNA {
    /* ... */
  } dna;
  /* Run-time data. */
};

/* 2. Separate run-time struct. */
struct bNodeSocket {
  /* ... */
  struct Runtime {
    /* ... */
  } runtime;
};

/* 3. Struct Splitter. */
struct bNodeSocket {
  /* DNA members. */
  DNA_RUNTIME_BEGIN
  /* Run-time members. */
};

/* 4. Interleaved tagged DNA members. */
struct bNodeSocket {
  DNA_MEMBER bNodeSocket *next, *prev;
  DNA_MEMBER DNAInlineString<64> identifier;
  Vector<bNodeSocket *> linked_sockets;
  DNA_MEMBER DNAInlineString<64> name;
  bNode *owner_node;
  /* ... */
};

/* 5. Interleaved tagged run-time members. */
struct bNodeSocket {
  bNodeSocket *next, *prev;
  DNAInlineString<64> identifier;
  DNA_RUNTIME Vector<bNodeSocket *> linked_sockets;
  DNAInlineString<64> name;
  DNA_RUNTIME bNode *owner_node;
};

}

The first three approaches have the property that the struct can be split into a run-time part and a DNA part. The DNA part is written to .blend files, while the run-time part is not. Read-file code would have to invoke some special constructor that constructs the run-time data. The problem is that this splitting only works directly when one struct is written at a time. Things become trickier when writing e.g. an array of the type, or when the type is embedded in the DNA section of another type. In both of those cases, we could still dump the memory of the partial structs, and with some preprocessing the performance of that could be perfectly acceptable.
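To make the array case concrete, here is a sketch for the first variant (a nested `DNA` struct). It is an illustration under assumptions: `SocketSplit` is a placeholder for bNodeSocket, `std::vector` replaces Blender's `Vector`, and a plain byte buffer stands in for the real .blend writer. When writing an array, the DNA parts of all elements are gathered into a contiguous temporary buffer first, which is one form the preprocessing mentioned above could take. Reading does the reverse and leaves the run-time part default-constructed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

/* Variant 1: DNA members live in a nested struct, run-time members follow. */
struct SocketSplit {
  struct DNA {
    int32_t flag;
    float default_value;
  } dna;
  /* Run-time data, never written to files. */
  std::vector<SocketSplit *> linked_sockets;
};

/* Hypothetical stand-in for a .blend writer: just collects raw bytes. */
using ByteBuffer = std::vector<uint8_t>;

/* Gather the DNA part of every element so the array can still be written as
 * one contiguous block, even though the full structs are not pure DNA. */
void write_socket_array(ByteBuffer &out, const SocketSplit *sockets, size_t num)
{
  std::vector<SocketSplit::DNA> packed(num);
  for (size_t i = 0; i < num; i++) {
    packed[i] = sockets[i].dna;
  }
  const auto *bytes = reinterpret_cast<const uint8_t *>(packed.data());
  out.insert(out.end(), bytes, bytes + sizeof(SocketSplit::DNA) * num);
}

/* Reading: construct full objects (run-time part default-initialized), then
 * copy the stored DNA part into each one. */
std::vector<SocketSplit> read_socket_array(const ByteBuffer &in, size_t num)
{
  assert(in.size() >= num * sizeof(SocketSplit::DNA));
  std::vector<SocketSplit> sockets(num);
  for (size_t i = 0; i < num; i++) {
    std::memcpy(&sockets[i].dna, in.data() + i * sizeof(SocketSplit::DNA),
                sizeof(SocketSplit::DNA));
  }
  return sockets;
}
```

The same gather step would be needed when such a type is embedded in the DNA section of another struct, since the outer struct's memory then also contains run-time bytes.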

One could also restrict the ability to embed run-time data to structs that are always separately allocated. This is the most common case for structs with run-time data anyway. Other structs would have to use e.g. the DNARuntime<T> approach.

The last two approaches make it very explicit which members are DNA or run-time members, at the cost of verbosity. To reduce verbosity, one could add support for sections, e.g. with DNA_RUNTIME_BEGIN and DNA_RUNTIME_END. Interleaving run-time data gives the most flexibility at run-time at the cost of more complex IO code. Properly constructing such a type after file load might be especially tricky, but should be possible. With some preprocessing, I would expect the performance of this approach to be good enough as well.
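For the interleaved variants, one way the IO code could work is a per-type table of DNA members that makesdna (or a macro) generates from the tags. In the sketch below the table is written by hand, the DNA_MEMBER/DNA_RUNTIME tags are only shown as comments, and `DNAMemberInfo`, `write_interleaved` and `read_interleaved` are hypothetical. Because it uses `offsetof`, it assumes a standard-layout struct, which is why the run-time members here are a plain integer and pointer rather than a `Vector<...>`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

/* Variant 5: run-time members are interleaved with DNA members. */
struct SocketInterleaved {
  int32_t flag;             /* DNA */
  int32_t runtime_index;    /* DNA_RUNTIME */
  char name[16];            /* DNA */
  SocketInterleaved *owner; /* DNA_RUNTIME */
  float default_value;      /* DNA */
};

/* Hypothetical per-member description that a generator could emit. */
struct DNAMemberInfo {
  size_t offset;
  size_t size;
};

constexpr DNAMemberInfo socket_dna_members[] = {
    {offsetof(SocketInterleaved, flag), sizeof(SocketInterleaved::flag)},
    {offsetof(SocketInterleaved, name), sizeof(SocketInterleaved::name)},
    {offsetof(SocketInterleaved, default_value), sizeof(SocketInterleaved::default_value)},
};

/* Pack only the DNA members, back to back, into the output buffer. */
void write_interleaved(std::vector<uint8_t> &out, const SocketInterleaved &socket)
{
  const auto *base = reinterpret_cast<const uint8_t *>(&socket);
  for (const DNAMemberInfo &member : socket_dna_members) {
    out.insert(out.end(), base + member.offset, base + member.offset + member.size);
  }
}

/* Reset the object (run-time members become zero/null), then scatter the stored
 * DNA bytes back to their members. Returns the number of bytes consumed. */
size_t read_interleaved(const uint8_t *in, SocketInterleaved &r_socket)
{
  r_socket = SocketInterleaved{};
  auto *base = reinterpret_cast<uint8_t *>(&r_socket);
  size_t pos = 0;
  for (const DNAMemberInfo &member : socket_dna_members) {
    std::memcpy(base + member.offset, in + pos, member.size);
    pos += member.size;
  }
  return pos;
}
```

Since the packed order differs from the in-memory layout, the file would presumably have to describe the packed struct rather than the C++ struct itself, which is part of the extra IO complexity of this approach.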

Naming of DNA and RNA

Personally I don’t see a strong incentive to change the names. I also don’t have an idea for a different name that I find equally succinct and easy to use when working with other Blender developers. Not sure what others think.
