DNA: Decentralization and C++

As we use more C++ and move to a more extensible architecture, the future design of DNA is one of the biggest unknowns. The goal of this document is to start a more concrete discussion about where we want to go with DNA and how to get there. For that purpose, there is a concrete initial proposal for how DNA structs could be defined in the future. Feedback that describes problems with the proposal or proposes other solutions is welcome.

Goals

There are two main goals:

  1. Working with DNA structs should feel more “natural” in C++ code.
    • It should be straightforward to define methods on them, including constructors and destructors.
    • One should be able to work with containers in DNA similarly to other C++ containers.
  2. Use a more extensible architecture.
    • It should be possible to define DNA structs wherever the struct actually belongs instead of putting many unrelated structs into the same header.
      • For example, each node with storage could have its own header that defines the corresponding storage struct.
    • DNA structs should be able to exist in arbitrary namespaces, not just in the global namespace.

An important requirement for every solution is that the .blend file format must not change just to achieve these goals. Also we still want to be able to simply dump structs instead of having more complex serialization code.

Proposal

There are quite a few ways to achieve these goals independently and together. We have already tried, and currently use, different solutions for better C++ integration, but no single solution has become prevalent yet. Existing solutions include:

  • Define raw struct in makesdna and inherit from that in another type that has more C++ semantics (e.g. CurvesGeometry and bUUID).
  • Add methods to structs directly in makesdna (e.g. AssetWeakReference and bNode).

Below I show a new approach that achieves both goals and feels doable, even though it does require some more significant refactoring compared to the existing approaches. It’s easiest to show the new approach with examples. I’ll use node storage as an example because it illustrates the points well. The same ideas should apply to all other structs.

Simple Example

As a first example we look at the node storage of the Mesh to Points node because it’s very simple. Currently, NodeGeometryMeshToPoints is defined in DNA_node_types.h as follows:

typedef struct NodeGeometryMeshToPoints {
  /* GeometryNodeMeshToPointsMode */
  uint8_t mode;
} NodeGeometryMeshToPoints;

typedef enum GeometryNodeMeshToPointsMode {
  GEO_NODE_MESH_TO_POINTS_VERTICES = 0,
  GEO_NODE_MESH_TO_POINTS_EDGES = 1,
  GEO_NODE_MESH_TO_POINTS_FACES = 2,
  GEO_NODE_MESH_TO_POINTS_CORNERS = 3,
} GeometryNodeMeshToPointsMode;

For better extensibility, this definition is moved to a new node-specific header: nodes/geometry/include/NOD_geo_mesh_to_points.hh.

namespace blender::nodes {

enum class GeometryNodeMeshToPointsMode {
  Vertices = 0,
  Edges = 1,
  Faces = 2,
  Corners = 3,
};

struct NodeGeometryMeshToPoints {
  struct DNA {
    uint8_t mode;
  } dna;

  GeometryNodeMeshToPointsMode mode() const;
};

}  // namespace blender::nodes

There are a few things to note here:

  • The NodeGeometryMeshToPoints struct is in the blender::nodes namespace. From the perspective of the .blend file format, the struct is still just called NodeGeometryMeshToPoints though.
  • Since this is a normal C++ header now, we can also use enum class for better naming.
  • The actual DNA struct is embedded into the NodeGeometryMeshToPoints struct.
    • NodeGeometryMeshToPoints::DNA must be a trivial type, like any existing DNA type.
    • NodeGeometryMeshToPoints contains a dna member that is trivial, but doesn’t have to be trivial itself.
  • NodeGeometryMeshToPoints has a mode accessor that returns the enum value with the correct type.
    • Note that the method has the same name as the dna member. This is quite common and one reason for separating the pure dna struct and the wrapper struct (enums can’t be used in DNA structs directly, because their size is not very well defined; maybe we can use them with C++ now though?).
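
To make the notes above more concrete, here is a minimal sketch of how the mode() accessor could be implemented, together with compile-time checks for the triviality requirement. The inline cast and the static_asserts are assumptions for illustration and not part of the proposal itself.

```cpp
#include <cstdint>
#include <type_traits>

namespace blender::nodes {

enum class GeometryNodeMeshToPointsMode {
  Vertices = 0,
  Edges = 1,
  Faces = 2,
  Corners = 3,
};

struct NodeGeometryMeshToPoints {
  /* Only this part would be written to the .blend file. */
  struct DNA {
    uint8_t mode;
  } dna;

  /* Typed accessor. The cast is an assumed implementation. */
  GeometryNodeMeshToPointsMode mode() const
  {
    return static_cast<GeometryNodeMeshToPointsMode>(dna.mode);
  }
};

/* The DNA part has to stay trivial so that it can be dumped as raw memory. */
static_assert(std::is_trivially_copyable_v<NodeGeometryMeshToPoints::DNA>);
static_assert(std::is_trivially_default_constructible_v<NodeGeometryMeshToPoints::DNA>);
/* The wrapper only adds methods here, so its size matches the DNA part. */
static_assert(sizeof(NodeGeometryMeshToPoints) == sizeof(NodeGeometryMeshToPoints::DNA));

}  // namespace blender::nodes
```

Callers would then write storage.mode() and get the typed enum, while the file still only sees the uint8_t member.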

More Complex Example

The next example uses the “Repeat Output” node storage. It’s more complex because it contains non-trivial data. It has an array of items and each item has a name.

namespace blender::nodes {

struct NodeRepeatItem {
  struct DNA {
    char *name;
    short socket_type;
    char _pad[2];
    int identifier;
  } dna;

  NodeRepeatItem(StringRef name, eNodeSocketDatatype type, int identifier);
  NodeRepeatItem(const NodeRepeatItem &other);
  ~NodeRepeatItem();

  static void relocate(NodeRepeatItem *src, NodeRepeatItem *dst);

  void content_blend_write(BlendWriter *writer) const;
  void content_blend_read_data(BlendDataReader *reader);

  StringRefNull name() const;
  eNodeSocketDatatype type() const;
};

struct NodeGeometryRepeatOutput {
  struct DNA {
    NodeRepeatItem *items;
    int items_num;
    int active_index;
    int next_identifier;
    char _pad[4];
  } dna;

  NodeGeometryRepeatOutput();
  NodeGeometryRepeatOutput(const NodeGeometryRepeatOutput &other);
  ~NodeGeometryRepeatOutput();

  static void relocate(NodeGeometryRepeatOutput *src, NodeGeometryRepeatOutput *dst);

  void blend_write(BlendWriter *writer) const;
  void blend_read_data(BlendDataReader *reader);

  Span<NodeRepeatItem> items() const;
  CArray<NodeRepeatItem, int> items();
};

}  // namespace blender::nodes

Some notes:

  • Both structs have a normal constructor, a copy constructor and a destructor.
  • A move constructor or assignment operators do not exist. Instead, there is the relocate function.
    • Those could be implemented but it’s unclear if it is worth it for most types.
    • We generally don’t have a defined “moved-from” state for DNA structs.
    • Relocation is a move construction followed by a destruction of the old value. This concept mimics what we do with DNA much better. DNA structs are trivially relocatable in all cases I’m aware of, i.e. relocation is the same as memcpy (self-referential structs are not trivially relocatable, i.e. when a struct contains a pointer to a value within the same struct).
  • File IO can simply be added as methods.
    • The content in content_blend_write means that the struct itself is not written, but only the data it references. This is a common pattern for structs that are usually stored in an array.
  • The non-const items method returns a new CArray<T, SizeT> type. This type is designed to wrap a C array with a pointer and size.
    • In this case the CArray contains a NodeRepeatItem ** to &dna.items and an int * to &dna.items_num.
    • The CArray has common methods for appending and removing elements. Changing it automatically updates the data stored in dna.
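
The CArray idea from the notes above could be sketched roughly as follows. This is only an illustration under assumptions: the method names append and remove are made up, malloc/free stand in for Blender's allocator, and the sketch is restricted to trivially copyable element types so that memcpy is valid (a real version would relocate elements instead).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <type_traits>

/* Hypothetical sketch of CArray<T, SizeT>. It does not own any data itself,
 * it only edits the pointer and size that live in the DNA struct. */
template<typename T, typename SizeT> class CArray {
  /* Simplification of this sketch: elements are moved with memcpy. */
  static_assert(std::is_trivially_copyable_v<T>);

  T **data_;    /* Points at the DNA pointer member, e.g. &dna.items. */
  SizeT *size_; /* Points at the DNA size member, e.g. &dna.items_num. */

 public:
  CArray(T **data, SizeT *size) : data_(data), size_(size) {}

  SizeT size() const { return *size_; }

  T &operator[](const SizeT index)
  {
    assert(index >= 0 && index < *size_);
    return (*data_)[index];
  }

  /* Grow the C array by one element and update pointer and size in place. */
  void append(const T &value)
  {
    const std::size_t old_size = static_cast<std::size_t>(*size_);
    T *new_data = static_cast<T *>(std::malloc(sizeof(T) * (old_size + 1)));
    assert(new_data != nullptr);
    if (old_size > 0) {
      std::memcpy(new_data, *data_, sizeof(T) * old_size);
    }
    std::memcpy(new_data + old_size, &value, sizeof(T));
    std::free(*data_);
    *data_ = new_data;
    *size_ = static_cast<SizeT>(old_size + 1);
  }

  /* Remove one element and keep the order of the remaining ones. */
  void remove(const SizeT index)
  {
    assert(index >= 0 && index < *size_);
    const std::size_t tail = static_cast<std::size_t>(*size_ - index - 1);
    if (tail > 0) {
      std::memmove(*data_ + index, *data_ + index + 1, sizeof(T) * tail);
    }
    *size_ -= 1;
  }
};
```

Because the wrapper stores pointers to the two DNA members, every change is immediately visible in dna.items and dna.items_num without a separate sync step.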

Challenges

There are quite a few challenges when implementing this approach. I could get Blender to compile with the simple NodeGeometryMeshToPoints struct defined in a namespace but the full implementation likely requires major refactors of makesdna and makesrna.

Some of the obvious and less obvious challenges I found so far:

  • makesdna.cc needs to scan files outside of the makesdna folder.
    • To avoid scanning all files, we likely need to build a list of all files to scan with cmake.
  • makesdna.cc needs to figure out which structs to analyse and which not.
    • With the proposal above, no extra syntax would be needed, because one could just search for struct DNA { to find all the right places.
  • The dna_rename_defs_ensure function that checks whether DNA renames defined in dna_rename_defs.h are valid needs to be changed.
    • It’s harder (and maybe undesirable) to include all headers that contain dna structs. It’s even worse if we want to be able to define DNA structs in .cc files or even at run-time.
    • It might be worth considering adding information about renames directly to the dna struct definitions using macros or custom C++ attributes.
  • The way dna_verify.c works also likely has to change.
    • Same reason as above, it’s not a good idea to include all headers in one file.
  • makesrna.cc currently assumes that all DNA structs are in global namespace.
    • It’s likely a good idea if it knows which namespace each dna struct is in so that it can also generate forward declarations correctly.
  • It’s not clear whether the generated rna code should generally deal with the e.g. NodeRepeatItem or NodeRepeatItem::DNA.
    • Long term it might be better to use NodeRepeatItem because it’s more flexible and the DNA subtype really only exists to have very explicit serialization.
  • The DNA defaults system uses syntax that is not compatible with C++. It likely would have to change as well.
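
As a rough illustration of the "search for struct DNA {" idea from the list above, the detection step could be as simple as the following sketch. The function name is hypothetical and this is not how makesdna.cc currently works. A real scanner would also have to skip comments and string literals and tolerate different whitespace.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

/* Hypothetical helper: returns the offsets at which an embedded DNA struct
 * definition starts in the given source text. */
inline std::vector<std::size_t> find_dna_struct_offsets(const std::string_view source)
{
  constexpr std::string_view marker = "struct DNA {";
  std::vector<std::size_t> offsets;
  std::size_t pos = source.find(marker);
  while (pos != std::string_view::npos) {
    offsets.push_back(pos);
    /* Continue searching after the marker that was just found. */
    pos = source.find(marker, pos + marker.size());
  }
  return offsets;
}
```

From each offset, the existing member parsing could take over until the matching closing brace.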

One good thing is that it should be possible to keep all the existing DNA structs in the makesdna directory and only gradually move structs elsewhere if it is beneficial. So it’s not necessary to refactor all existing DNA structs to make progress.

Next Steps

All of these challenges seem solvable given enough refactoring. But before starting with that it would be good to agree where we want to go exactly. The proposal above is not the only solution that achieves the goals. Other suggestions are welcome.

Another open question is whether we want to address decentralization of RNA at the same time. It might make sense when we have to refactor/rewrite makesrna.c anyway. My initial experiments suggest that the way we define rna structs can stay centralized even if dna is decentralized though.


Right after posting this I had another (more standard) idea for how to deal with the name collision of dna members and methods (like mode and items in the examples above).

struct NodeGeometryRepeatOutput {
 private:
  NodeRepeatItem *items_;
  int items_num_;
  int active_index_;
  int next_identifier_;
  char pad_[4];

 public:
  /* ... */
  Span<NodeRepeatItem> items() const;
  CArray<NodeRepeatItem, int> items();
};
DNA_STRUCT(NodeGeometryRepeatOutput)

The idea here is that just like with normal C++ classes we use the trailing underscore for private members. That solves the naming collision. The new idea is that when writing this struct to a .blend file, we remove the trailing underscore so that the .blend file format does not change.
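
The name mapping itself is tiny. A sketch, with a hypothetical function name, could look like this:

```cpp
#include <string_view>

/* Hypothetical helper: maps the name of a C++ data member to the member name
 * that is stored in the .blend file by dropping one trailing underscore. */
constexpr std::string_view dna_file_member_name(std::string_view name)
{
  if (!name.empty() && name.back() == '_') {
    name.remove_suffix(1);
  }
  return name;
}

static_assert(dna_file_member_name("items_num_") == "items_num");
static_assert(dna_file_member_name("active_index") == "active_index");
```

Only a single underscore is removed, so a member that intentionally ends with two underscores would keep one of them.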

There were other reasons for the separate DNA structs like:

  • Nice separation between the trivial and non-trivial part of a class.
  • The struct DNA { could be used to detect structs to be analyzed.

If name collisions are not a problem anymore though, then inlining the DNA struct might make more sense. We just need a new way to tag NodeGeometryRepeatOutput as a DNA struct, but that can also be done with a macro like DNA_STRUCT(NodeGeometryRepeatOutput) below the struct. Something like that might be needed in the future for other reasons anyway (like adding some meta-data).


This already works like this, but the list isn’t in the most obvious place (SRC_DNA_INC hiding in source\blender\CMakeLists.txt)


Thanks for pushing this topic forward! I find it quite… unideal to make DNA and good C++ API/code co-exist.

Some thoughts in random order:

  • Perhaps try to leave RNA out of the proposal just for now. We’ll have enough fun un-entangling the DNA. More localized scope of design seems better in this case.

  • From your proposal I can see how makesdna could work. But it is not immediately clear how readfile will. Like, what if there are fields outside of the NodeRepeatItem::DNA? Are they default-initialized?

  • Do we really need to parse C++ files? Can it be a more explicit declaration of fields (similar to how Cycles does it)?

  • It would be nice to allow storing non-POD in a file. I.e., to more natively save Vector<> without need to have dedicated pointer and size fields.


I have some concerns about getting access to the headers. Right now everything, for better or for worse, lives in a single include folder, and when you need an actual dna struct, you can just #include "dna_mything.h". How is this expected to work in the future?

Or is the point of the decentralization not to give access to implementation details outside the project that defined it?


Perhaps try to leave RNA out of the proposal just for now. We’ll have enough fun un-entangling the DNA. More localized scope of design seems better in this case.

You mean that decentralizing RNA shouldn’t be part of the same project? With that I can agree. Implementation wise, we’ll need to change makesrna.c a fair amount anyway unfortunately.

From your proposal I can see how makesdna could work. But it is not immediately clear how readfile will. Like, what if there are fields outside of the NodeRepeatItem::DNA? Are they default-initialized?

Ah, somehow forgot to mention that. In the current state of the proposal you can’t have members outside of dna. While that could work, it has some annoying implications like basically destroying the ability to use a memory dump to store dna structs (whenever you have an array of structs, or structs are embedded in other structs). If we remove that requirement, that opens up many more opportunities of course.

Do we really need to parse C++ files? Can it be a more explicit declaration of fields (similar to how Cycles does it)?

Not sure how Cycles does it exactly, I guess you mean something like SOCKET_FLOAT(angle, "Angle", 0.0f);? To me that doesn’t really look better. And it also doesn’t work well with the IDE in my experience. Parsing structs directly seems more straightforward. Also it feels like the current makesdna.cc code makes it look harder than it is (at least for the part of C++ that we need to parse here, which is just some basic declarations).

It would be nice to allow storing non-POD in a file. I.e., to more natively save Vector<> without need to have dedicated pointer and size fields.

Supporting this essentially also comes down to removing the requirement that we can just dump structs in order to serialize them.

I have some concerns about getting access to the headers. Right now everything, for better or for worse, lives in a single include folder, and when you need an actual dna struct, you can just #include "dna_mything.h". How is this expected to work in the future?

Or is the point of the decentralization not to give access to implementation details outside the project that defined it?

To get access to the structs, you would e.g. just include BKE_mesh.hh instead of DNA_mesh.hh. If you work with a mesh but can’t import BKE_mesh.hh something is probably wrong. It is intentional that it becomes possible to hide parts of dna behind proper APIs which allows for better encapsulation.


All right, that makes sense. I just wanted to be sure we weren’t gonna add all kinds of “alien” paths to bf_dna’s INC section.

I love the idea of decentralizing DNA. Right now, adding new “things” in Blender requires changing too many files in too many scattered places. For devs who aren’t doing this constantly, one has to find a similar commit and copy it to remember everything that has to change. While that still may be true, it would be so much nicer if almost all of the changes happen in the directory where the main new “thing” code lives.

Some other random thoughts I had while reading this:

  • It would be a very big and well-considered/well-debated change to stop just being able to dump structures in a file. One of the things that delights Blender users is the fast load/store times.
  • We should develop and use a stereotyped set of C++ compile-time assertions about the DNA structs and their members that ensure we never accidentally make classes that violate Jacques’s rules (or whatever rules get settled on).
  • Is it time to rename “DNA” and “RNA”? These terms and concepts are mysterious and off-putting to new developers. Though they do have the advantage of shortness. But there are likely more intuitive names, such as “Persisted” (?).
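
For the compile-time assertions mentioned in the list above, something like the following sketch might be a starting point. The trait name, the macro name and the exact rule set are assumptions for illustration. Under the original proposal the check would apply to the embedded ::DNA part, because the wrapper struct is allowed to be non-trivial.

```cpp
#include <type_traits>

/* Hypothetical trait: a type is considered dumpable if it has a predictable
 * layout, can be copied with memcpy and has no virtual methods. */
template<typename T>
inline constexpr bool is_dumpable_dna_v = std::is_standard_layout_v<T> &&
                                          std::is_trivially_copyable_v<T> &&
                                          !std::is_polymorphic_v<T>;

/* Hypothetical macro that can be placed below a struct definition. */
#define DNA_STATIC_ASSERT(Type) \
  static_assert(is_dumpable_dna_v<Type>, #Type " cannot be dumped into a .blend file as is")

/* Passes the check. */
struct ExampleStorage {
  int identifier;
  float value;
};
DNA_STATIC_ASSERT(ExampleStorage);

/* Would fail the check because the virtual destructor adds a vtable pointer. */
struct ExampleWithVirtual {
  virtual ~ExampleWithVirtual() = default;
  int identifier;
};
```

Adding DNA_STATIC_ASSERT(ExampleWithVirtual); would stop compilation with a readable message instead of producing a broken file later.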

That code is the (very basic) equivalent of RNA. But there is NODE_SOCKET_API(float, angle) in the class definition, which generates get/set functions that help automate update tagging.

It would be nice if depsgraph tagging could be automated by using setter functions everywhere. But I don’t see how to make it work since we wouldn’t have e.g. the object datablock pointer when editing a property on a modifier.

So I’m not sure there is an immediate use case for wrapping those member definitions in a macro.

If we control the data layout of these C++ types and don’t use virtual methods, we can dump structs. We can’t dump std::string, but an equivalent that is a char* wrapped in a class would work. Same for a typed ListBase, blender::Vector, std::unique_ptr.

We could use the same types as outside DNA, or have more compact types that leave out e.g. a cached size for strings or efficient growing for vectors.
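
To illustrate what "a char* wrapped in a class" could mean, here is a minimal sketch. The name DNAString is borrowed from later in this thread, the interface is made up, and malloc/free stand in for Blender's allocator. The point is only that the object is exactly one pointer in memory, so makesdna could treat it as a char *.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <type_traits>

/* Hypothetical sketch of a dumpable string type. */
class DNAString {
  char *data_ = nullptr;

 public:
  DNAString() = default;
  explicit DNAString(const char *str) : data_(duplicate(str)) {}
  DNAString(const DNAString &other) : data_(duplicate(other.data_)) {}
  DNAString &operator=(const DNAString &other)
  {
    if (this != &other) {
      char *copy = duplicate(other.data_);
      std::free(data_);
      data_ = copy;
    }
    return *this;
  }
  ~DNAString() { std::free(data_); }

  /* Never returns null, an unset string reads as empty. */
  const char *c_str() const { return data_ ? data_ : ""; }

 private:
  static char *duplicate(const char *str)
  {
    if (str == nullptr) {
      return nullptr;
    }
    const std::size_t size = std::strlen(str) + 1;
    char *copy = static_cast<char *>(std::malloc(size));
    if (copy != nullptr) {
      std::memcpy(copy, str, size);
    }
    return copy;
  }
};

/* Same memory layout as a plain char * member. */
static_assert(sizeof(DNAString) == sizeof(char *));
static_assert(std::is_standard_layout_v<DNAString>);
```

A more compact variant like this leaves out a cached size, which is the trade-off mentioned above.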


Thanks indeed for starting the discussion on this topic!

I would really rather have the initial proposal, with a very clearly defined set of DNA-to-be-written-in-file data, isolated in its own sub-structure. IMHO it is very important to keep a clear separation between runtime data and what is persistent across undo, written to disk, etc. Having a sub-struct here makes things even better than in current C struct system in fact, since runtime data can be completely excluded from the DNA struct then.

Not to mention that I would rather avoid juggling too much with names if it can be avoided.

Having clear and easy access to DNA data is also why I am not so keen on the idea of allowing DNA data all around our codebase - it makes it harder to find and manage. Although far from perfect, I like the current separation between the data definition (DNA and most of RNA), and the logic using it (BKE and editors).

I would indeed strongly advocate to keep the current ‘dump in memory’ capability of DNA data. As @brecht said, having our own ‘dumpable’ versions of standard C++ types can be done as needed. Not sure though what the consequences of this type of change would be for compatibility, and also for tools manipulating .blend files outside of Blender (like the ‘BAT’ script used as part of Flamenco)? I guess we could only use these in new data for the time being?

I would also keep changes to makesrna as limited as possible for now, so I think it should deal (by default) only with the ::DNA part.

I guess such nice things like automated depsgraph tagging, but also perhaps undo change tagging etc., would only be possible if all structs hold a reference to their owning ID? Similar to what RNA does. But such changes go way beyond this design/discussion.

This does not have to affect file compatibility, if makesdna considers e.g. the string class as a char*.


C++ Containers in DNA Structs

Using C++ containers that are specifically designed for DNA can significantly simplify code. In many cases it could completely eliminate the code to construct, copy, assign and destruct DNA structs. Also .blend IO code could become much simpler using functions like BLO_write_dna_vector. For many non-trivial structs, the IO code could even be generated entirely.

There are also some new difficulties:

  • The struct we use at run-time does not exactly match the struct stored in .blend files.
  • Some containers like DNAVector have multiple data members. We have to be able to control the names of the data members for file compatibility.

namespace blender::nodes {

struct NodeRepeatItem {
  DNAString name;
  /* eNodeSocketDatatype */
  short socket_type;
  char pad_[2];
  int identifier;
};

struct NodeGeometryRepeatOutput {
  DNA_VECTOR_DEF("items", "items_num", "items_capacity")
  DNAVector<NodeRepeatItem> items;
  int active_index;
  int next_identifier;
};

}

This could result in the following structs stored in .blend files.

struct NodeRepeatItem {
  char *name;
  int16_t socket_type;
  char pad[2];
  int identifier;
};

struct NodeGeometryRepeatOutput {
  NodeRepeatItem *items;
  int64_t items_num;
  int64_t items_capacity;
  int active_index;
  int next_identifier;
};
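
As a sanity check of the idea, the following sketch compares a run-time struct that uses a container with the flat struct that would be described in the file. DNAVector is reduced to its three data members here, int is used as the element type to keep the example self-contained, and the member names are assumptions. The checks are expected to hold on the common 64-bit ABIs.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

/* Hypothetical container, reduced to its data members. */
template<typename T> struct DNAVector {
  T *data = nullptr;
  int64_t size = 0;
  int64_t capacity = 0;
};

/* Struct as it could be declared in C++ code. */
struct RuntimeRepeatOutput {
  DNAVector<int> items;
  int active_index;
  int next_identifier;
};

/* Flat struct as makesdna could describe it in the .blend file. */
struct FileRepeatOutput {
  int *items;
  int64_t items_num;
  int64_t items_capacity;
  int active_index;
  int next_identifier;
};

static_assert(std::is_standard_layout_v<RuntimeRepeatOutput>);
static_assert(std::is_trivially_copyable_v<RuntimeRepeatOutput>);
static_assert(sizeof(RuntimeRepeatOutput) == sizeof(FileRepeatOutput));
static_assert(offsetof(RuntimeRepeatOutput, active_index) ==
              offsetof(FileRepeatOutput, active_index));

/* With matching layouts, writing is still a plain memory dump. */
inline FileRepeatOutput dump_to_file_layout(const RuntimeRepeatOutput &runtime)
{
  FileRepeatOutput file;
  std::memcpy(&file, &runtime, sizeof(FileRepeatOutput));
  return file;
}
```

A real DNAVector would have constructors and a destructor and therefore would not be trivially copyable. The dump would still be valid as long as the memory layout is fixed, but that is outside of what this sketch checks.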

Questions:

  • Could we use enums in DNA directly now? If I remember correctly, we were not able to use them because the size of an enum is generally implementation defined. However, in C++ code we can give enums an explicit integer type (enum class MyEnum : int {};). makesdna.cc would also have to parse enum types, but that should be possible.
  • Should we strive for a solution that does not need explicit padding? In practice it seems reasonable that makesrna.cc could compute the padding itself. Static or dynamic asserts could be generated to ensure correctness.
  • If we don’t have a separate ::DNA struct, we should be able to differentiate between private and public data members. Private members should use the trailing underscore. Does it sound reasonable that we just strip away the trailing underscore in .blend files?
  • The memory layout of structs can change if we use types like DNAVector<>. That’s because it might have internal padding or it uses a larger integer type for the size than we used to. That shouldn’t affect compatibility, right?
  • Currently, we often use pointer+size in DNA where the data is actually a dynamically sized vector (like in NodeGeometryRepeatOutput). When we call something a Vector, it should probably have a size and capacity to support more efficient append, while an Array does not need the capacity. We could say that we only have DNAVector and not DNAArray. That will probably need extra versioning to initialize the DNAVector::capacity from the array size in some cases.
  • For compatibility, we need something like DNA_VECTOR_DEF that sets the DNA names used by the DNAVector that follows. Maybe we can also use a similar mechanism to decentralize dna_rename_defs.h?
  • We might need two versions of DNAString. One that is written as char * and one that is written as char[N]. This is required for compatibility. I already started working on automatic versioning between those types, but that’s not finished yet. Maybe DNAString and DNAInlineString<N> can work?
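
Regarding the first question in the list above, a small sketch shows why an explicit underlying type makes the size of an enum well defined. The enum and its values are made up for the example and do not match eNodeSocketDatatype.

```cpp
#include <cstdint>
#include <type_traits>

/* Hypothetical enum with a fixed underlying type. */
enum class ExampleSocketType : int16_t {
  Float = 0,
  Vector = 1,
  Color = 2,
};

/* Hypothetical DNA struct that stores the enum directly. */
struct ExampleItemDNA {
  ExampleSocketType socket_type;
  char _pad[2];
  int32_t identifier;
};

/* The size no longer depends on the compiler's choice. */
static_assert(sizeof(ExampleSocketType) == sizeof(int16_t));
static_assert(std::is_same_v<std::underlying_type_t<ExampleSocketType>, int16_t>);
static_assert(std::is_trivially_copyable_v<ExampleItemDNA>);
static_assert(sizeof(ExampleItemDNA) == 8);
```

makesdna would then only need to learn the underlying type of each enum to compute member offsets.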

Run-time Data

Many structs have run-time data, i.e. data that is not stored in files but is only used at run-time. Currently, this usually works by having a void *runtime (or similar) data member embedded in the DNA structs. This works but has a few downsides:

  • Requires an extra allocation for many structs (I haven’t seen that being a problem in practice yet).
  • Requires extra manual memory management.
  • The void * is written to .blend files which essentially just bloats it (I haven’t seen that being a problem in practice yet).

The nice thing about embedding the void * directly in the dna struct that is written to files is that a simple memory dump can be used to save any struct.

Below I show a few alternatives with different trade-offs.

Managed Run-Time Pointer

This approach is very similar to the existing solution. A void *runtime is still written to .blend files. However, the manual memory management at run-time is reduced.

namespace blender::bke {

struct bNodeSocketRuntime {
  /* ... */
};

struct bNodeSocket {
  /* ... */
  DNARuntime<bNodeSocketRuntime> runtime;
};

}
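
A possible shape for DNARuntime<T>, as a sketch only. Whether copying the owner should also copy its run-time data is an assumption made here, and file reading would additionally have to replace the stale pointer value that was loaded from disk, which this sketch does not cover.

```cpp
#include <type_traits>

/* Hypothetical owning wrapper around the run-time pointer. */
template<typename T> class DNARuntime {
  T *ptr_;

 public:
  DNARuntime() : ptr_(new T()) {}
  DNARuntime(const DNARuntime &other) : ptr_(new T(*other.ptr_)) {}
  DNARuntime &operator=(const DNARuntime &other)
  {
    if (this != &other) {
      *ptr_ = *other.ptr_;
    }
    return *this;
  }
  ~DNARuntime() { delete ptr_; }

  T *operator->() { return ptr_; }
  const T *operator->() const { return ptr_; }
};

/* Example types, not the real bNodeSocket. */
struct ExampleSocketRuntime {
  int total_inputs = 0;
};

struct ExampleSocket {
  int identifier = 0;
  DNARuntime<ExampleSocketRuntime> runtime;
};

/* Still just one pointer in the struct, so the file layout is unchanged. */
static_assert(sizeof(DNARuntime<ExampleSocketRuntime>) == sizeof(void *));
static_assert(std::is_standard_layout_v<DNARuntime<ExampleSocketRuntime>>);
```

The owner no longer needs any manual allocation or freeing code for its run-time data.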

Embedded Run-Time Data

The goal here is to remove the need for an extra allocation for run-time data and to avoid storing the unnecessary void *runtime in .blend files. The main downside is that file reading/writing does become a bit harder depending on the case.

namespace blender::bke {

/* 1. Separate DNA struct. */
struct bNodeSocket {
  struct DNA {
    /* ... */
  } dna;
  /* Run-time data. */
};

/* 2. Separate run-time struct. */
struct bNodeSocket {
  /* ... */
  struct Runtime {
    /* ... */
  } runtime;
};

/* 3. Struct Splitter. */
struct bNodeSocket {
  /* DNA members. */
  DNA_RUNTIME_BEGIN
  /* Run-time members. */
};

/* 4. Interleaved tagged DNA members. */
struct bNodeSocket {
  DNA_MEMBER bNodeSocket *next, *prev;
  DNA_MEMBER DNAInlineString<64> identifier;
  Vector<bNodeSocket *> linked_sockets;
  DNA_MEMBER DNAInlineString<64> name;
  bNode *owner_node;
  /* ... */
};

/* 5. Interleaved tagged run-time members. */
struct bNodeSocket {
  bNodeSocket *next, *prev;
  DNAInlineString<64> identifier;
  DNA_RUNTIME Vector<bNodeSocket *> linked_sockets;
  DNAInlineString<64> name;
  DNA_RUNTIME bNode *owner_node;
};

}

The first three approaches have the property that the struct can be split into a run-time part and a DNA part. The DNA part is written to .blend files, while the run-time part is not. Read-file code would have to invoke some special constructor that constructs the run-time data. The problem is that the splitting only works when one struct is written at a time. Things become more tricky when we write an array of the type, or when the type is embedded into the dna section of another type. In both of those cases, we could still dump the memory of partial structs, and with some preprocessing the performance could be perfectly acceptable.

One could also restrict the ability to embed run-time data to structs that are always separately allocated. This is the most common case for structs with run-time data anyway. Other structs would have to use e.g. the DNARuntime<T> approach.

The last two approaches make it very explicit which members are DNA or run-time members at the cost of verbosity. To reduce verbosity one could add support for sections with e.g. DNA_RUNTIME_BEGIN and DNA_RUNTIME_END. Interleaving run-time data gives the most flexibility at run-time at the cost of more complex IO code. Especially properly constructing such a type after file-load might be tricky but should be possible. With some preprocessing, I would expect the performance of this approach to be good enough as well.
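
The kind of preprocessing I have in mind for arrays could look like this sketch, using approach 1 as an example. The type and function names are made up. The DNA parts are packed into a contiguous buffer first, and that buffer can then be dumped as a normal array of the DNA struct.

```cpp
#include <cstdint>
#include <type_traits>
#include <vector>

/* Example type with embedded run-time data, not the real bNodeSocket. */
struct ExampleSocket {
  struct DNA {
    int32_t identifier;
    float default_value;
  } dna;
  /* Run-time data that must not end up in the file. */
  std::vector<ExampleSocket *> linked_sockets;
};

static_assert(std::is_trivially_copyable_v<ExampleSocket::DNA>);

/* Hypothetical preprocessing step: copy only the DNA parts so that they are
 * contiguous in memory and can be written in one go. */
inline std::vector<ExampleSocket::DNA> gather_dna_parts(const std::vector<ExampleSocket> &sockets)
{
  std::vector<ExampleSocket::DNA> packed;
  packed.reserve(sockets.size());
  for (const ExampleSocket &socket : sockets) {
    packed.push_back(socket.dna);
  }
  return packed;
}
```

Reading would do the inverse: load the packed array and construct each full struct from its DNA part.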

Naming of DNA and RNA

Personally I don’t see a strong incentive to change the names. I also don’t have an idea for a different name that I find equally succinct and easy to work with when working with other Blender developers. Not sure what others think.


Glad for reducing the influence of C and the gap in the code between the old and the new!
Are new DNA structures intentionally supposed to be bit-serializable? I mean, every structure can have a serialization method. If we are talking about speed, then the choice is with templates, and not with virtual methods. I would see something like:

namespace blender::nodes {

struct NodeRepeatItem {
  DNAString name;
  DNAEnum<eNodeSocketDatatype> socket_type;
  DNAValue<int> identifier;

  template<typename IO>
  void serialize(IO read_write) {
    name.serialize(read_write);
    socket_type.serialize(read_write);
    identifier.serialize(read_write);
  }
};

struct NodeGeometryRepeatOutput {
  DNAFor<Vector<NodeRepeatItem>> items;
  DNAValue<int> active_index;
  DNAValue<int> next_identifier;

  template<typename IO>
  void serialize(IO read_write) {
    items.serialize(read_write);
    active_index.serialize(read_write);
    next_identifier.serialize(read_write);
  }
};

}

In this case, in the field DNAFor there will be just a template loop through all elements.
Also, if the structure has been changed, some calls can be added to the serialize methods to handle the versioning of the structure.
Although I seem to be missing something important in this topic?

Are new DNA structures intentionally supposed to be bit-serializable?

Yes, DNA structs used to be and are still intentionally bit-serializable. In theory that could be changed, and it would give us more freedom at run-time, but it also comes at a cost.

It’s not clear to me what the serialize method that you describe does exactly. We still have to write structs to .blend files because that’s how the format works and that’s not going to change.


Seems we could as long as we can parse it with makesdna.

This is something we’ve wanted to do for a while, just haven’t gotten around to.

Right, that’s not a problem.

For most cases I imagine DNAArray would be fine and we don’t need the efficient append.

For a DNAVector, maybe file reading would always set size and capacity to the same value anyway without the need for versioning.

Alternative would be to do DNA renaming to match a naming convention.

Maybe FixedSizeString is more clear than InlineString, but it’s a bit long.

I don’t feel very strongly about this, would go with either Embedded Run-Time Data (2) or (3).