Unified Simulation System Proposal

I did some experimental work last year in offline texture generation (similar to Substance Designer) and also ran into LLVM’s Achilles heel there: the quality of the code it generates is great, there’s no argument there, but producing that great code takes a while.

Once a shader graph was compiled, the runtime performance was fabulous; tweaking the parameters of a compiled graph was a great experience and you could iterate rapidly. However, the initial compile (or recompile after changing a single node link) easily took over a second even for the most basic shaders, making the thing utterly frustrating to use. Worse, the load times of a file containing a bunch of these shot through the roof.

I think the Fabric guys also bumped into this, because otherwise they would not have settled on a two-stage rocket (from the PDF linked by @StefanW):

Two-pass compilation
– First unoptimized compilation pass
– Fully optimized code generated in background

Not saying LLVM is a bad choice, just confirming that the issues @jacqueslucke had are not an isolated event. If we choose to use LLVM, runtime performance is not the only metric we should be looking at; we should probably design with the knowledge that LLVM can be slow at times and try to mitigate that behavior from the start rather than at a later stage when we run into issues.

4 Likes

How do you plan to deal with user-developed nodes?

For example, I spoke with @scorpion81 about using this as the new system to implement FM (the fracture modifier) in the future. Would it support custom-programmed high-performance nodes, or will we still require a full Blender compile to get new nodes/functionality working? (I’m thinking of physics solvers or geometry solvers.)

I also agree with avoiding the random execution as @StefanW said.

Great plan :slight_smile:

1 Like

This is one area of interest to me as well.

I work on automotive assembly lines with industrial robotics and CAD packages.

In my daily work we have many things which need to be developed that are, conceptually at least, equivalent to your definition of Simulation “Operation” Nodes.

Just the industrial robotics sector alone is a deep enough concept to span multiple problem domains. For instance, we have kinematic solvers that must be accurate to within a tenth of a second of the real robot’s motion, and the implementation for each of these is vendor specific.

My best idea from looking through this proposal would be a user-defined node that would create an API endpoint (possibly HTTP) for the industrial robotics API to post joint values (expressed as yaw/pitch/roll) to Blender.

Otherwise we would need a comprehensive development environment similar to what we have in our current commissioning packages, complete with custom robot logic parsing engine… there is a lot of depth to cover.

However, if we could author a node which could read the robot logic file and interpret it using Python (which could be possible), then we may be able to explore Blender better there.

However, much of my reading has to do with Physics and mesh simulation in this proposal. Is there any proposal for logic nodes, similar to what we saw in BGE (on state change, AND, OR, XOR)?

Also, would there be any provision for multiple simulations that run concurrently, and how would the user specify that? I don’t think that making that judgement call implicitly would be such a good idea. It would need to be a user decision. We typically consider multiple robots running in parallel to be separate simulations even if the robots are in the same physical space (and therefore share physics primitives in the software), because in reality the logic calculation and inverse kinematics solutions occur on separate hardware.

Anyways, that’s my 2c from a domain that, admittedly, may not be the best suited to this proposal.

1 Like

I read through the proposal regarding evaluation order, and two things about how Houdini works seem relevant.

The output node of a Sim is marked with the orange output flag, so there can be one and only one end point of the node tree. The second thing is that tree traversal always goes up the tree, taking the leftmost input first, so the execution order is explicit. The Merge nodes let you easily reorder their inputs by moving an input up the stack, just like how Blender’s modifier stack reordering works.

Unfortunately, even the unoptimized compilation using LLVM is too slow in many cases. I don’t want to add multiple seconds to the file loading time that are spent in code we don’t control…

A good short-, medium- and possibly long-term solution seems to be to let users implement an operation via a Python interface. The user would have to implement a Python function like my_operation(state_objects: Set[StateObject], time_step: float). A StateObject instance has methods to access the key-value storage of a state object. Access to the underlying arrays has to be provided using Python’s buffer protocol. That will allow them to be processed efficiently by e.g. numpy or custom C/C++ functions loaded at runtime by an addon. Other, less trivial, value types need specialized Python wrappers, but that seems fairly doable. Using Python does introduce some additional locking when evaluating the depsgraph on multiple cores. However, since e.g. numpy and custom C/C++ functions can release the GIL while expensive computations are done, this is probably not a big issue.
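To make the shape of that interface a bit more concrete, here is a minimal sketch of such an operation. The StateObject accessor get_array() and the attribute names are made up for the example; only the function signature matches the one described above.

import numpy as np

def my_operation(state_objects, time_step):
    # Advance a simple particle state by one explicit Euler step.
    for state in state_objects:
        # get_array() is assumed to return a writable buffer-protocol object,
        # so numpy can wrap the stored data without copying it.
        positions = np.asarray(state.get_array("positions")).reshape(-1, 3)
        velocities = np.asarray(state.get_array("velocities")).reshape(-1, 3)

        # The heavy lifting happens inside numpy, which keeps the per-step
        # Python overhead small and can release the GIL for large arrays.
        velocities[:, 2] -= 9.81 * time_step   # gravity along -Z
        positions += velocities * time_step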

I was thinking about the fracture modifier in the design process as well. I think it can fit into the proposal quite well. To a large degree this is because geometry stored in state objects can be (but does not have to be) decoupled from Blender’s Mesh data structure. Furthermore, a single state object can encapsulate a dynamically fractured object if necessary.

The only purpose of the node tree described in the proposal is to schedule operations on state objects. It does not propose a logic system yet. It might be good to have more domain-specific node tree types for some use cases. For example, the behavior of agents within a crowd simulation could be defined in a separate node tree that is referenced in the simulation tree. It is the job of the solver to make sense of that other node tree. The following is just an example of what this might look like; I have not thought about crowd simulation more deeply yet.

I can think of two ways multiple concurrent, but separate, simulations can be modelled with the proposed system. I’ll make the descriptions below specific to your example.

  • All robots are simulated within a single simulation. That means that a single state object contains the state of multiple robots. The solver would update the attributes of all robots in a single solve step.

  • Every robot is simulated in a separate simulation. That means every simulation has one state object for one robot. Node groups can be used to share common nodes between all the simulation node trees.

For visualization, the simulated data can be imported to one or multiple separate objects regardless of which of the two approaches is used.

4 Likes

Under the assumption that we have a single output and that the inputs to a merge node are ordered, a deterministic schedule of the Apply Operation nodes can be found with this algorithm (given in pseudocode that I did not actually test):

def schedule_apply_operation_nodes(output_node):
    # Nothing is scheduled when nothing is connected to the output node.
    if not output_node.inputs[0].is_linked:
        return []

    scheduled_nodes = []
    schedule_apply_operation_nodes__impl(output_node.inputs[0].origin_node, scheduled_nodes)
    return scheduled_nodes

def schedule_apply_operation_nodes__impl(current_node, scheduled_nodes):
    if is_merge_node(current_node):
        # Visit the inputs of a merge node in their user-defined order.
        for origin_node in ordered_linked_origin_nodes_of(current_node):
            schedule_apply_operation_nodes__impl(origin_node, scheduled_nodes)
    elif is_apply_operation_node(current_node):
        # Every Apply Operation node is scheduled at most once.
        if current_node not in scheduled_nodes:
            input_socket = current_node.inputs[0]
            if input_socket.is_linked:
                # Whatever feeds into this node has to be scheduled first.
                schedule_apply_operation_nodes__impl(input_socket.origin_node, scheduled_nodes)
            scheduled_nodes.append(current_node)

The numbers in the Apply Operation nodes indicate the order in which they are run. Note that there is one node that is not run at all. This aspect of the algorithm can be used to “mute” a set of Apply Operation nodes by not connecting them to the output.
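To illustrate the resulting order, here is a small self-contained example that runs the pseudocode above on a mock graph. The Socket and Node classes and the three helper functions are hypothetical stand-ins for whatever the real node tree API would provide.

class Socket:
    def __init__(self):
        self.origin_node = None

    @property
    def is_linked(self):
        return self.origin_node is not None

class Node:
    def __init__(self, name, kind, num_inputs=1):
        self.name = name
        self.kind = kind   # "apply", "merge" or "output"
        self.inputs = [Socket() for _ in range(num_inputs)]

def is_merge_node(node):
    return node.kind == "merge"

def is_apply_operation_node(node):
    return node.kind == "apply"

def ordered_linked_origin_nodes_of(node):
    return [socket.origin_node for socket in node.inputs if socket.is_linked]

# Two chains feed into a merge node: A -> B on the first input, C on the second.
op_a = Node("A", "apply")
op_b = Node("B", "apply")
op_b.inputs[0].origin_node = op_a
op_c = Node("C", "apply")
merge = Node("Merge", "merge", num_inputs=2)
merge.inputs[0].origin_node = op_b
merge.inputs[1].origin_node = op_c
output = Node("Output", "output")
output.inputs[0].origin_node = merge

print([node.name for node in schedule_apply_operation_nodes(output)])  # ['A', 'B', 'C']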

3 Likes

I can definitely see that being the de-facto way of doing logic nodes, but my only worry would be that in my domain of robotics and virtual commissioning there can be many, many thousands of signals.

What we tend to do for robotics and virtual commissioning is very, very simple and our use case would be very strongly supported for debugging purposes by the visual node type system proposed here.

Although I have no place saying what must be in it, I can only strongly encourage that logic nodes be considered for the proposal due to how powerful they could be and the fact that Python really would not be necessary for most cases. For things like specific IK solvers and the like, a custom node would be required, but for 98% of what we do in robotics and industrial controls validation (evaluate boolean and play an animation / apply force-torque based on the result) having boolean operators would be amazingly powerful.

100% agree with this. In fact, much of the logic we use daily is canned (copy-and-paste) logic, so I was wondering what provision there would be for instanced logic (copied node maps, maybe with less overhead?) and nodes which themselves contain node maps. The latter would be very powerful for us in that we could make pre-canned logic and provide it in a node that contains a node map but only displays a comment. This would allow us to give a general description of what complicated logic might do for those not in the know, while not taking up nearly as much space visually.

I love this part of the proposal in particular because it really does emulate the kinds of things we do very often. We have a lot of things that work on state which are, as you say, “muted”, in that they are simple monitors, debug helpers, etc. and don’t output anything.

Is there going to be a provision to append simulations or node maps (which I guess is just my word for a specific subset of a simulation) from one Blender file to another the way we can specify a library Blender file that appends to projects today with the append function?

Sorry, hope any of that made sense. I understand that I am speaking as an advocate for a very specific problem domain but I think these problems (boolean filter, simulation append, simulation-in-simulation) are pretty general, at least in the abstract. I think a lot of areas could benefit from their inclusion. In this way we would make a lot of this pretty similar to Cycles nodes and Animation nodes (at least I think).

Thanks for the awesome progress! Final question, are these actual screenshots from the interactive branch? Or just mockups?

1 Like

A few things to consider (not sure if you’ve got these covered already):

  1. Variable time steps. The system could support changing the length of the time steps that are simulated between frames. Maybe specify it via a curve, like in the graph editor? Anyhow, this would be useful for e.g. switching from normal speed to slow motion. Maybe even allow negative time steps?

  2. Maybe consider supporting interdependent simulations. By this I mean coupled simulations, which need to pass and map information between simulation domains, e.g. fluid-solid interaction. For this there is the PreCICE library, which may be useful.

  3. Support for levels of detail. Some way to control how coarse or fine a simulation should be (quick preview sim, medium-level sim or slow, very detailed sim). Variable spatial definition of LODs somehow?

  4. Support for parallel simulations (both among simulators and within each simulator, with a definition of how to access and divide local and remote computational resources).

@TylerGubala

The thing with “logic nodes” is that they only make sense for some kinds of simulation. I’m currently focussing on the parts that are common to all kinds of simulations. That does not mean that we can’t have a logic system (whatever that means exactly) for the simulations that need it. For example, the particle system I’m developing has events, conditional execution etc. I intend to extend the proposed “node tree syntax” to allow users to use such nodes in the same node tree, but that is not part of the generic system. Not sure if that makes sense…

I think I do not fully understand your use case. It might help when you write a separate document that explains your use case in a concise way and link it in this thread.

You can append simulations and node groups.

These are actual screenshots of mockups. I made them in a branch of the functions branch, which I’ve currently only uploaded to GitHub.

I think “variable time steps” has two different meanings. You can change time steps to compromise between accuracy and computation time, or you can change time steps to change the speed of the simulation. Based on what you describe, you are probably only talking about the second meaning. Letting users change how much simulation time passes per frame should be quite easy. I’m quite sure that we will not allow negative time steps. Simulations should be simulated forwards and then played backward to achieve that effect.

I did not know that library; I will check it out in more detail. In my view, interdependent simulations are actually a single simulation with a solver that can incorporate multiple subsolvers. So they should fit into the current design without bigger issues (the difficulty is to actually implement such a solver).

This can be achieved by having a top-level parameter in a node system that controls different settings in the simulation. For example, you could build a node system that, depending on some boolean value, uses a low-resolution model or a high-resolution model in a rigid body simulation. We can probably make this more user friendly by providing good prebuilt node groups (+ optionally a separate UI), but it does not need to be part of the fundamental design of the framework.

I’d say that this is mostly an implementation detail. There are many things we can do to allow for parallel simulations when actually implementing this system though. Most importantly, operations would have to specify what data they read/write in advance. This allows a scheduler to figure out what can be parallelized.
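As a rough sketch of that idea (none of these names are part of the proposal): each operation could declare the attributes it reads and writes, and the scheduler would only run two operations in parallel when neither one writes data that the other touches.

class Operation:
    def __init__(self, name, reads, writes):
        self.name = name
        self.reads = set(reads)
        self.writes = set(writes)

def can_run_in_parallel(op_a, op_b):
    # Two operations conflict when one writes an attribute the other reads or writes.
    return (
        not (op_a.writes & (op_b.reads | op_b.writes))
        and not (op_b.writes & (op_a.reads | op_a.writes))
    )

gravity = Operation("gravity", reads={"mass"}, writes={"velocity"})
integrate = Operation("integrate", reads={"velocity"}, writes={"position"})
color = Operation("color", reads={"age"}, writes={"color"})

print(can_run_in_parallel(gravity, integrate))  # False, both touch "velocity"
print(can_run_in_parallel(gravity, color))      # True, their attribute sets are disjoint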

1 Like

Maybe not, if you use a neural network to manage the results as acquisitions from copied states… and then pass them on to an integrator, like an AI module that can be specialized for the type of simulation…

@YvesBodson I’m quite sure that we will not use a statistical model for this part of the process.

Regarding LLVM:

I see. It would be interesting to investigate how SPI got their LLVM OSL to be faster than their C++ shaders though. I still think it could be useful in the future to not use unrestricted C/C++ for simulation nodes, but only a subset of the language or a DSL. That would keep the door open for GPU evaluation (GLSL, SPIR-V, Metal, maybe ISPC on future Intel GPUs). Disney’s SeExpr (https://www.disneyanimation.com/technology/seexpr.html) might also be worth a look.

1 Like

If used correctly, LLVM will probably provide the best run-time performance on the CPU. So it is not really surprising to me that compiled OSL can be faster than precompiled C++ shaders. To me, LLVM and GPU evaluation are not “the solution”, but they can certainly be part of the solution.

For the rest of Blender, a simulation is just a function that runs on the CPU like any other. However, it should be possible to have State Objects that reference data on the GPU and Operations that, when executed, invoke some processing of that data on the GPU.

Btw, every node system is a DSL as well. For now I don’t see any reason why one should not be able to create Operations with C/C++, Python, nodes or some other domain-specific language.

1 Like

I’m thinking about graph optimization: removing redundant nodes, merging duplicates, etc. If we can reason about what’s inside the nodes and know that they don’t have side effects, the runtime can perform those optimizations. If, on the other hand, the nodes are black boxes that can do anything (write to files, change the scene graph, etc.), then it’ll be close to impossible for the runtime to figure anything out whatsoever.

For example, if we can isolate independent branches in a node graph, we can potentially evaluate those in parallel. We can only do this for nodes that are known to not be writing to the same memory, obviously. Since we can’t make any guarantees about black-box script nodes, this optimization would fail with those nodes.
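As a toy sketch of the duplicate-merging part mentioned above (assuming hypothetical node_type, inputs and is_pure attributes): two side-effect-free nodes of the same type with the same de-duplicated inputs must compute the same result, so one of them can stand in for the other.

def deduplicate(nodes_in_topological_order):
    seen = {}       # (node type, ids of canonical inputs) -> canonical node
    canonical = {}  # original node -> node to evaluate in its place
    for node in nodes_in_topological_order:
        # Inputs appear earlier in topological order, so they are already mapped.
        key = (node.node_type, tuple(id(canonical[inp]) for inp in node.inputs))
        if node.is_pure and key in seen:
            canonical[node] = seen[key]   # merge into the earlier duplicate
        else:
            canonical[node] = node
            if node.is_pure:
                seen[key] = node          # only pure nodes can be reused safely
    return canonical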

I was more thinking of possibly evaluating the entire node graph on the GPU, not individual nodes (see Cycles). In a mixed environment, it could end up so that GPU->CPU memory transfers otherwise become the bottleneck and cancel out the benefits of GPU processing altogether.

Maybe it’s just my Cycles-centered mind, but I think it’s a big advantage that we can run our shader graphs on a variety of devices at full speed, while still running all the expensive operations on the CPU as compiled C code and not some interpreted language.

I’m doing many of these optimizations already. While a “function node” itself is a black box, this does not mean it is allowed to do everything. I talked about this topic in my very first document about Everything Nodes (note that I wrote this more than a year ago and some aspects of it are not up-to-date anymore):

Similar constraints can be used for Operations. E.g. an operation is only allowed to modify the data that is passed into it.


We are talking about two different node systems right now: Simulation nodes and Function nodes. Simulation nodes are a DSL that allows users to schedule operations on state objects. Function nodes are a DSL that allows users to model data flow with inputs and outputs. Both can have different evaluation mechanisms.

Cycles only has “function nodes”. It makes a lot of sense to take an entire node tree of this kind and compile it to run natively on the CPU and/or GPU. I’m actually very interested in working on this topic. Things become a bit more tricky when you want to allow users to work with lists of data or other more complex data types like strings or meshes though.

For Simulation nodes, I’m not yet sure if it is necessary to take multiple of them and compile them down to a single function. Maybe it is. The GPU-CPU memory transfers can be reduced by storing references to data living on the GPU in the State Objects.

We can also compile a chain of low level Operation nodes into one function that runs on the GPU and run one Operation that is written in Python afterwards. Saying that every function and simulation node has to be able to run on the GPU would make many use cases much harder or even impossible to achieve.

2 Likes

I wrote another document about how I’m currently evaluating user-defined functions. This other thread can be used for questions and feedback.

3 Likes

Yes. Here is a hypothetical use case for supporting zero and negative time steps: a simulation of snowflakes and cloth being affected by wind and gravity (e.g. a canvas dropping through the air). For artistic purposes, you want to control the canvas’s time rate: first slow down forward time stepping, then keep it at zero (immobile canvas), and then make it follow negative time stepping (causing a physically wrong but artistically potentially wonderful effect), while keeping the snowflake simulation at a constant forward time rate the whole time. If it were possible to support this kind of variable time stepping, it might allow interesting simulations in the future. I think initialization of the simulation system could be done via a separate function call on the first frame where the simulation is started.

4 Likes

Say we wanted UPBGE to have a simulation block: is there some sort of way to loop the simulation system (to turn the sim system into a game loop)?

So is this still active and under development?
Are there simulation frameworks / libraries that are being considered for creating this unified simulation system?

Interactive Computer Graphics

They have SPlisHSPlasH, “an open-source library for the physically-based simulation of fluids”,
and the PositionBasedDynamics project, “a library for the physically-based simulation of rigid bodies, deformable solids and fluids.”

taichi-dev/taichi_elements: high-performance multi-material continuum physics engine in Taichi (github.com/taichi-dev/taichi_elements)

zenustech/zeno node system (github.com/zenustech/zeno)

At github.com/nepluno there are many interesting solutions: libwetcloth, libWetHair, pyasflip, lbfgsb-gpu

There is also a developer who has posted their own progress on a self-made simulation engine (“Realtime GPU smoke simulation” in Other Development Topics on Blender Developer Talk), which looks pretty interesting. Could the Blender Foundation contact this developer and ask whether they would be interested in joining Blender development?

And then there is also Vadere, an open-source simulation framework for crowds.

+PhysBAM
+Chrono Project

Sorry if I wasted your time.