Function System

This thread can be used for questions and feedback regarding this document about the “Function System”. It is about how user-defined functions (defined using nodes/expressions/…) can be evaluated efficiently.


I don’t understand this:

I evaluate the functions on 10,000,000 elements. The C++ code takes approximately 60ms and the user-defined function 160ms to execute. That is 6ns and 16ns per element on average. There is no compilation happening at run-time. Personally, I think this result is quite good already.

Why do you think that result is quite good? It is more than double the time of the C++ function, which I find poor, to be honest. I may be missing or misunderstanding something, or perhaps you know of something that will speed this up? When we have to deal with situations that can take several minutes, we are talking about going from 5 minutes to 13.3 minutes.

I’m curious, do people really make sims with 2 billion particles? I’m genuinely asking, unless I did the math wrong or misunderstood.

To reach a 5-minute sim, considering the numbers given: 160ms multiplied by 187.5 × 10,000,000 = 1,875,000,000 objects. That’s the equivalent of a 5-minute sim.

A 13-minute sim would be more than 4,875,000,000 objects. Maybe it wouldn’t scale linearly?
I admit I’m hopeless at math, though.

Edit: it’s the time of the simulation, not the number of particles. Got it, sorry.

I knew that this sentence would trigger you :stuck_out_tongue: Your calculation is correct. Given the generic optimizations I implemented so far, the system is about 3 times slower than precompiled C++ when only low-level operations are done per node (like additions and multiplications). This is actually the worst case for the system (and for any system that wants to compute user-defined functions), because it mainly measures the overhead of the system itself. I did other tests where the operations are a bit more expensive and found that the overhead becomes negligible very quickly (e.g. when using a sine node).
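For anyone who wants to check the arithmetic discussed above, here is a minimal sketch. It uses only the numbers quoted in this thread; the function names are made up for illustration.

```python
# Numbers quoted from the benchmark in this thread.
ELEMENTS = 10_000_000
CPP_MS = 60.0
USER_DEFINED_MS = 160.0


def ns_per_element(total_ms: float, elements: int) -> float:
    """Convert a total run time in milliseconds to nanoseconds per element."""
    return total_ms * 1_000_000 / elements


cpp_ns = ns_per_element(CPP_MS, ELEMENTS)            # 6.0
user_ns = ns_per_element(USER_DEFINED_MS, ELEMENTS)  # 16.0

# Ratio between the two run times, roughly 2.67 ("about 3 times slower").
slowdown = USER_DEFINED_MS / CPP_MS


def scaled_minutes(native_minutes: float) -> float:
    """Estimate the run time if a native workload slowed down by that ratio."""
    return native_minutes * slowdown


print(cpp_ns, user_ns, round(scaled_minutes(5.0), 1))
```

This assumes the slowdown stays constant as the workload grows, which is the worst case described above. With more expensive nodes the ratio should shrink.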

You are missing that the benchmark just shows “where we stand” and that the system is optimized for optimizability. So instead of having implemented every optimization I could think of immediately, I designed the interfaces so that I can implement many optimizations later on. Currently, no individual node is optimized. Every optimization I implemented is generic and applies to many nodes. The benchmarks show me what I have to optimize to get on par with native C++ performance and what not. So no, the current performance is not the end of the journey. You can believe me when I say that I want this system to be on par or even faster than corresponding C++ code (it is of course always possible to write a C++ function with the same performance, but I’m talking about the code that one would write 99% of the time). LLVM will be part of the solution. The main contribution so far is the interface for throughput optimized functions. I hope that answers your question.
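To make the idea of a throughput-oriented interface a bit more concrete, here is a rough sketch in Python. It is not the actual Blender code or API; every name in it is hypothetical. The only point is that each node processes a whole batch of elements per call, so the cost of dispatching a node is paid once per batch instead of once per element.

```python
from typing import List

Array = List[float]


def add_node(a: Array, b: Array) -> Array:
    """Hypothetical node: element-wise addition over a whole batch."""
    return [x + y for x, y in zip(a, b)]


def multiply_node(a: Array, b: Array) -> Array:
    """Hypothetical node: element-wise multiplication over a whole batch."""
    return [x * y for x, y in zip(a, b)]


def user_function(a: Array, b: Array, c: Array) -> Array:
    """User-defined function (a + b) * c, evaluated node by node.

    Only two node calls are made here, no matter how many
    elements the input arrays contain.
    """
    return multiply_node(add_node(a, b), c)


result = user_function([1.0, 2.0], [3.0, 4.0], [5.0, 6.0])
print(result)
```

With cheap nodes such as addition, the remaining overhead is mostly the intermediate arrays and the per-node loop, which fits the description above of the benchmark measuring the system itself.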


Please don’t get me wrong :slight_smile: you know I only want the best for this. I’m amazed by what you are doing here and I just want to understand :slight_smile:

Thanks! I suspected that, but I wanted confirmation from you to understand where we are heading. This is going to be awesome :slight_smile:

Keep in mind that we were talking about a simple operation. If we consider complex operations, like a custom physics solver made with function nodes, we could expect much slower performance. So I’m not worried about that specific situation, but about a more realistic production situation with much more complex functions.

Also, we don’t use 2 billion, but we need around 100 million particles to properly simulate a small square of deep sand, so being able to handle quite a few million is very important. The same goes for water foam: to get proper foam we need a few million.

And anyway, the idea is to have the maximum possible performance to avoid having to create custom code that has to be compiled within Blender, for example to generate custom solvers. This will simplify part of the Fracture Modifier implementation :slight_smile:


Reading the document, I’ve got the impression that the functions will be saved in the blend file…

Would it be possible to extract them from the blend file so that we can make independent libraries of functions? I’m just thinking that users may want to re-use their functions in other projects or share them. Functions could have their own file format. One function or multiple functions could be contained in one file to form a module, as in Python.

That might be related to the asset manager, because you could also want to make an external library of animations, with one file per animation, or bundles of animations…

I would assume that it would work like any other Blender data. Take shaders, for example: if a user creates a super cool shader and wants to reuse it, they save it to a blend file and append it in any other scene that needs it.

Can the function system have conditions?

data = function1(input)
if False:
    data = function2(data)
data = function3(data)
output = data

@Nodragem Node trees that define functions will be stored in the .blend file and can be linked/appended to other .blend files. However, it is unlikely that you will be able to use those functions outside of Blender. At least in the current design.

@Random There is a switch node that selects one of two inputs based on some condition. Currently, this case is not optimized (so both inputs will always be evaluated). I’m aware of two different optimizations I can implement later.

@jacqueslucke Do you mean something like this:

data = function1(input)
data2 = function2(data)
data = function3(data if True else data2)
output = data

And the optimization would be not to call function2 if it does not have any effect?

@Random That’s correct.
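A minimal sketch of the difference discussed above, written in plain Python. This is only an illustration: the two optimizations mentioned earlier were not spelled out, so this is not necessarily either of them. The function bodies and the `calls` log are made up so the effect is visible.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

# Records which functions actually ran (illustration only).
calls: List[str] = []


def function2(data: int) -> int:
    """Stand-in for an expensive node; the real function2 is not defined here."""
    calls.append("function2")
    return data * 2


def eager_switch(condition: bool, if_true: T, if_false: T) -> T:
    """Both inputs were already computed before the switch is reached."""
    return if_true if condition else if_false


def lazy_switch(condition: bool,
                if_true: Callable[[], T],
                if_false: Callable[[], T]) -> T:
    """Inputs are passed as callables; only the selected one is evaluated."""
    return if_true() if condition else if_false()


data = 5

calls.clear()
eager_result = eager_switch(True, data, function2(data))
eager_calls = list(calls)  # function2 ran even though its result is unused

calls.clear()
lazy_result = lazy_switch(True, lambda: data, lambda: function2(data))
lazy_calls = list(calls)   # function2 never ran
```

Both variants return the same value. The lazy one simply skips the branch that is not selected.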

@jacqueslucke What about loops? Would it be possible to use something like a for loop? Like this:

output = 0
for i in range(10):
    output += power(i)

Or a while loop:

output = 2
while output < 100:
    output += power(output)

and nested loops:

output = 0
for i in range(10):
    for ni in range(i):
        output += power(ni)

I did a mock-up for a proposed iterative node group over on Right Click Select – this is for material nodes, but it might offer some insight into how to design the UI.

The system only handles data flow for now, no control flow. So, loops are not possible yet. I can make it possible in the future for sure though.

@Josephbburg Thanks for the mockup. For Animation Nodes I also thought about how loops can be represented with Blender’s node system.

It’s not quite clear whether the proposed system can extend (or replace?) the existing shader node system.
It seems totally impossible without rewriting half of the Blender core and Cycles.

Replacing the shader node system for Cycles and Eevee is not a target.

I would love to see the ability to have node trees interact with other types of node trees. For example, if there are rigging nodes in the future, we’d want them to have access to everything, for the purpose of control. This is one of the few features I really like about Maya: being able to connect any two nodes in a variety of ways.

@jacqueslucke Did you ever make a mockup of how loops could look for Animation Nodes? Have any feedback about the mockup I made?

I’m not perfectly sure, but I think that loops (of non-static length) require some state in one form or another, and that state is not expressible in stateless dataflow. It’s just on another level of computational complexity. Combined with a ‘switch’ node, it becomes possible to implement a full FSM (finite-state machine).
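One way to picture this point is to split a loop into a stateless body and a driver that owns the state. The sketch below does that for the while loop posted earlier in the thread. It is only an illustration: `power` is never defined in the thread, so squaring is used as a hypothetical stand-in, and `step` and `run_while` are made-up names.

```python
def power(x: int) -> int:
    """Hypothetical stand-in: the thread never defines power()."""
    return x * x


def step(state: int) -> int:
    """Stateless loop body: maps the previous state to the next one.

    This part could be an ordinary data-flow function.
    """
    return state + power(state)


def run_while(state: int, limit: int) -> int:
    """Driver that owns the state and decides whether to iterate again.

    This is the control-flow part that plain data flow cannot express.
    """
    while state < limit:
        state = step(state)
    return state


final_state = run_while(2, 100)
print(final_state)
```

The body stays a pure function of its input. The number of iterations depends on the data, so something outside the data flow has to carry the state from one iteration to the next.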

How do Maya or Houdini address this issue?

There are no loops in Maya, and simulations are run inside black boxes (one single big node). Houdini has a special kind of node called a solver, which you can dive into to access information from the previous frame, but I haven’t used it much, so I can’t elaborate.

Here is the documentation for how loops work in Animation Nodes:
Afaik, other node systems (e.g. Sverchok and Sorcar) also have some way to represent loops. I could not find any documentation though.

Your proposal is missing a lot of details that are important. For example, how does the loop know which nodes represent the iteration rule? Is it all based on names of nodes and frames?
Furthermore, loops add a lot of complexity to a node system and I’m not sure if it is worth it to add this functionality to Cycles nodes. Loops with an undetermined iteration count could also make it much harder to run them on the GPU. Lastly, this topic is not about Cycles nodes, so let’s stay on topic, please.
