Noob question - Does Cython bring similar performance to pure compiled C code?

Here is my question.

So if we use Cython, things get compiled. Does this bring the same performance as the actual C++ code inside Blender?

Is there a way to make an addon perform the same as if it were coded and compiled inside Blender?

And a final noob question: does Python or Cython give multithreaded performance automagically?
(I'm not speaking about manually opening a second thread, but about making an addon leverage 32 cores, for example, for its calculations.)

I ask all of this because I want to understand a bit more about how an addon can get maximum performance in general.

Cheers!

Allow me to add one situation where I think performance is pretty bad… file importers.

FBX importer is slow, OBJ importer is slow, DXF is slow, in general importing complex files is slow, and I wonder if it could be faster and multithreaded using something like Cython.

Not that I can program it, but I'm wondering this because for some things more performance would be very welcome :slight_smile:

Cheers!

There are three parts to an importer: reading the base file, figuring out what to do with each line, and finally doing the resultant action in Blender.

Cython may help with parts 1 and 2, but wouldn't help with part 3, as that is internal Blender actions. Neither Python nor Cython gives multithreaded performance magically.

The OBJ importer is actually quite fast if you disable split geometry (change it to keep vert order). This is because Blender slows down dramatically as the number of created objects grows. Speeding up the processing time for creating objects, and also how Blender handles large numbers of objects, would probably help more than trying to multithread parts 1 and 2.

1 Like

What I tend to see lately is “X is slow, let's throw multi-threading at it”, even though it may not be the best solution for a given problem. I wrote a quick and dirty OBJ importer in C++ recently (not quite as full featured as the Python importer, but it got the job done).

importing sanmiguel low poly from http://casual-effects.com/data/index.html (628 megs)

C++:     11.2 sec, peak memory:  1.7 GB
Python: 321.9 sec, peak memory: 10.3 GB

Single threaded, and it was mostly I/O bound; throwing more threads at it would quite possibly hurt performance…

5 Likes

Multithreading is a difficult challenge and nothing can do it “automagically”. It takes a lot of thought and effort to do it properly; if you do it poorly, you risk making things even slower.

Cython can indeed be competitive with C/C++ under the right conditions. But most Blender Python code is just calling Blender functions already written in something very fast like C/C++, so it's not necessarily worth the extra effort required to write and compile Cython code.

1 Like

(I’ve never actually written an importer for Blender, but I know how to make addons fast.)

As already mentioned there are three parts to an importer.

1. Loading the file
When you just load the file into some byte buffer you should already get very good performance with pure Python. It is unlikely that this is the actual bottleneck.

2. Processing the loaded buffer
This is the part where Cython can shine. You won’t get multithreading for free, but it can certainly be implemented.

3. Writing the loaded data into Blender data structures
A benefit of using Cython for the processing is that you already have all the data in buffers. Blender's foreach_set function can then be used to copy that data into Blender meshes fairly efficiently (a small sketch follows below).
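As a rough illustration of what that can look like (the buffer contents here are placeholders; a real importer would get the flat coordinate array from step 2):

```python
import array
import bpy

# Assumed output of step 2: a flat [x0, y0, z0, x1, y1, z1, ...] float buffer.
coords = array.array('f', [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0])
vert_count = len(coords) // 3

mesh = bpy.data.meshes.new("imported_mesh")
mesh.vertices.add(vert_count)
mesh.vertices.foreach_set("co", coords)  # bulk copy, no per-vertex Python loop
mesh.update()

obj = bpy.data.objects.new("imported_object", mesh)
bpy.context.collection.objects.link(obj)
```

The same foreach_set pattern also works for loops and polygons.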

Summary
Yes, in many cases Cython can be used to make importers as efficient as if they were implemented in Blender's C code. However, it is much more work than just writing a Python importer. Writing high-performance Cython code is more like writing C code anyway, even more so when you want to use multithreading.

2 Likes

Thanks everyone for your answers!

You clarified a lot of things :slight_smile:
I know that multithreading is something complex; that is why I asked with the word “automaGically” hehehe, because I know it's hard in general.

But yes, my question was more related to performance itself. The benchmarking from @LazyDodo is exactly the kind of problem I see, and the thing is: if Cython could improve this and make the importer as fast as if it were written in C++ (or close to it, as @jacqueslucke says), then it could be worth learning it and writing addons in Cython. I don't care if it's harder… it's programming, harder = funnier hahaha

I would like to see some data regarding that performance gain, because many people discard the idea of writing an addon or an implementation for other software on the grounds that Python is too slow for their plugin, so I'm curious about that.

Thanks again to everyone!!!

Cython is transpiled to C/C++. After some time you can learn how to get rid of all the Python overhead (which is necessary if you want to release the GIL and benefit from multithreading). If done right, Cython code can have the same performance as normal C code. Cython can also be used to make thin wrappers for .c files; that way you can use plain C code in your addon.
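To make that more concrete, here is a minimal sketch of what such a .pyx file can look like (the function name and buffer are made up): the typed memoryview loop only touches C data, so the GIL can be released and prange can spread the work over multiple cores via OpenMP (assuming the module is compiled with OpenMP flags, e.g. -fopenmp).

```cython
# cython: boundscheck=False, wraparound=False
# Hypothetical example: scale a float buffer in place, in parallel.
from cython.parallel import prange

def scale_in_place(float[::1] data, float factor):
    cdef Py_ssize_t i
    cdef Py_ssize_t n = data.shape[0]
    # The loop body only uses typed C data, so the GIL can be released
    # and the iterations run on multiple threads.
    for i in prange(n, nogil=True):
        data[i] = data[i] * factor
```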

The performance gain compared to plain Python can be very large. In some cases I measured a 400x performance increase in Animation Nodes; however, this depends heavily on what you are doing and whether you are doing it right.

Cython comes with a good annotation tool that can show you where most of your Python overhead is.
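For reference, that report can be generated as part of the build, e.g. in a setup.py (the module name here is a placeholder):

```python
# setup.py -- minimal sketch; "fast_import.pyx" is a placeholder name
from setuptools import setup
from Cython.Build import cythonize

setup(
    # annotate=True writes an HTML report highlighting lines that still
    # go through the Python C-API, i.e. where the Python overhead is.
    ext_modules=cythonize("fast_import.pyx", annotate=True),
)
```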

3 Likes

That is very cool to know!

Thanks for this, now I can properly talk to some devs so they try Cython for their addons.

BTW @LazyDodo, that trend of “it's slow, can you multithread it?” is from the Core 2 Quad days hahaha
I understand that not everything can be multithreaded; that is why I was asking as a general thing first, but maybe an importer can be multithreaded in case there are several objects to process in the file, or something similar.

Thanks!

Cheers!

The problem here is that in almost all cases you should not modify Blender’s data from multiple threads.

Having said that, starting to use Cython makes distributing addons much more painful. Also, the addon will get quite a bit larger. Those are very important points to consider. Personally, I probably would not start a new addon that uses Cython because of that overhead.

If I really needed C/C++ performance, I would just write C/C++ code and use Python's ctypes module to interface with it.
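As a rough illustration of that approach (the library name and the C function it exposes are hypothetical):

```python
import array
import ctypes

# Hypothetical shared library exposing:
#     double sum_floats(const float *values, size_t count);
lib = ctypes.CDLL("./libfastparse.so")
lib.sum_floats.argtypes = [ctypes.POINTER(ctypes.c_float), ctypes.c_size_t]
lib.sum_floats.restype = ctypes.c_double

values = array.array('f', [1.0, 2.0, 3.0])
# Expose the array's buffer to C without copying it.
c_values = (ctypes.c_float * len(values)).from_buffer(values)
total = lib.sum_floats(c_values, len(values))
```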

@BYOB: that is right. Fortunately, using foreach_set with buffers (not just Python lists) usually makes that part quite fast (when you have large meshes instead of many small ones). One could also use a modal operator to load everything in multiple steps, so that Blender does not lag (a rough sketch of that idea follows below).
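A bare-bones version of the modal operator idea could look like this (the per-tick chunk of work is just a placeholder):

```python
import bpy

class IMPORT_OT_chunked_demo(bpy.types.Operator):
    """Hypothetical example: do import work in small steps on a timer."""
    bl_idname = "import_scene.chunked_demo"
    bl_label = "Chunked Import (sketch)"

    def execute(self, context):
        self._work = iter(range(100))  # placeholder for real pieces of work
        wm = context.window_manager
        self._timer = wm.event_timer_add(0.01, window=context.window)
        wm.modal_handler_add(self)
        return {'RUNNING_MODAL'}

    def modal(self, context, event):
        if event.type == 'TIMER':
            chunk = next(self._work, None)
            if chunk is None:
                context.window_manager.event_timer_remove(self._timer)
                return {'FINISHED'}
            # Process one bounded chunk here; the UI stays responsive
            # between timer ticks.
        return {'PASS_THROUGH'}

bpy.utils.register_class(IMPORT_OT_chunked_demo)
```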

2 Likes

Because no one mentioned it directly: often enough, it is not about the programming or scripting language you use or what hardware you run it on. The most important thing is the algorithm.

If the algorithm which solves the problem has bad runtime complexity, then it does not matter whether you write perfect C++ code or a sloppy Python script. If the time it takes to compute the result grows exponentially with the input size, then it might work for very small datasets, but you will very soon reach the point at which it takes forever (and that is not an exaggeration), even on the best computer in the world. This is likely to occur for algorithms with quadratic runtime complexity too, although those might still complete in an acceptable time frame for medium-sized datasets.

You ideally want linear runtime complexity: if you double the input data size, then it should not take more than double the time. Constant time would be best, but that is unlikely to be achievable in the domain of data processing (hash indexes have amortized constant time complexity, but you still need to account for I/O-bound operations reading/writing data, which will scale linearly at best). Something in the ballpark of linear or logarithmic complexity is possible for importers in general, I would say, but only if there is no operation that takes more time on each call as more data gets imported - also see https://blender.stackexchange.com/questions/7358/python-performance-with-blender-operators/7360#7360 (a small example follows below).
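To illustrate the kind of per-call overhead that question is about (object counts and names here are purely illustrative): creating objects through operators pays the full operator machinery cost on every call, while the low-level data API avoids most of it.

```python
import bpy

def create_with_operators(n):
    # Each bpy.ops call runs the whole operator machinery (scene updates,
    # undo handling, ...), so the total time grows much faster than linearly.
    for _ in range(n):
        bpy.ops.mesh.primitive_cube_add()

def create_with_data_api(n):
    # The data API skips that per-call overhead; here the n objects even
    # share one mesh, which keeps the per-object cost small.
    mesh = bpy.data.meshes.new("shared_cube")
    for i in range(n):
        obj = bpy.data.objects.new("cube_%d" % i, mesh)
        bpy.context.collection.objects.link(obj)
```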

Carlo said that the Split Geometry feature is very costly in terms of performance. My guess is that the algorithm used to achieve it has bad runtime complexity. It might be possible to trade memory for speed using index data structures to make it operate faster (a tiny example follows below), but there are also problems for which no algorithm known to mankind can bring the task down to a reasonable time for the dataset that has to be processed. Multi-threading does not help here at all. You need a fast algorithm, and if you want to utilize multiple cores, then the algorithm needs to allow the work to be split into reasonable sub-tasks which can be carried out in parallel. You should always benchmark such a multi-threaded solution against a single-threaded variant, because multi-threading comes at a cost (a lot of overhead for coordination, i.e. work splitting and result merging) and may turn out to be slower in the end - but this can be heavily influenced by the hardware you run it on, the compiler and build settings used, and the actual implementation.
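As a tiny example of that memory-for-speed trade-off (function names and the rounding tolerance are made up): de-duplicating vertices by scanning all previously seen vertices is quadratic, while a dict index brings it down to roughly linear time at the cost of extra memory.

```python
def dedup_naive(verts):
    unique = []
    for v in verts:
        if v not in unique:        # scans the whole list every time: O(n^2) overall
            unique.append(v)
    return unique

def dedup_indexed(verts, digits=6):
    seen = {}                      # the extra memory buys speed
    unique = []
    for v in verts:
        key = tuple(round(c, digits) for c in v)
        if key not in seen:        # average O(1) hash lookup
            seen[key] = len(unique)
            unique.append(v)
    return unique
```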

Aside from all that, you should also consider the amount of time it takes to write a basic importer using a naive algorithm in a few hours, versus the time it would take to write an importer that is as good as possible, using all kinds of optimizations, over a period of several months. Is it worth spending a lot of time to cut down on import time (or paying someone to do it for you), or can you just sit and wait a bit longer on each import? Maybe buy better hardware to save a few minutes instead of spending thousands of dollars on a professional programmer for the perfect importer? If the actual bottleneck turns out to be the Blender API, then it won't help much anyway.

I hear people complain about web technologies a lot, that they wouldn’t be as efficient as native code (e.g. JavaScript vs. C++). But the reality is that computers are fast these days, memory isn’t nearly as expensive as it was 10 years ago and it does not make much of a difference in perceived performance even if it’s only half as efficient. It would cost much more than twice the time and money to do everything “native” (and probably ten-fold if you target different computer architectures), and that factor is often ignored.

TL;DR: If an implementation uses a very naive algorithm or has flawed logic, then it may not even matter whether the code is written in Python, C++ or something else. Regarding Python vs. Cython performance and multi-threading, the question should be whether it is even worth re-implementing the existing addons, or whether the algorithms they use can be improved for much larger gains at lower cost.

4 Likes

Is the overhead of using C/C++ different from using Cython? As far as I know, it makes it impossible to ship the add-on as plain source code and requires a compilation step first, the same as with Cython.

The Python documentation does not recommend using C/C++ code directly without Cython or other third-party tools.

Generally, you can get the same performance with Cython that you would get with C. For that you often have to write very C-like code, though. I haven't written any Cython code in a while, so maybe that has changed a bit.

2 Likes

I know quite a few developers who would disagree with that; especially if you deal with a large amount of data, it's inefficient and slow compared to pure C++ code.

I'm pretty sure you can get the same performance with Cython, since it just translates the code to C and then uses a C compiler. I'm not sure if you can do explicit SIMD stuff, so in that regard there are more optimization opportunities in C/C++. Though, as said, to really get the same performance as C/C++ code, your Cython code has to look very much like C code. So much so that I'd rather write C/C++ code directly instead, hence my original suggestion. For interfacing between C and Python, Cython might still be a good option depending on the use case.

1 Like

I'm no Cython expert, but I tried to use it in the past without much success in terms of performance. I think the main problem is, correct me if I'm wrong, that you would have to replicate the Blender operators in Cython to make them perform well, because if you call the Blender operators as they are from inside Cython, they will still be just as slow. Is this so? On the other hand, I am interested in whether your statements really hold, and later, when you are less busy, some tutorial or tips would be appreciated, so that people like me who are interested in the subject could learn the right way to use it and take good advantage of the performance, either with C or Cython. I typed everything I could (ints, floats, numpy arrays, etc…) but I think I was still penalized by the use of Blender operators inside my Cython code. Best regards.

1 Like