Sequencer Python-based effects proposal

I don't really know all the reasons that led to the creation of the Python imbuf API, but Python-based effects were mentioned as a possible use case. This is in line with my interests, so I started to work on this concept.
I am trying to create a solution that is integrated into Blender as much as possible, in order to benefit from existing functionality (UI, data animation, data inputs).

There has been criticism of this feature due to the fact that Python is slow when processing data. Obviously we want to use Python mainly as a template language, calling functions of modules that process images quickly, such as SciPy, Pillow, and Blender's own Py_Imbuf.

The following video shows the possible usefulness and speed (debug build):

I tried to use Pillow as well, but that resulted in a ticket on their tracker (cannot import name '_imaging' from 'PIL' · Issue #3342 · python-pillow/Pillow · GitHub)…

Python effects concept

C side

Data storage:
Because the concept of this effect is very universal, the biggest problem is probably how to define & store effectdata.

I don't know if there is, or can be, a flexible DNA structure that can hold any amount of data of any type.
Actually, effectdata itself almost does this, because it can hold a number of different types:

if (seq->effectdata) {
	switch (seq->type) {
		case SEQ_TYPE_COLOR:
			writestruct(wd, DATA, SolidColorVars, 1, seq->effectdata);
			break;

		...

Therefore we can have an array (or list) of properties along with identifiers without any problems with data storage.
This will require us to
- explicitly specify which structures can be used with PYFX
- have a constructor, getter and setter to access properties
- register each prop as an RNA prop if that's possible, or leave prop handling on the Python side - all callbacks are on the Python side by design (see the sketch below)
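
As a rough illustration, here is a minimal sketch (all names hypothetical) of how an effect addon could keep property handling entirely on the Python side:

# Hypothetical: the addon declares (identifier, type, default) triples and
# handles validation itself; the C side only stores the values it is handed.
EFFECT_PROPS = [
    ("text", str, "Hello"),
    ("pos_x", int, 0),
    ("pos_y", int, 0),
    ("wrap_width", float, 1.0),
]

def default_effectdata():
    """Initial property values for a newly created PYFX strip."""
    return {name: default for name, _type, default in EFFECT_PROPS}

def set_prop(effectdata, name, value):
    """Setter with a simple type check; the C side stays type-agnostic."""
    prop_type = next(t for n, t, _d in EFFECT_PROPS if n == name)
    effectdata[name] = prop_type(value)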

Identification:
If I imagine the effect addon to be one file, we can store a checksum of that file in effectdata. On the Python side we can compare that checksum against a list of checksums (past versions), so we don't accidentally try to handle an fx sequence that doesn't belong to us. This way collisions can be detected during addon load and dealt with.
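
A minimal sketch of that check, assuming a single-file addon and an SHA-1 digest (the hash choice and the checksum key are just placeholders):

import hashlib

# Hexdigests of released versions of this addon (placeholder values).
KNOWN_CHECKSUMS = {
    "digest-of-version-1-0",
    "digest-of-version-1-1",
}

def file_checksum(path):
    """Checksum of the addon file, stored in effectdata when the strip is created."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()

def owns_strip(effectdata):
    """True only if the strip was created by a known version of this addon."""
    return effectdata.get("checksum") in KNOWN_CHECKSUMS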

Rendering:
We call the function that is registered for rendering (a possible signature is sketched after this list).
- in case the effect is a generator, we will pass only effectdata and the relative / absolute frame position
- in case the effect is doing processing, we will pass the input images, effectdata and the relative / absolute frame position
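
Purely as a sketch, the two callback shapes could look like this; the image type, the effectdata keys and the NumPy stand-in are my assumptions, not an actual API:

import numpy as np

def render_generator(effectdata, frame_relative, frame_absolute):
    """Generator effect: builds a frame from effectdata alone."""
    # Stand-in: fill a float RGBA buffer with a solid color; a real effect
    # would draw text, gradients, etc. via Py_Imbuf / SciPy / Pillow.
    frame = np.empty((effectdata["height"], effectdata["width"], 4), dtype=np.float32)
    frame[:] = effectdata["color"]  # RGBA tuple, broadcast over the frame
    return frame

def render_processor(inputs, effectdata, frame_relative, frame_absolute):
    """Processing effect: takes the input image(s) and returns the result."""
    # Trivial pass-through; a real effect would blur, composite, transform, ...
    return inputs[0]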

Cache:
An effect may use a number of static images (the same across multiple frames) that have to be composited before outputting the final image. A cache implementation, either in the Imbuf API or on the Python side (an addon class providing storage), should be considered. This can improve performance significantly in some cases.
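
A minimal sketch of the Python-side variant, with the addon class providing the storage (the keying scheme is up to the effect):

class StaticImageCache:
    """Caches images that stay the same across many frames (e.g. a rendered
    text layer), so they only have to be composited, not rebuilt, per frame."""

    def __init__(self):
        self._images = {}

    def get(self, key, build):
        """Return the cached image for `key`, building it once if missing."""
        if key not in self._images:
            self._images[key] = build()
        return self._images[key]

    def invalidate(self):
        """Drop everything, e.g. when effectdata properties change."""
        self._images.clear()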

Return data:
After rendering an image, other useful data can be returned to improve the UI.
For example, the lack of a multiline text box in Blender can be worked around by the effect acting as a text box itself. Instead of rendering whole lines, we can render character by character and return the position of each letter.
Editing of the text can then be handled directly in the preview area by a modal operator supplied with the effect.
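
As an illustration (every name here is hypothetical), the render callback of such a text-box effect could return the image together with per-character layout data:

def render_text_box(effectdata, frame_relative, frame_absolute):
    """Hypothetical text-box effect: returns the rendered image plus the
    position of each character, so a modal operator can place the cursor
    and handle editing directly in the preview."""
    image = None           # rendered character by character in a real effect
    char_positions = []    # [(char, x, y), ...] in preview coordinates
    for i, ch in enumerate(effectdata["text"]):
        x = effectdata["pos_x"] + i * effectdata["char_width"]
        y = effectdata["pos_y"]
        char_positions.append((ch, x, y))
    return image, char_positions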

Effect registration:
I have no idea how this works. I find the embedded Python documentation on how to deal with namespaces, well, nonexistent :)
But in principle, we are doing this already in one form or another.

Python side

Each effect is an addon that provides
- a rendering function
- panel & UI stuff
- operators
- property handlers

Most details depend on the C-side implementation.
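
In outline (the registration call at the end is purely hypothetical, since no such API exists yet), such an addon could look like this:

bl_info = {"name": "PYFX: Text Box", "category": "Sequencer"}

import bpy

class SEQUENCER_PT_pyfx_textbox(bpy.types.Panel):
    """Panel exposing the effect's properties in the sequencer sidebar."""
    bl_label = "Text Box"
    bl_space_type = 'SEQUENCE_EDITOR'
    bl_region_type = 'UI'

    def draw(self, context):
        pass  # draw the effect's properties here

def render(inputs, effectdata, frame_relative, frame_absolute):
    """Rendering callback that would be handed to the PYFX registration."""
    return inputs[0] if inputs else None

def register():
    bpy.utils.register_class(SEQUENCER_PT_pyfx_textbox)
    # hypothetical: bpy.utils.register_sequencer_effect("TEXT_BOX", render)

def unregister():
    bpy.utils.unregister_class(SEQUENCER_PT_pyfx_textbox)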

Sci-fi

Users are complaining about the compositor implementation, in the sense that the compositor in other applications is perfect for creating effect templates. If the compositor in Blender were built to process 2D images effectively, I would agree and try to create a solution using the compositor, but afaik this is not the case.

However, as I said, we want to use Python mainly as a template language.

I don't know how much the node editor is bound to the scene, or how hard it would be to parse Python code and represent it as nodes, but it should be possible.
If you look at the code for my text box demo, I used a bunch of offsets, strings and, other than that, just image functions.
I mean, I can imagine a for-loop node, but in the worst case the parser would say: sorry, you cannot edit this in the node editor…

The best (and easiest) case would probably be to start with the node editor and save the result as Python code / a new addon.


Before we get too deep into this though, I still think Python performance is not nearly good enough. There are some limitations, like the global interpreter lock, that make multithreading harder. For that reason it seems unlikely to me that we'd accept this functionality in Blender.

The current sequencer performance is already considered too slow, we need to be looking at how to make it faster. Imagine doing 4K, 8K resolution, or 60fps, or stereo, with a more complicated setup. The primary goal of the sequencer has mostly been to provide realtime video editing. Adding custom Python effects pushes it in a different direction and extra code like this makes it harder to improve the core functionality.

I'd like to weigh in here on the speed of Python effects.
I have written a Python-based photo editor, and using Pillow or OpenCV I can display a variety of real-time effects at a good framerate for a good-resolution image (around 1200x1000), and this is on my 3+ year old laptop…
Depending on the effect, I can see this being very useful and pretty snappy.

I also think it would be very good for expanding the VSE's user base if there were a library of easy premade drop-in effects, which is an addon I would definitely be developing if this feature were in Blender.

It's fine for many use cases of course, but to me it sounds relatively slow compared to what is possible. The main issue, though, is that performance likely wouldn't be much better on a more powerful workstation, due to poor multithreading support in Python.

Thanks for the feedback.

we want to use Python mainly as a template language
The majority of the time, C code will be run.

I can give you “exact” numbers.

Adding custom Python effects pushes it in a different direction and extra code like this makes it harder to improve the core functionality.

I must say that I value freedom more than safety. Users don't have to use this. In fact, this effect should serve the rather rare and unique occasions when you really wish you had it.

It may happen that users will like this and Python would become the bottleneck. The logical step from there would be the use of a node system.

It's more a matter of making decisions about which directions to grow the project in, if we add a Python sequence plugin mechanism that inevitably has a maintenance cost and makes refactoring slower.

For the sequencer the #1 thing Ton insists on is making it fast. It's not for applying many effects; for that there is the compositor or other software. Rather, the goal that was set out is to have basic but fast video editing, and as long as we do not have a good architecture in place we'd rather not start extending it in other directions.

I agree it should be realtime. Inevitably there is some waiting time to create previews (proxies), but after that, no waiting.

Unless we add GPU-accelerated imbuf operations, I think we are at the limit of what can be done on the processing side. Maybe more multithreading.
The Blender compositor has terrible performance on 2D images.

Anyway, I want to have fast 2D processing, so if there is such a project, I can contribute.
The Py_Imbuf module exists independently of this, however, and one of its goals is to refactor imbuf operations.
Pyfx requires little maintenance, so I guess I can develop a simplified version and keep it in my own build.
That is a good compromise, I guess.

GPU acceleration is an option, and perhaps the right solution for the future.

We are not nearly at the limit of CPU performance though. Multithreading and SIMD can speed things up a lot. For the compositor, the design is problematic, with too many virtual function calls and other design decisions that lead to poor memory access patterns.

I think we should be looking more towards what we can implement, not what may be possible. As for speed, for many cases this would be MUCH faster than the current solution (scene strips using nodes), and also much less of a massive pain to deal with logistically.

As a person who uses the VSE almost daily, often for very large projects, and who has poured many hours into doing my best to improve the VSE with addons, I'm asking you to please reconsider your view on this.


I mainly want to refer to
https://developer.blender.org/T54272

Note that additions to this API would mean some refactoring & improvements to ImBuf since the Python API would be a thin wrapper on ImBuf operations.

I know that this is not a priority, but I want to work on this.
I wanted to move all drawing algorithms from the sequencer to the imbuf module, replicate most of the compositor functions that would be missing, and then maybe add some useful algorithms that most image editors use.
This would result in a self-contained image processing module.

Since this process has not really started yet, a good question is whether it makes sense to think about OpenCL / CUDA support.
I have no idea how much complexity this would add to the code at this point, but I am tempted to try it.
I think OpenCL is the best choice, as it's not so platform-specific.
Then the question is - would this be OK with Blender design / developers?

It could be interesting to look at halide-lang for any VSE/Compositor future; it seems like it's designed to push around pixels with high performance and will run on anything from CPUs to OpenCL, Metal, CUDA or even OpenGL.

I think if someone wants to spend a lot of time, then GPU support would be great. However speaking from experience working on GPU renderers, it is a pain to debug and maintain. Transferring data between the CPU and GPU is slow, so you also need to port nearly all code to the GPU to fully benefit.

OpenCL is great but on the other hand it is deprecated on macOS, and in general its future is uncertain. An abstraction like Halide makes sense. Alternatively GLSL could be a good option, as we also want to have a realtime viewport compositor, the underlying system could be shared with the sequencer.

Personally, what I suggest would be to push the CPU code further and investigate what the bottlenecks are now, before going too deep into the complexity of GPU programming. In general we are more likely to accept a new complicated system from developers who also show they want to take responsibility for maintaining the module long term, doing bugfixes and reducing technical debt. Otherwise it falls to the core developers to maintain another complex system.

From a user point of view, interaction speed (how fast a user can perform functions) differs based on the work being performed.
When I edit video together I expect to be able to play, scrub and jump around the timeline very quickly.

However when I am creating an effect I don’t expect the same sort of responsiveness.

In this case I want to see changes applied in real time to a still image. When I play the video with the effect applied, I understand that there will be frame dropping / resolution loss or (temporary) rendering required.

So I believe that there are 2 performance issues:

  1. speed of playback for general editing duties

  2. speed of image redraw for compositing/effects

Perhaps multithreading of drawing could enable background effect rendering or proxy generation, to provide a better experience during playback of the proposed effect plugins?

From the example video I see that an effect can be constructed from a script. I wonder if there is a way to generate the effect from inside Blender, using it as an elaborate GUI instead of a text editor? Perhaps an Animation Nodes or a modifier-stack approach to generating structures from elements?

True, one simple effect may not benefit from GPU processing, but bigger chains will.
I still have to explore multithreading more to get an idea of the code overhead of setting up jobs.
For example, gammacross is a multithreaded effect. I would like to compare its performance to a single-threaded job.
The size of a frame may range from ~1MB to 150MB at 4K 32-bit, so we cannot really say that one method is universally best.

A SIMD implementation seems to introduce negligible code overhead. With AVX this would mean up to 8x faster computation per core, but from what I have read, performance may be limited by CPU cache and memory throughput.

It would be nice to plot some numbers of technology vs data size vs speed, to have a clear picture of what's possible.
Actually this may be nice for the Open Data benchmark tool:
- run a few algorithms, say, alpha over, blur, transform
- scalar as control, SIMD, SIMD + MT, OpenCL, CUDA, GL
- data sizes of 1, 2, 5, 10, 20, 50M
- 1, 2, 3, 5, 10 iterations
Or something like that…
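
A rough sketch of such a harness, limited to what plain Python + NumPy can measure (the SIMD / OpenCL / CUDA / GL variants would need their own backends):

import time
import numpy as np

DATA_SIZES_MB = [1, 2, 5, 10, 20, 50]   # buffer sizes suggested above
ITERATIONS = [1, 2, 3, 5, 10]
BYTES_PER_PIXEL = 16                    # float32 RGBA

def alpha_over(fg, bg):
    """Premultiplied alpha-over of two float RGBA pixel buffers."""
    return fg + bg * (1.0 - fg[..., 3:4])

def bench():
    for mb in DATA_SIZES_MB:
        pixels = mb * 1_000_000 // BYTES_PER_PIXEL
        fg = np.random.rand(pixels, 4).astype(np.float32)
        bg = np.random.rand(pixels, 4).astype(np.float32)
        for iters in ITERATIONS:
            start = time.perf_counter()
            for _ in range(iters):
                alpha_over(fg, bg)
            elapsed = time.perf_counter() - start
            print(f"{mb:3d} MB  x{iters:2d}: {elapsed:.3f} s")

if __name__ == "__main__":
    bench()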

Well, this is more a job for caching and prefetching.