Sequencer cache improvement proposal

Unless the user's input contains only I-frames (which is one of the characteristics of a proxy), the sequencer is unusable; you have to have proxies. At least that is the situation now, and users tolerate it.

We can improve this:

  • a proxy is a cache, so make an efficient way to load it into RAM

I will create a mechanism to do this, as frames have to be loaded in bulk.

As I said, I will collect some numbers and present possible improvements.

Thanks, I will try to discuss the details of this when I have something ready.

I think there may be a big gap between casual users who just want to edit together a few simple videos or an image sequence, and more serious users who will use proxies and heavier effects.

My main concern is ensuring both cases still work. If the preprocess cache is not effective for either then it seems fine to remove, but I don’t have much insight into that.

I’ve edited hundreds of videos with the VSE, and it’s unusable with HD footage without proxies, even with low-bitrate screencasts. With the current playback engine, even with 100% proxies playback isn’t 100% smooth, although it’s fine for color grading. Just to second Richard here: you can’t use the VSE without proxies.

But many users do use it without proxies; not everyone’s use case is the same. If you’re editing long videos with a lot of cutting then sure, but if you’re just chaining together a few shots or trimming the start/end of a video you may not go through the trouble of setting up proxies.

Even with proxies, maybe the source cache is useful because they have a slow hard disk or network drive, maybe they’re just editing a short video or part of a longer video that fully fits in the cache, maybe it makes updates more interactive if you are tweaking strip settings, etc.

We just have to be careful not to make design decision purely from the point of view of an advanced user.

Isn’t it more prudent here, Brecht, to target the BI’s tested-in-production workflow as a baseline? If we begin to think about the lowest common denominator, I worry we have already lost the struggle.

I understand your care for “both cases”, but in this particular instance there is a clear line between workable and not, depending on which side is chosen to design for.

You can’t design for “everyone”.

The proxy path is essentially an offline / online division, which we certainly can clean up and make a little more workable. A toggle, for example, with auto-generation of onlines attached via LTC timecode, etc.

As Blender is now, with no checks on e.g. source aspect ratio, fps or VFR (variable frame rate), and no warnings or suggestions to re-encode to proxies (or to change project settings) when the source is incompatible with the current project settings, many first-time casual users will get a bad experience caused by slow playback or audio going out of sync (VFR).

So casual users may need help and guidance like warnings, suggestions (to change the project ratio or fps, or to generate proxies from incompatible source material), or even forced automatic proxy generation (in the case of VFR), for a successful experience of using the VSE. Meaning they would have to walk the same path as more serious users. Of course, these things should be optional, but I see nothing wrong in trying to nudge users toward what is considered best practice, with the purpose of giving them the best possible experience.

Certainly a better UI for proxies would help.

We seem to be going off track here. All I suggested is an alternative solution where, instead of removing the preprocess cache, we put the data in the main cache at low priority. If there is a simple way to avoid the risk of breaking other use cases, then why not do it?

Some early news:
I looked into the preprocess cache and found that it is unfinished, or something like that. In practice it’s not working. Here’s a snippet:

void BKE_sequencer_preprocessed_cache_put(const SeqRenderData *context, Sequence *seq, float cfra, eSeqStripElemIBuf type, ImBuf *ibuf)
{
	...
		/* Any change of cfra wipes the entire cache. */
		if (preprocess_cache->cfra != cfra)
			BKE_sequencer_preprocessed_cache_cleanup();
	...
}

This causes any new frame to erase the cache, so at least no resources are wasted.
To be fair, I didn’t realize how much memory you need just for one frame. Now I am curious whether other similar software uses some kind of compression algorithm.

I added a frame cost to the sequencer cache: cost = time_spent_rendering_frame / (1/scene_FPS), i.e. render time relative to the frame’s real-time budget. So if cost < 1, the frame renders fast enough.
I implemented a cache viewer similar to what’s used in the movie clip editor. Frame cost is shown on a color scale - blue is best, red is worst. This reveals some interesting patterns…
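The cost metric can be sketched like this (a minimal illustration with hypothetical names, not Blender’s actual cache code; the point is just that cost compares render time against the frame budget, so cost < 1 means real-time capable):

```c
#include <assert.h>

/* Cost of one cached frame relative to the real-time budget.
 * render_time_s is in seconds; the budget for one frame is 1/fps seconds.
 * cost < 1.0 means the frame rendered faster than it has to be displayed,
 * i.e. playback can stay real-time at this spot. */
static float frame_cost(float render_time_s, float scene_fps)
{
	return render_time_s * scene_fps; /* == render_time_s / (1.0f / scene_fps) */
}
```

For example, a frame that took 10 ms at 60 fps has cost 0.6 (fine), while 50 ms at 30 fps has cost 1.5 (too slow).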

After this I tested a simplistic prefetcher. It works; I will have to implement freeing of “used” cache frames to be able to prefetch indefinitely.

Here is a clip of playback with the cache view (the strip cache is disabled):

Interesting are the strong blue strips that are always in the same spot and evenly distributed. Red presumably indicates overhead from opening a new file stream.

My plan is to finish this prefetcher with the strategy: prefetch n future frames; if needed, free “used” cached frames with the lowest cost. This is a good strategy for playing back long parts of the timeline.
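That eviction rule could be sketched as follows, under the assumption that each cached frame carries its render cost (hypothetical types and names, not the real implementation):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache entry: a frame number plus its render cost. */
typedef struct {
	int   cfra;   /* frame number of the cached image */
	float cost;   /* render cost; cheap frames are the best eviction victims */
	int   in_use; /* slot is occupied */
} CacheFrame;

/* Pick an eviction victim: among frames already behind the play head
 * ("used"), choose the one with the lowest cost, since it is the
 * cheapest to re-render if it is ever needed again.
 * Returns the slot index, or -1 if no used frame qualifies. */
static int pick_eviction_victim(const CacheFrame *cache, size_t len, int playhead)
{
	int best = -1;
	for (size_t i = 0; i < len; i++) {
		if (!cache[i].in_use || cache[i].cfra >= playhead)
			continue;
		if (best == -1 || cache[i].cost < cache[best].cost)
			best = (int)i;
	}
	return best;
}
```

The design choice here is that expensive frames are kept as long as possible, because re-rendering them is exactly what breaks real-time playback.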

Then I will look at how movie files are loaded and try to optimize that a bit. The general idea is that if a file plays smoothly outside of Blender, it should play smoothly in Blender. Without proxies, of course.

After that we can implement prefetchers with more strategies - such as a far-lookahead prefetcher for parts of the timeline that render so slowly that the “play” prefetcher would not be able to render frames fast enough, and an editing prefetcher, triggered by the user making edits - remember the edits, keep the sources in cache, and possibly start prefetching the result.

These changes would leave the main thread acting mainly as a cache viewer, with prefetch threads maintaining the cache depending on the chosen strategy.
This process has to be automated, and I think we have to recognize workflow patterns.
Resources are scarce - with 16 GB of cache you can store only about half a minute of full HD 60 fps 8-bit footage.
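As a quick sanity check of how little actually fits, assuming uncompressed 8-bit RGBA (4 bytes per pixel, roughly what an uncompressed byte image buffer occupies):

```c
#include <assert.h>
#include <stdint.h>

/* How many seconds of footage fit in a cache of the given size,
 * assuming uncompressed 8-bit RGBA frames (4 bytes per pixel). */
static double cache_seconds(uint64_t cache_bytes, int width, int height, double fps)
{
	uint64_t frame_bytes = (uint64_t)width * (uint64_t)height * 4; /* RGBA8 */
	double frames = (double)cache_bytes / (double)frame_bytes;
	return frames / fps;
}
```

By this estimate a 1080p frame is about 8 MB, so 16 GB holds roughly 1900 frames - on the order of half a minute at 60 fps.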

Speaking of the cache viewer - which, in the context of proxies, the sequencer already is - the user should be able (and nudged) to make a proxy of some time-consuming part once they are satisfied with the edits. Then this amount of cache should be enough for a lot of tasks.

In my case I have only 1 GB of cache (4 GB RAM, don’t laugh), and with proxies on effects I can at least preview the timeline in real time, so even low-end users can be satisfied.

And 8-bit isn’t sufficient for colour transforms / composites even for offlines, so you can factor in double the capacity.

It’s easy to say, a bit harder to code. But instead of using compression on cached frames, I would feed frames to an FFmpeg encoder, store the encoded stream in RAM, and when the data is no longer needed, dump it to the HDD as a proxy.

This would have to be managed through some scheduler, because file loading would have to have absolute priority. Normally you load quite small chunks, but you also have to dump a few gigs of data suddenly.

But basically the same goes for the prefetchers - they should utilize as many CPU resources as possible, while the playing thread has absolute priority.

Anyway, my progress is extremely slow, as I have to attend to my day job, so this is still quite far in the future.

Are you making any progress? :slight_smile:

Hi, thanks for asking.

I made some progress since the last post, though not so much, as I’m occupied by my main job currently.

I tested rendering cache frames in a scene copy - it works OK, but creating a scene copy, even outside of the running bmain struct, messes with the UI (the scene selector is red, keyframes stop working).

I need some storage for the prefetch thread to avoid using globals, and the same for the cache itself. Scene, I guess, would be a good place.

I tested the cache limiter - it also works OK, but I still need to polish it.

I started testing stability without using a mutex - this is where I ended up. Using a mutex can block the main thread for the time of rendering two frames instead of one.
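One direction for avoiding that stall is a trylock on the playback side (a sketch with hypothetical names; the real code is more involved): if the prefetch thread holds the cache lock, the playing thread falls back to rendering the frame itself instead of waiting out someone else’s render.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t cache_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Returns 1 if the cache could be consulted (found is then set),
 * or 0 if the lock was busy and the caller should render directly. */
static int cache_try_lookup(int cfra, int *found)
{
	if (pthread_mutex_trylock(&cache_mutex) != 0) {
		return 0; /* busy: do not block the playing thread */
	}
	*found = 0; /* real code would search the cache for cfra here */
	(void)cfra;
	pthread_mutex_unlock(&cache_mutex);
	return 1;
}
```

The trade-off is an occasional duplicate render in exchange for the main thread never blocking for a whole frame’s render time.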

I am taking a few days off this week, so I will try to make a POC in a week or two.

From my tests this makes a huge impact on preview smoothness.

Wow, sounds great. I’ll post about this on the VSE FB Group. :slight_smile:

Congratulations on your new patch. Would love to test it, if you can share a build? :slight_smile:

Thanks :slight_smile:

I guess I will update this thread too - an early patch is here:
https://developer.blender.org/D3934

with win32 build in the comments:
https://developer.blender.org/D3934#89251

@iss have you continued with the development of this? :slight_smile:

Yes, I will update the patch in a moment. If you want a binary, I can build it in a few hours when I get home.

No rush, just wanted to know if it kept going. I’ve tried the current versions, and for me it crashes, and it seems to not let me use more than 2 GB of RAM.

Sorry for the late response, I wasn’t notified…
I have updated the build, but later I found that when I build a release version it doesn’t work, due to compiler optimization…

So I spent quite some time in assembly, then almost a week in a performance profiler. And finally I enjoyed learning how to introduce DNA structs, and how RNA list iterators work, without documentation :slight_smile:
So only a little useful work was done…

At this point I have to finish all functionality.

Anyway, the 2 GB limit is probably because you are using a 32-bit build.
I guess I can build a 64-bit version, but I have not tried to do so…

Ah, good to know, yes that’s very likely the case… but I mean, at this point why even build a 32-bit version? I guess all serious players are on a 64-bit machine, even if you’re not that serious at all…