Proposal: Bump minimum CPU requirements for Blender

We don’t collect any information like that. Open Data will indeed be skewed towards people with new CPUs who want to benchmark them, so I don’t think it can tell us anything useful.

In terms of what is acceptable to drop, it will depend on the performance benefit, but I think the cutoff is somewhere below 1% of users. If x86-64-v3, for example, means excluding 10% of users according to those Steam numbers, that’s much too high even for Blender 4.0.

For Cycles, we can probably drop the SSE3 and AVX kernels and only keep SSE2, SSE41 and AVX2. Might be worth trying an AVX512 kernel, but I’m not really expecting that to do much.

5 Likes

Windows 7 and 8.1 stop being supported next month. Would there be any benefit in matching just the W10 requirements in the short term?

I’ve hinted at this previously, but I have yet to see solid convincing performance numbers to justify raising the requirements.

Do note that these numbers should exclude Cycles, as it already has specialized kernels for various architectures. If anything, we may have to look into adding specialized AMD/Intel kernels there if we can replicate the essentially free 2-9% perf bumps from -mtune that @FelixCLC is hinting at.

Until we have a solid grasp of which parts of Blender get significantly faster by twiddling the build flags and nothing else, any discussion about which lower bar we should pick is putting the cart before the horse a tiny bit and is ultimately a waste of time for everyone involved.

4 Likes

I see that W10 does not require SSE3, which Core 2 Duo already supports.

Please let’s not mix this discussion with Windows requirements. I fully agree with what @LazyDodo said; now it’s time for numbers.

Per yourself and @LazyDodo, what sort of numbers/data etc. are valuable?

I’m a third party willing to help out, just LMK what project/build system etc. you want to see.

For Blender overall, nothing is easily available. While the unit test coverage is ok…ish, we do not track any performance metrics currently. Some work has been done in this area by the geometry nodes team, but afaik this has been experimental in nature so far.

If you specifically, given your experience, want to tinker with something right now, I’d say Cycles is your best bet. You can find the instructions for the benchmarks over here; the compiler flags for Cycles can be found in intern\cycles\CMakeLists.txt.

Is that estimation for per core?

Generally agreed. Since one mentioned benefit of bumping the requirements is that we would be able to use SSE4 intrinsics in the code, I wanted to point out that there are probably preferable ways to use SIMD that don’t require that. Seems like after figuring out dynamic dispatch once, further use would be easier.
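
To make the dynamic-dispatch idea concrete, here is a minimal sketch, assuming GCC/Clang’s __builtin_cpu_supports builtin; the transform_positions_* names are purely illustrative, not existing Blender code:

```cpp
// Minimal sketch of one-time runtime SIMD dispatch (GCC/Clang builtins).
// The transform_positions_* names are illustrative, not existing Blender code.
#include <cstddef>

static void transform_positions_scalar(float *v, std::size_t n)
{
  for (std::size_t i = 0; i < n; i++) {
    v[i] *= 2.0f; /* placeholder work */
  }
}

/* Real variants would use SSE4.1/AVX2 intrinsics or be built per-variant
 * with different flags; plain forwards keep the sketch short. */
static void transform_positions_sse41(float *v, std::size_t n) { transform_positions_scalar(v, n); }
static void transform_positions_avx2(float *v, std::size_t n) { transform_positions_scalar(v, n); }

using TransformFn = void (*)(float *, std::size_t);

static TransformFn pick_transform_impl()
{
  /* __builtin_cpu_supports() queries CPUID at runtime. */
  if (__builtin_cpu_supports("avx2")) {
    return transform_positions_avx2;
  }
  if (__builtin_cpu_supports("sse4.1")) {
    return transform_positions_sse41;
  }
  return transform_positions_scalar;
}

void transform_positions(float *v, std::size_t n)
{
  /* Resolved once on first use, reused for every later call. */
  static const TransformFn impl = pick_transform_impl();
  impl(v, n);
}
```

GCC also offers function multi-versioning (__attribute__((target_clones("default", "avx2")))), which generates and dispatches such variants automatically, in case that route is preferred.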

I don’t have high hopes for auto-vectorization, though it might help in some simple loops. That’s nice to get “for free”, but I think we ought to aim a bit higher than that.

1 Like

Alright, I’ll take a look.

So in terms of wanting data overall, just from people in this thread, what hardware do we have access to?

I can spool up an old Ivy Bridge system for an AVX-but-no-AVX2 example, and I can also do Bulldozer-era APUs, Haswell (first-gen AVX2), Alder Lake as one of the most recent AVX2 systems, and another specialized AVX512 system.

Not sure what you mean with this question.

Typically, in full-application profiling, an uptick of 2-9% in total throughput from per-uArch tuning (say in CFD solvers, which scale roughly with SGEMM FLOPS) is common.

This assumes that you are compute bound and not, say, memory-bandwidth bound, I/O bound, etc.

That’s a deeper rabbit hole and gets very OT, but for the sake of pointing you towards resources if you’re curious: arXiv has a fair few interesting papers on the topic, or if you prefer talks/videos, look into the divergence of FLOPS vs IOPS in HPC to get a sense of it.

A practical example of this that consumers are seeing these days is the massive amount of cache in consumer CPUs, added to help “hide” how bandwidth-starved modern CPUs and GPUs are.

It’s also known as “feeding the beast”. It’s why Intel’s next-gen “Max” series HPC data-center CPUs will have on-package HBM memory: HBM can keep up, unlike traditional DRAM (including “next-gen” DDR5, which still isn’t fast enough).

You can see the same sort of approach with AMD and the “V-Cache” they’ve been adding to some of their Epyc SKUs and now a single consumer Ryzen SKU.

Time spent waiting on data to come into the core is time not computing/doing valuable work. That’s why we have 3 levels of cache on CPUs these days :man_shrugging:

2 Likes

I’d say test with the hardware you have first; if that leads to results worth taking action on, we’ll worry about the other architectures.

I think small gains would be more important at the low end.

I could eventually test on Bay Trail, but this is borderline usable as a basic machine.

If you want to test SSE3, I should be able to do it on Phenom X3, but this also gets maxed out on basic tasks.

Other machines aren’t great for benchmarking.

This is the important thing for legacy-system users to remember: the existing Blender versions aren’t going anywhere. Also, you aren’t going to be stuck on old hardware forever.

Those who are able to invest their time and effort so that they can afford better hardware should be… I guess “rewarded” is one way of putting it, though it isn’t quite like that. Sacrificing even 1% performance just to keep something from over a decade ago on the officially supported list doesn’t make sense.

I don’t even want to mention it, but I suppose if somebody is willing to spend their time to include really old hardware in the latest and greatest releases, you could have two different Blender versions.

4 Likes

The question is whether the old versions could still be used, considering security and compatibility. I think that some old chips could be better than some not-so-old ones in terms of security too.

Reading a little more, I don’t even know if this is about runtime performance, but I compile Blender enough that I’ll take faster compile times. Sorry NetBurst, Core 2, or whoever else this would potentially leave mired on some barbaric and uncivilized FOSS DCC made in the year ~2024.

1 Like

I was here when Windows XP support was dropped, and when 32-bit support was dropped… I know these discussions. :slight_smile:

I’ll happily repeat myself: Blender has very light system requirements compared to other DCC applications.

  • Cinema 4D requires AVX since S26 in April 2022.
  • Houdini requires SSE4.2 since at least 16.5 in 2017.
  • Maya requires SSE4.2 since version 2017.
  • 3ds Max requires SSE4.2 since version 2019.

Also, to make it clear: I started this thread in the hope of agreeing on new CPU requirements. That doesn’t mean we have to make the switch immediately. But having an agreement to move to something like x86-64-v2 can stimulate development and ideas, and once a patch is tangible and can be merged, it can just be done without further discussion. I think that would be a fair compromise.
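
For reference, x86-64-v2 essentially means the SSE3/SSSE3/SSE4.1/SSE4.2/POPCNT (plus CMPXCHG16B and LAHF/SAHF) feature set, i.e. roughly Intel Nehalem / AMD Bulldozer and newer. As a rough sketch of what part of such a patch could look like, assuming GCC/Clang’s __builtin_cpu_supports and a purely illustrative function name, here is a startup guard so older CPUs get a clear error instead of an illegal-instruction crash:

```cpp
// Rough sketch of a startup guard for an x86-64-v2 baseline (GCC/Clang only;
// a real patch would also cover MSVC and the non-SIMD v2 features, and this
// translation unit itself would have to be built for the old baseline so it
// can run far enough to print the message).
#include <cstdio>
#include <cstdlib>

void exit_unless_cpu_meets_baseline() /* illustrative name */
{
  const bool ok = __builtin_cpu_supports("sse3") &&
                  __builtin_cpu_supports("ssse3") &&
                  __builtin_cpu_supports("sse4.1") &&
                  __builtin_cpu_supports("sse4.2") &&
                  __builtin_cpu_supports("popcnt");
  if (!ok) {
    std::fprintf(stderr, "This Blender build requires an x86-64-v2 capable CPU.\n");
    std::exit(EXIT_FAILURE);
  }
}
```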

6 Likes

The problem is that SSE4.1 is not a nice cutoff point… Something great like the Phenom II X6 gets dropped while weaker chips get in. But by now it is pretty basic, and not a cheap upgrade either.

It was annoying when some game required 4.2 just for some DRM.

One aspect that hasn’t been mentioned before is libraries. Many libraries can be built with SIMD levels for better performance (there is a small sketch of how such build levels typically work after the examples). Some examples:

OpenVDB:

Choose whether to enable SIMD compiler flags or not, options are: None SSE42 AVX.
Although not required, it is strongly recommended to enable SIMD. AVX implies SSE42.

OpenImageIO:
OIIO comes with SIMD support as well; its changelog mentions it in several places.

Open Shading Language:

From their changelog:

Many noise() varieties have been sped up significantly on architectures with SIMD vector instructions (such as SSE). Higher speedups when returning vectors rather than scalars; higher speedups when supplying derivatives as well, higher speedups the higher the dimensionality of the domain. We’re seeing 2x-3x improvement for noise(point), depending on the specific variety, 3x-4x speedup for 4D noise varieties. #415 #422 #423 (1.6.1, 1.6.2)

Further improvements are possible (SIMD batched shading mode), but these would require coding changes in Cycles first.
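
To show roughly what such a SIMD build level means in practice: the build flag makes the compiler define macros such as __SSE4_2__ or __AVX2__ (the exact macros vary per compiler), which the library then uses to gate which code path gets compiled. A minimal illustrative sketch, not actual OpenVDB/OIIO/OSL code:

```cpp
// Illustrative only: how a SIMD build level (e.g. -msse4.2 or -mavx2)
// typically selects a code path at compile time inside a library.
#include <cstddef>
#if defined(__AVX2__)
#  include <immintrin.h>
#elif defined(__SSE2__)
#  include <emmintrin.h>
#endif

void scale(float *v, std::size_t n, float s)
{
  std::size_t i = 0;
#if defined(__AVX2__)
  /* Compiled when building with -mavx2 (GCC/Clang) or /arch:AVX2 (MSVC). */
  const __m256 vs = _mm256_set1_ps(s);
  for (; i + 8 <= n; i += 8) {
    _mm256_storeu_ps(v + i, _mm256_mul_ps(_mm256_loadu_ps(v + i), vs));
  }
#elif defined(__SSE2__)
  /* __SSE2__ is defined by default on x86-64 with GCC/Clang. */
  const __m128 vs = _mm_set1_ps(s);
  for (; i + 4 <= n; i += 4) {
    _mm_storeu_ps(v + i, _mm_mul_ps(_mm_loadu_ps(v + i), vs));
  }
#endif
  for (; i < n; i++) {
    v[i] *= s; /* scalar tail / portable fallback */
  }
}
```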

3 Likes

If you want, I could build you a (Windows) lib set with SSE42 as the target. Given this is a good 12+ hours of work at minimum, I’d like to see some commitment here first; “meh, maybe if I have time I’ll take a look” isn’t gonna cut it.

3 Likes