Cycles Apple Metal device feedback

Great, thanks! Every bug report helps to improve Blender. :+1:

Edit: Ah, I see that the issue has been solved? :slightly_smiling_face:

Just connect the Render Layers node's Image output with the Image input of the Denoise node.
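
If you would rather do it from the Python console, here is a minimal sketch (assuming the default node names "Render Layers" and "Denoise" in the scene's compositor tree; the nodes are created if they are missing):

```python
import bpy

scene = bpy.context.scene
scene.use_nodes = True
tree = scene.node_tree

# Grab (or create) the two nodes, assuming their default names.
rl = tree.nodes.get("Render Layers") or tree.nodes.new("CompositorNodeRLayers")
dn = tree.nodes.get("Denoise") or tree.nodes.new("CompositorNodeDenoise")

# Connect the Render Layers Image output to the Denoise Image input.
tree.links.new(rl.outputs["Image"], dn.inputs["Image"])
```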

Regarding getting the M1 Ultra GPUs to full power, it's not quite as simple as "put in more electricity, get more fast render".

What needs to happen (and sorry if I'm getting the actual terms/order of things wrong here) is something along the lines of profiling the hardware via instrumentation while it's running, to see what work is actually being dispatched to the GPUs, how it's being split up, and what they are spending their time doing. The devs might see that for 80% of the time the M1 is waiting on some trivial work (let's make something up, like rendering to Blender's internal frame buffer) and the other 20% of the time it's rocking and rolling on the rendering. In that case, better scheduling, or using a faster path to the frame buffer, or whatever, can remove a bottleneck and allow the GPUs to be utilized closer to the top of their power envelope. But if that's NOT fixed, you can't just shove in more voltage and be like "be more fast!"

Or, it could turn out that some feature of the GPUs not relevant to rendering (I dunno, GPU-based z-buffering?) has a lot of silicon dedicated to it, and that's what can really drive the power draw.

My unscientific guess is that it will land somewhere in between. Some bottleneck will be found in the pipeline and get fixed, but at the same time, no one is going to go "holy cow, Apple secretly shipped Blender-Cycles-optimized ASICs we didn't know about, now it's like running six 3090s at 2 watts!"

What I can say, though, is that on my non-M1 i9 iMac with a Vega 48, pre-3.1 Cycles in the viewport could best be described as "lol", and now it's quite usable. And between that and Open Image Denoise, I need way fewer samples to get a good idea of final render quality. I am stoked by that. Thank you, Apple and Blender devs!

(Also, to the person thinking Intel denoise doesn't work on M1 because it's not "Intel": Intel is just the company behind it, and they state it works on modern Intel and Apple Silicon. Though I don't know how hard they worked to optimize it for the Neural Engine versus favoring their own SSE4/AVX* hardware.)
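
For what it's worth, Open Image Denoise can also be selected as the Cycles denoiser from Python; a minimal sketch, assuming the Blender 3.x property names:

```python
import bpy

scene = bpy.context.scene
scene.render.engine = 'CYCLES'
scene.cycles.use_denoising = True           # denoise final renders
scene.cycles.denoiser = 'OPENIMAGEDENOISE'  # Intel OIDN; runs on the CPU, Apple Silicon included
```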

3 Likes

I'm so glad you guys spotted that - it turns out there is an issue, but it's a weird migration issue. I confirmed that my original projects in 2.93/Intel all have the image connected to the Denoise node. When I open those Blender files in 3.1.0/M1, that link is gone! (Maybe it's something about the denoiser?) While this is weird, the good news is it's a simple fix to my projects… It also looks like my animation timelines were corrupted when loading the old Blender files. OK, I'll have to work on this - but at least it's a different kind of issue. Thanks to all of you for your great support. :slight_smile:
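
If it helps to characterize the migration issue, here is a rough sketch (assuming the links were dropped rather than the nodes themselves) that scans every scene in the file and reports Denoise nodes whose Image input is unlinked:

```python
import bpy

# Report compositor Denoise nodes that lost the link on their Image input,
# e.g. after opening an older .blend in a newer Blender build.
for scene in bpy.data.scenes:
    if not scene.use_nodes or scene.node_tree is None:
        continue
    for node in scene.node_tree.nodes:
        if node.bl_idname == 'CompositorNodeDenoise' and not node.inputs["Image"].is_linked:
            print(f"{scene.name}: Denoise node '{node.name}' has no Image input link")
```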

1 Like

There is some stuff not right in your scene, but…
I haven't had this problem going from 3.0 to 3.1 with connections gone, but if that is so at your end then you should UPDATE your bug report. I added the solution, so NOW you can add that the connections disappear when loading a 3.0 file into 3.1.

Thanks - there appear to be some other minor project corruptions as well, like the timeline/animation/camera positions. But I need to study this a little more to characterize it all. The good news is IMO this is a lesser issue than a broken denoiser. :slight_smile:

Talking about power, is it normal that the power draw is that low in Blender?

Rendering on the GPU on my 14-core.

64-core from the Mac tech video.

Let's make it even more interesting: 32-core Max.

I still find something very weird about this.

It also does not make sense to me that about 4.5 times the cores only uses about 2.7 times more power, or that twice the number of cores only uses about 7 W more (based on the average).

Are those correct?

The scene does not have an impact either; whether it is BMW or Monster Under the Bed, I get the same average 14 W package power on the 14-core.
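
Purely as a back-of-the-envelope reading of those figures (hedged: the 32- and 64-core wattages below are inferred from the ratios above rather than measured, and "twice the cores" is read as 32-core vs 64-core):

```python
w14 = 14.0       # ~14 W average package power on the 14-core (stated above)
w64 = w14 * 2.7  # "about 4.5x the cores ... about 2.7x more power"    -> ~38 W
w32 = w64 - 7.0  # "twice the number of cores only uses about 7 W more" -> ~31 W

for cores, watts in ((14, w14), (32, w32), (64, w64)):
    print(f"{cores:2d} GPU cores: ~{watts:.0f} W package, ~{watts / cores:.2f} W per core")
```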

Any thoughts?

2 Likes

iMac 27" 3.8 GHz 8-Core Intel Core i7,
64GB,
AMD Radeon Pro 5700 XT 16 GB

  • Blender 3.2

Only GPU RENDER: bmw_27_gpu = Time: [00:48:16] | Mem: 768.55M :ok_hand: :+1:
CPU something like 3 min. something…

1 Like

I updated the bug report. Looks like it'll just take a little fiddling around to get things functioning again.

2 Likes

From memory that 5700 XT time seems OK; I get about the same in Windows.
Well, OK, it is only an RX 5700 XT.

2 Likes

I don't think he was comparing apples to apples…

My M1 Mac mini with Blender 3.2 renders the BMW scene in 2 min 5 sec (Blender 3.1 alpha did it in 1 min 47 sec).

So an MBP 16 M1 Max 32-core should be faster than 3 min 25 sec.

Hey! Yeah, we've been noticing the same issues on our end. Would love an ETA on when to expect a fix.

1 Like

Hi Friends! I had tapped out my quota of 20 replies/day. I can't thank you enough for your wonderful and rapid help. To contribute, I wanted to share my performance observations on my 16" M1 Max MBP. Each CPU core is just a little more than twice the performance of a desktop AMD CPU core for raw number crunching - and that's what you see with highly optimized, regular applications. The enormous speedup numbers come from applications that are branch-heavy and memory-intensive with a lot of random access. So I've found CPU rendering about 8x faster than on my Intel laptop. The disappointment is with the GPU: each GPU core should be ~3.5x faster than each CPU core, but on GPU Compute I'm only getting roughly per-core parity - about 3x faster rendering with 32 GPU cores than with 10 CPU cores. That's an efficiency of < 30%, so clearly there's headroom for as much as a 3x improvement in GPU rendering with time. But I personally am thrilled, because before I had to rely on material preview (lookdev) mode, and now I can finally use render preview mode in real time, just like people using expensive desktop rigs! :slight_smile: [As a quick aside, I'm using release 3.0, not 3.1 or 3.2.]
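
For clarity, here is the rough arithmetic behind that "< 30%" figure, using the numbers from the post (a sketch that treats the ~3.5x per-core ratio as the expected speedup):

```python
gpu_core_vs_cpu_core = 3.5   # each GPU core ~3.5x one CPU core (expected)
gpu_cores, cpu_cores = 32, 10
observed_speedup = 3.0       # GPU render ~3x faster than the 10-core CPU render

theoretical_speedup = gpu_cores * gpu_core_vs_cpu_core / cpu_cores  # ~11.2x
efficiency = observed_speedup / theoretical_speedup                 # ~0.27, i.e. < 30%
print(f"theoretical ~{theoretical_speedup:.1f}x, observed ~{observed_speedup:.0f}x, "
      f"efficiency ~{efficiency:.0%}")
```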

2 Likes

No one is claiming that Apple Silicon has yet matched an Nvidia GPU.

But this is clearly a half step toward proper optimization for Metal. To many, it seems like Blender was touting this as the full version of Metal support, but it's clearly not, when it's still not taking full advantage of the M-series chips.

People use Blender as a benchmark for graphics, for better or worse, and with only half the optimization in place it is now giving out essentially untrue information.

If it's still in early development, then call it that and don't release it as a release candidate.

1 Like

The release notes say exactly that:

The implementation is in an early state. Performance optimizations and support for Intel GPUs are under development.

https://wiki.blender.org/wiki/Reference/Release_Notes/3.1/Cycles

21 Likes

I know it's been a while, but I found something that actually claims out-of-core rendering on all rendering devices. Here is a quote from the GPU Rendering page of the Blender 3.1 manual:

With CUDA, OptiX, HIP and Metal devices, if the GPU memory is full Blender will automatically try to use system memory. This has a performance impact, but will usually still result in a faster render than using CPU rendering.

I made that change to the manual. CUDA, OptiX, and HIP have been confirmed to use out-of-core rendering. Metal can do out-of-core rendering on Apple Silicon because of the unified memory layout of these chips. And according to an Apple representative, Metal can do out-of-core rendering on AMD GPUs, but it's not "ideal" and can be improved.

2 Likes

The answers are in the post you replied to:

  1. The structure of the renderer is much aligned with the existing path taken for CPU and CUDA, with little leverage of Apple Silicon's more unique architecture as of yet. . . .
  2. Optimisation is going to be an ongoing effort, rather than a task we tackle just the once, and I'm hoping the team can see some improvements land in every release. We have big ambitions.

As for your other questions:

What are you talking about? RTX 3090 released in 2020. "Why is it rare?"

There are literally millions of results on Google detailing the problem of the ubiquitous chip shortage.

It is well known that it is hard to find one at MSRP due to the chip shortage that has been going on for years and is expected to continue.

You said "I dont see any reasons of using Apple Silicon over RTX 3090," and someone provided you some reasons, including "people already have Apple hardware for reasons other than Blender rendering speed" and "3090s are selling at above MSRP." For that matter, a 3090 can't even fit into a MacBook Pro, so it isn't even an option for people with an MBP (and yes, people with an MBP exist).

Just accept that reasons exist whether someone sees them or not, and that other people aren't all exactly the same but instead have different priorities, which is part of why the Blender community is so great.

The fact that Apple is spending money to improve M1 performance in Blender is absolutely amazing and is a benefit to the entire Blender community. Limiting ourselves to a single video card from a single company, and ignoring the plethora of reasons people enjoy Apple Silicon over AMD/Intel/Nvidia (or vice versa), is the last thing we should want to defend.

1 Like

jason-apple says yes!

There is certainly scope to use the Apple Neural Engine for denoise in the viewport too.
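
In the meantime, viewport (preview) denoising can already be switched on from Python; a minimal sketch, assuming the Blender 3.x Cycles property names (today OIDN runs on the CPU rather than the Neural Engine):

```python
import bpy

scene = bpy.context.scene
scene.render.engine = 'CYCLES'
scene.cycles.use_preview_denoising = True           # denoise the viewport render preview
scene.cycles.preview_denoiser = 'OPENIMAGEDENOISE'  # OIDN; currently CPU-based on Apple Silicon
```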

4 Likes

Apple does document ray intersection acceleration, but it seems like it uses a GPU kernel rather than dedicated hardware.

https://developer.apple.com/documentation/metalperformanceshaders/metal_for_accelerating_ray_tracing?language=objc

2 Likes