Cycles Apple Metal device feedback

Great, thanks! Every bug report helps to improve Blender. :+1:

Edit: Ah, I see that the issue has been solved? :slightly_smiling_face:

Just connect the Render Layers node's Image output with the Image input of the Denoise node.
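
If you would rather do it from the Python console, here is a minimal sketch (assuming the default node names "Render Layers" and "Denoise" in the scene's compositor tree; the nodes are created if they are missing):

```python
import bpy

scene = bpy.context.scene
scene.use_nodes = True
tree = scene.node_tree

# Grab (or create) the two nodes, assuming their default names.
rl = tree.nodes.get("Render Layers") or tree.nodes.new("CompositorNodeRLayers")
dn = tree.nodes.get("Denoise") or tree.nodes.new("CompositorNodeDenoise")

# Connect the Render Layers Image output to the Denoise Image input.
tree.links.new(rl.outputs["Image"], dn.inputs["Image"])
```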

Regarding getting the M1 Ultra GPUs to full power, it's not quite as simple as "put in more electricity, get more fast render".

What needs to happen (and sorry if I'm getting the actual terms/order of things wrong here) is something along the lines of profiling the hardware via instrumentation while it's running, to see what work is actually being dispatched to the GPUs, how it's being split up, and what they are spending their time doing. The devs might see that for 80% of the time the M1 is waiting on some trivial work (let's make something up, like rendering to Blender's internal frame buffer) and the other 20% of the time it's rocking and rolling on the rendering. In that case, better scheduling, or using a faster path to the frame buffer, or whatever, can remove a bottleneck and allow the GPUs to be utilized closer to the top of their power envelope. But if that's NOT fixed, you can't just shove in more voltage and be like "be more fast!"

Or, it could turn out that some feature of the GPUs not relevant to rendering (I dunno, GPU-based z-buffering?) has a lot of silicon dedicated to it, and that's what can really drive the power draw.

My unscientific guess is that it will land somewhere in between. Some bottleneck will be found in the pipeline and get fixed, but at the same time, no one is going to go "holy cow, Apple secretly shipped Blender-Cycles-optimized ASICs we didn't know about, now it's like running six 3090s at 2 watts!"

What I can say, though, is that on my non-M1 i9 iMac with a Vega 48, pre-3.1 Cycles in the viewport could best be described as "lol", and now it's quite usable. And between that and Open Image Denoise, I need way fewer samples to get a good idea of final render quality. I am stoked by that. Thank you, Apple and Blender devs!

(Also, to the person thinking Intel denoise doesn't work on M1 because it's not "Intel": Intel is just the company behind it, and they state it works on modern Intel and Apple Silicon. Though I don't know how hard they worked to optimize it for the Neural Engine versus favoring their own SSE4/AVX* hardware.)
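
For what it's worth, Open Image Denoise can also be selected as the Cycles denoiser from Python; a minimal sketch, assuming the Blender 3.x property names:

```python
import bpy

scene = bpy.context.scene
scene.render.engine = 'CYCLES'
scene.cycles.use_denoising = True           # denoise final renders
scene.cycles.denoiser = 'OPENIMAGEDENOISE'  # Intel OIDN; runs on the CPU, Apple Silicon included
```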

3 Likes

I'm so glad you guys spotted that - it turns out there is an issue, but it's a weird migration issue. I confirmed that my original projects in 2.93/Intel all have the image connected to the Denoise node. When I open those Blender files in 3.1.0/M1, that link is gone! (Maybe it's something about the denoiser?) While this is weird, the good news is it's a simple fix to my projects… It also looks like my animation timelines were corrupted when loading the old Blender files. OK, I'll have to work on this - but at least it's a different kind of issue. Thanks to all of you for your great support. :slight_smile:
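
If it helps to characterize the migration issue, here is a rough sketch (assuming the links were dropped rather than the nodes themselves) that scans every scene in the file and reports Denoise nodes whose Image input is unlinked:

```python
import bpy

# Report compositor Denoise nodes that lost the link on their Image input,
# e.g. after opening an older .blend in a newer Blender build.
for scene in bpy.data.scenes:
    if not scene.use_nodes or scene.node_tree is None:
        continue
    for node in scene.node_tree.nodes:
        if node.bl_idname == 'CompositorNodeDenoise' and not node.inputs["Image"].is_linked:
            print(f"{scene.name}: Denoise node '{node.name}' has no Image input link")
```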

1 Like

There is some stuff not right in your scene, but…
I haven't had this problem going from 3.0 to 3.1 with connections gone, but if that is so at your end then you should UPDATE your bug report. I added the solution, so NOW you can add that the connections disappear when loading a 3.0 file into 3.1.

Thanks - there appear to be some other minor project corruptions as well, like the timeline/animation/camera positions. But I need to study this a little more to characterize it all. The good news is IMO this is a lesser issue than a broken denoiser. :slight_smile:

Talking about power, is it normal that the power draw is that low in Blender?

Rendering on the GPU on my 14-core.

64-core from the Mac tech video.

Let's make it even more interesting: 32-core Max.

I still find something very weird about this.

It also does not make sense to me that about 4.5 times the cores only uses about 2.7 times more power, or that twice the number of cores only uses about 7 W more (based on the average).

Are those correct?

The scene does not have an impact either; whether it is BMW or Monster Under the Bed, I get the same average 14 W package power on the 14-core.
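
Purely as a back-of-the-envelope reading of those figures (hedged: the 32- and 64-core wattages below are inferred from the ratios above rather than measured, and "twice the cores" is read as 32-core vs 64-core):

```python
w14 = 14.0       # ~14 W average package power on the 14-core (stated above)
w64 = w14 * 2.7  # "about 4.5x the cores ... about 2.7x more power"    -> ~38 W
w32 = w64 - 7.0  # "twice the number of cores only uses about 7 W more" -> ~31 W

for cores, watts in ((14, w14), (32, w32), (64, w64)):
    print(f"{cores:2d} GPU cores: ~{watts:.0f} W package, ~{watts / cores:.2f} W per core")
```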

Any thoughts?

2 Likes

iMac 27" 3.8 GHz 8-Core Intel Core i7,
64GB,
AMD Radeon Pro 5700 XT 16 GB

  • Blender 3.2

Only GPU RENDER: bmw_27_gpu = Time: [00:48:16] | Mem: 768.55M :ok_hand: :+1:
CPU something like 3 min. something…

1 Like

I updated the bug report. Looks like it'll just take a little fiddling around to get things functioning again.

2 Likes

From memory that 5700 XT time seems OK; I get about the same in Windows.
Well, OK, it is only an RX 5700 XT.

2 Likes

I don't think he was comparing apples to apples…

My M1 Mac mini with Blender 3.2 renders the BMW scene in 2 min 5 sec (Blender 3.1 alpha did it in 1 min 47 sec).

So an MBP 16 M1 Max 32-core should be faster than 3 min 25 sec.

Hey! Yeah, we've been noticing the same issues on our end. Would love an ETA on when to expect a fix.

1 Like

Hi Friends! I had tapped out my quota of 20 replies/day. I can't thank you enough for your wonderful and rapid help. To contribute, I wanted to share my performance observations on my 16" M1 Max MBP. Each CPU core is just a little more than twice the performance of a desktop AMD CPU core for raw number crunching - and that's what you see with highly optimized, regular applications. The enormous speedup numbers come from applications that are branch-heavy and memory-intensive with a lot of random access. So I've found CPU rendering about 8x faster than on my Intel laptop. The disappointment is with the GPU: each GPU core should be ~3.5x faster than each CPU core, but on GPU Compute I'm only getting roughly per-core parity - about 3x faster rendering with 32 GPU cores than with 10 CPU cores. That's an efficiency of < 30%, so clearly there's headroom for as much as a 3x improvement in GPU rendering with time. But I personally am thrilled, because before I had to rely on material preview (lookdev) mode, and now I can finally use render preview mode in real time, just like people using expensive desktop rigs! :slight_smile: [As a quick aside, I'm using release 3.0, not 3.1 or 3.2.]
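
For clarity, here is the rough arithmetic behind that "< 30%" figure, using the numbers from the post (a sketch that treats the ~3.5x per-core ratio as the expected speedup):

```python
gpu_core_vs_cpu_core = 3.5   # each GPU core ~3.5x one CPU core (expected)
gpu_cores, cpu_cores = 32, 10
observed_speedup = 3.0       # GPU render ~3x faster than the 10-core CPU render

theoretical_speedup = gpu_cores * gpu_core_vs_cpu_core / cpu_cores  # ~11.2x
efficiency = observed_speedup / theoretical_speedup                 # ~0.27, i.e. < 30%
print(f"theoretical ~{theoretical_speedup:.1f}x, observed ~{observed_speedup:.0f}x, "
      f"efficiency ~{efficiency:.0%}")
```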

2 Likes

No one is claiming that Apple Silicon has yet matched an Nvidia GPU.

But this is clearly a half step toward proper optimization for Metal. To many, it seems like Blender was touting this as the full version of Metal support, but it's clearly not, when it's still not taking full advantage of the M-series chips.

People use Blender as a benchmark for graphics, for better or worse, and with only half the optimization in place it is now giving out essentially untrue information.

If it's still in early development, then call it that and don't release it as a release candidate.

1 Like

The release notes say exactly that:

The implementation is in an early state. Performance optimizations and support for Intel GPUs are under development.

https://wiki.blender.org/wiki/Reference/Release_Notes/3.1/Cycles

21 Likes

I know it's been a while, but I found something that actually claims out-of-core rendering on all rendering devices. Here is a quote from the GPU Rendering page of the Blender 3.1 manual:

With CUDA, OptiX, HIP and Metal devices, if the GPU memory is full Blender will automatically try to use system memory. This has a performance impact, but will usually still result in a faster render than using CPU rendering.

I made that change to the manual. CUDA, OptiX, and HIP have been confirmed to use out-of-core rendering. Metal can do out-of-core rendering on Apple Silicon because of the unified memory layout of these chips. And according to an Apple representative, Metal can do out-of-core rendering on AMD GPUs, but it's not "ideal" and can be improved.

2 Likes

The answers are in the post you replied to:

  1. The structure of the renderer is much aligned with the existing path taken for CPU and CUDA, with little leverage of Apple Silicon's more unique architecture as of yet. . . .
  2. Optimisation is going to be an ongoing effort, rather than a task we tackle just the once, and I'm hoping the team can see some improvements land in every release. We have big ambitions.

As for your other questions:

What are you talking about? RTX 3090 released in 2020. "Why is it rare?"

There are literally millions of results on Google detailing the problem of the ubiquitous chip shortage.

It is well known that it is hard to find one at MSRP due to the chip shortage that has been going on for years and is expected to continue.

You said "I dont see any reasons of using Apple Silicon over RTX 3090," and someone provided you some reasons, including "people already have Apple hardware for reasons other than Blender rendering speed" and "3090s are selling at above MSRP." For that matter, a 3090 can't even fit into a MacBook Pro, so it isn't even an option for people with an MBP (and yes, people with an MBP exist).

Just accept that reasons exist whether someone sees them or not, and that other people aren't all exactly the same but instead have different priorities, which is part of why the Blender community is so great.

The fact that Apple is spending money to improve M1 performance in Blender is absolutely amazing and is a benefit to the entire Blender community. Limiting ourselves to a single video card from a single company, and ignoring the plethora of reasons people enjoy Apple Silicon over AMD/Intel/Nvidia (or vice versa), is the last thing we should want to defend.

1 Like

jason-apple says yes!

There is certainly scope to use the Apple Neural Engine for denoise in the viewport too.
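
In the meantime, viewport (preview) denoising can already be switched on from Python; a minimal sketch, assuming the Blender 3.x Cycles property names (today OIDN runs on the CPU rather than the Neural Engine):

```python
import bpy

scene = bpy.context.scene
scene.render.engine = 'CYCLES'
scene.cycles.use_preview_denoising = True           # denoise the viewport render preview
scene.cycles.preview_denoiser = 'OPENIMAGEDENOISE'  # OIDN; currently CPU-based on Apple Silicon
```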

4 Likes

Apple does document ray intersection acceleration, but it seems like it uses a GPU kernel rather than dedicated hardware.

https://developer.apple.com/documentation/metalperformanceshaders/metal_for_accelerating_ray_tracing?language=objc

2 Likes