So you are saying something kicked the “L” out of your render?
So great to see continuous progress for Macs!
Classroom on a Mac Pro 2019, 12c and single Pro Vega II: 1:27.86
Oh, and that RT performance just moving around the viewport. Nice.
Feels good to see this computer's life expectancy extended, with the option to beef up the GPU side once used components become widely available.
ADDITIONAL THOUGHT: any chance of the Open Data benchmark getting a little updated buildbot love? Or does that get updated automatically too? I definitely feel that Blender should keep it alive and current, toe to toe with the latest technologies. Thanks!
GPU: Radeon 5500XT (external, 4GB)
BMW:
GPU: 1:03
Classroom:
GPU: 2:48
Junk Shop:
crash, out of memory
All on default settings, and using GPU+CPU would actually be slower (most likely my CPU is underpowered).
However, I do feel like my GPU was being bottlenecked, as the GPU usage shown in Activity Monitor was not that high. I think it might be a RAM issue…
The Open Data Benchmark is incompatible with Blender 3.0 and above because some changes in Cycles-X “broke it”. Sergey is currently working on updating the benchmark for Blender 3.0 and above, and investigating changes to how the benchmarks are performed to make them quicker to run on most hardware.
I do not have an exact date on when the updated benchmark will be released, but I suspect it will be after Blender 3.1 releases.
Cycles-X currently has sub-optimal work distribution when rendering with multiple devices of different speeds (e.g. CPU + GPU). This is most likely the issue you’re experiencing, and work is underway to improve/fix it.
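If it helps to picture the problem, here’s a toy Python sketch (this is not Cycles’ actual scheduler, and the tile counts and device speeds are made up): with a static split, the render finishes only when the slowest device does, while pulling work from a shared queue divides it in proportion to speed.

```python
# Toy model of multi-device work distribution (illustrative only,
# not how Cycles actually schedules work).

def static_split(tiles, gpu_speed, cpu_speed):
    # Each device gets half the tiles up front; total time is set
    # by whichever device finishes last.
    half = tiles / 2
    return max(half / gpu_speed, half / cpu_speed)

def dynamic_queue(tiles, gpu_speed, cpu_speed):
    # Devices pull tiles from a shared queue, so work ends up
    # divided in proportion to device speed.
    return tiles / (gpu_speed + cpu_speed)

tiles = 400                            # hypothetical tile count
gpu, cpu = 10.0, 1.0                   # hypothetical tiles/second per device
print(static_split(tiles, gpu, cpu))   # 200.0 s: the slow CPU dominates
print(dynamic_queue(tiles, gpu, cpu))  # ~36.4 s: both devices stay busy
```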
Here are my AMD Metal results.
MacBook Pro (late 2016)
Software: macOS 12.3 beta 3, Blender 3.1.0 (2022-02-20)
Hardware: Core i7-6920HQ, 16GB LPDDR3 2133, AMD Radeon Pro 460 4GB GDDR5
bmw27
Metal + GPU: 5:11 (311 seconds, Mem: 768.53M, Peak: 768.53M)
classroom
CPU: 16:55 (1015 seconds, tested on 3.0.1)
Metal + GPU: 10:32 (632 seconds, Mem: 1367.73M, Peak: 1367.73M)
Metal + GPU + MetalRT: Crash
Monster Under The Bed
Metal + GPU: 16:07 (967 seconds, Mem: 1331.43M, Peak: 1331.43M)
Big thanks to Michael, brecht, and everyone who worked on this project. I’m just a hobbyist who’s still watching beginner Blender tutorials on YouTube. Metal support finally allows me to explore Blender without the fan screaming all the time from CPU rendering (with the CPU sitting at 100 °C); now the GPU barely activates the fan.
Great job folks!
Yes, it’s only a matter of emojis until the final macOS 12.3 release…
Suppose I was rendering a scene on GPU only, but didn’t have enough RAM (4GB). If I rendered with GPU+CPU, would that give me a total of 12 GB of RAM for rendering? I know that it wouldn’t be a complete 12, since other programs use RAM, but I would be able to use my system memory as well as GPU memory, correct?
I have 8 GB of LPDDR3 RAM in my computer.
Hey! I’m using the same MacBook Pro as you are and just found your replies. Quick question: has there been any improvement on this for those of us using the GPU? Could you possibly recommend something? I’m new to Blender, and the fact that Cycles can’t use the GPU is kind of a bummer. Is Radeon ProRender a good choice to use instead of Cycles? Sorry to hit you with 21 questions; it’s just a bit hard to find info on our Mac models specifically.
Hi @Kore, I just thought I would let you know that the recent development of Cycles now allows you to use the GPU for rendering in Cycles on macOS if you have either an Apple GPU or an AMD GPU.
To use GPU rendering on Apple GPU you need macOS version 12.2 or newer.
To use GPU rendering on AMD GPUs you need macOS version 12.3 or newer (macOS 12.3 is currently in beta).
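If you’d rather switch it on from a script than via Edit > Preferences > System, something along these lines should work in Blender 3.1 (a sketch; the METAL backend only appears on the macOS versions listed above):

```python
import bpy

# Select the Metal backend in Cycles' preferences (equivalent to
# Edit > Preferences > System > Cycles Render Devices).
prefs = bpy.context.preferences.addons["cycles"].preferences
prefs.compute_device_type = 'METAL'

# Refresh the device list and enable the detected devices.
prefs.get_devices()
for device in prefs.devices:
    device.use = True

# Tell the scene to render on the GPU.
bpy.context.scene.cycles.device = 'GPU'
```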
I may be wrong on this, so don’t take what I say in this comment as fact, but Metal in Cycles in its current form doesn’t appear to support “out of core” rendering on non-Apple Silicon. As such, CPU+GPU rendering will probably still be limited by the 4GB of VRAM on your GPU?
@Michael-Jones-Apple or @jason-apple can you confirm or deny this?
Edit: It seems I was looking in the wrong place and my assumption was based on incorrect information. Please wait for a proper answer from someone more informed.
That would be the biggest letdown the community could think of!
I own a brand-new Intel iMac with 64 GB of RAM and a 16 GB graphics card, bought especially for that reason.
No one on the Blender side mentioned this possible limitation beforehand.
It also wouldn’t make any sense.
So I hope you’re wrong. (@Michael-Jones-Apple & @jason-apple)
Otherwise I’ll give my memory back to Apple and get a second card…
Well, here’s the thing:
My laptop is a MacBook Air (2019) and, if I am correct, its CPU is an SoC, which is what the Apple M1 is. That would mean the RAM and processor cores are on a single chip. I’ll test it out and edit this post with results.
As for you, @Nurb2Kea, the iMac’s RAM is replaceable, which means it is NOT on the same chip as the CPU cores.
Well, that doesn’t mean it’s impossible, since other render engines have this ability.
Anyway, we’ll see what they come up with; as usual, it gets improved over time by the will of Apple and Blender.
My guess is that when using CPU+GPU you’ll need memory on both ends, since CPU rendering at the moment needs a lot of memory on its own. I’m not sure, but CPU rendering out of GPU memory sounds really odd!
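Since “out of core” keeps coming up: it just means keeping data that doesn’t fit in VRAM in system RAM and fetching it on demand. Here’s a toy sketch of the general idea (purely illustrative, with made-up names; real renderers do this at the driver level, not in Python):

```python
from collections import OrderedDict

# Toy illustration of out-of-core texture access: a small "VRAM" cache
# backed by a large host-side store, with least-recently-used eviction.
# Names and sizes are hypothetical; this is not Cycles' implementation.

HOST_TEXTURES = {f"tile_{i}": f"<pixels {i}>" for i in range(100)}  # "system RAM"
VRAM_CAPACITY = 8                                                   # tiny "VRAM"

vram = OrderedDict()

def sample_texture(name):
    if name in vram:
        vram.move_to_end(name)        # cache hit: mark as recently used
        return vram[name]
    if len(vram) >= VRAM_CAPACITY:    # cache full: evict the LRU tile
        vram.popitem(last=False)
    vram[name] = HOST_TEXTURES[name]  # "upload" the tile over the bus
    return vram[name]

# Rendering touches far more texture data than fits in "VRAM",
# but every access still succeeds -- just slower on a cache miss.
for i in range(100):
    sample_texture(f"tile_{i % 20}")
```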
I’m using up-to-date Blender and macOS on a Mac Pro 2019 12-core with a Radeon Pro Vega II 32GB. I’m looking for a technical explanation of these Classroom scene observations:
GPU: 547 seconds. (Experimental with MetalRT on)
GPU: 87 seconds. (Supported, MetalRT off)
CPU+GPU: 87 seconds
Windows PC with RTX 3080 on Optix: 16 seconds.
I don’t believe in magic when it comes to computers. Is OptiX cutting corners, or is there that much work left to do on macOS Metal?
Odd, loaded question. RTX cards have RT cores dedicated to ray tracing; the Radeon Pro Vega II doesn’t. The desktop 3080 is also a way faster card in every metric.
Nothing odd (that I can see) or loaded about it. I am simply after the technical reason, which you also, at least in part, provided.
The performance difference is jarring, and I am a curious person. I haven’t spent time on Windows or with Nvidia cards for a long time.
I know from my background dealing with render engines and the internet that people will often say Engine X is “slow” and Engine Y is “fast” while happily comparing output from vastly different rendering strategies (actual tracing vs. averaging/blurring).
In this case I am assuming that common software (Blender) and a common engine (Cycles) should perform comparably. But since the times are so different and a 3080 is “nothing special”, I want to know the reason, so I can make an educated guess as to whether the gap will be more or less bridged in time.
Hey everyone! Can someone run a little benchmark for me with the latest build for Apple Silicon? I really need the numbers for BMW27 on the 16" MBP w/ M1 Max. If anyone can run that for me, would be much appreciated!
16" MBP M1 Max 32. Blender 3.2 alpha, BMW fresh from Blender’s demo files.
CPU+GPU, RT experimental: 54 seconds
CPU+GPU, supported: 53 seconds
GPU, supported: 49 seconds
GPU, RT experimental: 41 seconds
I thought I might provide some insight into why an RTX 3080 might be significantly faster than the Radeon Pro Vega II.
- The RTX 3080 has significantly higher theoretical compute performance than the Radeon Pro Vega II (roughly 30 TFLOPS FP32 versus roughly 14). This means the RTX 3080 is likely to be faster than the Radeon Pro Vega II when running the same task.
- The RTX cards have part of their silicon dedicated to “RT cores”, hardware used to speed up the task of tracing rays. The Radeon Pro Vega II does not have this. In the Classroom scene, the RT cores in the 3080 can trace rays roughly 6 times as fast as the “general compute” method. Keep in mind that tracing rays only makes up part of the rendering process, so an RT-core render won’t take a sixth of the time of a non-RT-core render; only the ray-tracing portion shrinks to a sixth. For example, if tracing rays were half the render time, a 6x speedup there would give roughly 1 / (0.5 + 0.5/6) ≈ 1.7x overall. Note: the RT cores can only be used when rendering with OptiX; CUDA uses the “general compute” method of tracing rays. (The sketch after this list shows the kind of inner loop this hardware replaces.)
- The OptiX implementation in Blender appears to have seen more optimization than the Metal implementation. I believe Metal could gain a few percent with further optimization, but this part is just speculation.
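To make the RT core point concrete, the hot inner loop the hardware replaces is ray/triangle intersection. Here’s a minimal Möller–Trumbore test in Python (illustrative only; RT cores also traverse the BVH in hardware, and Cycles is of course not written in Python):

```python
# Minimal Möller-Trumbore ray/triangle intersection -- the kind of test a
# path tracer runs millions of times per frame. RT cores implement this
# (plus BVH traversal) in fixed-function hardware; "general compute"
# renderers run it on the ordinary shader cores instead.

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]

def sub(a, b):
    return (a[0]-b[0], a[1]-b[1], a[2]-b[2])

def intersect(origin, direction, v0, v1, v2, eps=1e-8):
    """Return the distance t along the ray to the triangle, or None."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:                 # ray parallel to the triangle plane
        return None
    inv_det = 1.0 / det
    t_vec = sub(origin, v0)
    u = dot(t_vec, p) * inv_det
    if u < 0.0 or u > 1.0:             # hit point outside the triangle
        return None
    q = cross(t_vec, e1)
    v = dot(direction, q) * inv_det
    if v < 0.0 or u + v > 1.0:         # hit point outside the triangle
        return None
    t = dot(e2, q) * inv_det
    return t if t > eps else None      # ignore hits behind the origin

# One ray cast straight down -Z into a triangle sitting at z = -5:
print(intersect((0, 0, 0), (0, 0, -1),
                (-1, -1, -5), (1, -1, -5), (0, 1, -5)))  # prints 5.0
```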