Cycles Apple Metal device feedback

I’m unable to reproduce something like this on a MacBook Air M1. Blender takes about 500MB of memory on startup (and less “real memory”). That’s basically the size of the Blender executable loaded into virtual memory; more of that executable gets loaded into real memory as it gets used. Rendering then adds memory proportional to the render resolution, textures, etc.

Maybe it is that memory leak bug in macOS, maybe an add-on, or some other unknown factor.

Thank you! Is there currently (and if not, will there soon be) a real performance advantage for the lucky souls with the higher end Apple Silicon, such as the M1 Max, vs. the earlier M1 machines (like a MacBook Air)?

You’ll not find a more hardcore macOS fanatic than me, yet I bought a Win10 machine with a 3080 GPU so I could get through a couple critical projects this year. While it renders well, I hate that machine. If I can get even half that performance on a macOS machine in 2022, I’ll never touch it again. So grateful for this work.

2 Likes

Just did a quick test with an M1 Max (64GB, 24 GPU cores).

BMW demo scene

  • CPU: 3:21
  • GPU: 0:48
  • CPU+GPU: 0:48
4 Likes

That seems quite good: close to CUDA speed on an RTX 3080 (something like 20-25 seconds), though not as fast as OptiX (more like 12 seconds). Considering the years of intensive development poured into those backends, that this is in “an early state” - oh, and that this isn’t even a dedicated GPU… amazing.

Is that the BMW demo scene rendered in the defaults from https://opendata.blender.org - so the results can be directly compared?

First off — awesome work. Very happy to see this!

While experimenting I hit a reliable crash in the latest alpha (Dec 15). I have a file that always crashes when rendering with the Metal backend (M1 GPU) while also using Cycles in the viewport:

https://enigmeta.s3.amazonaws.com/bugs/2021-12-15-human-no-texture.blend

Here’s the crash report:

https://enigmeta.s3.amazonaws.com/bugs/2021-12-15-blender-crash-report.txt

The 32-core variant of the M1 Max handles the BMW scene in 0:43 (using the default GPU test file settings), which gets even a hair closer to CUDA speeds. It would be nice if the jump from 24 to 32 cores were a more linear performance boost, but as you stated, these are early days :slight_smile:

3 Likes

Is it GPU only time or GPU+CPU?

On a 16" MBP M1 Max with 32GB RAM, the BMW benchmark file renders in between 41 and 43 seconds with the latest 3.1 build (as of Dec-15-2021). Interestingly, after about 10 renders trying both combinations, I see no significant difference between GPU only and GPU+CPU, although the experience of watching it is a bit different. The same file renders in 11 seconds on a PC with an NVIDIA 2080 using the latest Windows build of 3.1 and OptiX. So, basically, the MBP M1 Max is about 4 times slower than the PC with the 2080.

2 Likes

Thank you for this reference point; it gives us an order of magnitude to set expectations. Being only 2-4x slower than a modern NVIDIA GPU is quite impressive at this stage of both the hardware and the software. It suggests that when Apple’s pro desktop hardware moves to their own silicon, and as Blender continues to improve (hopefully Apple is here for the long term in the Foundation), we may soon see a day where a Mac is comparable to a current-generation NVIDIA card, with far lower thermal and power needs and no need to buy a dedicated GPU. I’ve often said that if I could get my Mac to run Blender at half the speed of my NVIDIA 3080 Windows 10 box, I’d never turn the Windows box on again. (I can make up for that gap with increased happiness and efficiency in other areas.)

5 Likes

Please report bugs to the tracker: https://developer.blender.org/ or just use Help → Report a bug from within Blender.

2 Likes

Just did that, thanks!
https://developer.blender.org/T94142

2 Likes

bmw27_gpu.blend, M1 Pro 10/14 CPU+GPU render 1:11

1 Like

16" MBP M1 Max with 10‑core CPU, 32‑core GPU, 32GB RAM, and Blender 3.1.0 (Dec-16-2021):

  • bmw27_gpu.blend, GPU: 0:42 - 0:43
  • bmw27_gpu.blend, GPU+CPU: 0:41 - 0:43
  • classroom.blend, GPU: 1:47 - 1:48
  • classroom.blend, GPU+CPU: 1:44 - 1:49
  • pavillon_barcelone_v1.2.blend, GPU: 3:11 - 3:14
  • pavillon_barcelone_v1.2.blend, GPU+CPU: 2:40 - 2:44

2 Likes

Hey, I’m excited to test some scenes with my “slower” 8/8-core MBP, too:
BMW - GPU+CPU: 1:45
BMW - GPU: 2:12
Test done with today’s Blender 3.1 build (on battery, if that makes a difference).

1 Like

I tested on the M1 MBP.
BMW - GPU: 3:09
On battery, in Low Power Mode.

That is great. Considering the 32-core needs 43 sec, it is not even twice as fast as the 14-core.

Good news, as I returned my 16” and am getting the binned 14” :slight_smile:

Another benchmark with today’s 3.1 build on the 8/8-core MBP:
Monster Under Bed - GPU + CPU: 7:44min
Monster Under Bed - GPU: 10:40min

@devs
Is there a reasonable explanation for why the speed gain is not linear across the different chips?

For example, the BMW scene benchmarks in this thread:
8/8 Core - 1:45min
10/14 Core - 1:11min
10/32 Core - 0:46min

Will there be Blender-side improvements in multi-device usage, or is this roughly the final picture of the speed differences between the M1 chips?
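Out of curiosity, I fit a simple Amdahl’s-law-style model to the three BMW times above. This is purely my own back-of-the-envelope sketch, not anything from Blender’s code; the split into a fixed “serial” part and a “parallel” part that scales with GPU core count is an assumption:

```python
# Toy model (assumption, not Blender internals):
#   time = serial + parallel / gpu_cores
# fitted to the BMW times reported in this thread.
times = {8: 105.0, 14: 71.0, 32: 46.0}  # seconds (1:45, 1:11, 0:46)

# Solve for the two unknowns using the 8- and 32-core data points.
c1, c2 = 8, 32
parallel = (times[c1] - times[c2]) / (1 / c1 - 1 / c2)
serial = times[c1] - parallel / c1

# Check the model against the third (14-core) data point.
predicted_14 = serial + parallel / 14
print(f"serial ≈ {serial:.0f}s, parallel ≈ {parallel:.0f}s")   # ≈ 26s and 629s
print(f"predicted 14-core time ≈ {predicted_14:.0f}s")         # ≈ 71s (measured: 71s)
```

The fit is surprisingly close: roughly 26 seconds of the render doesn’t scale with GPU cores at all, which would explain why doubling the core count doesn’t halve the time.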

1 Like

I’m sure someone more knowledgeable can weigh in here, but those results seem to support my hunch that for shorter renders (<3 min), the benefit of the CPU tag-teaming with the GPU is likely outweighed by the render waiting for the CPU to finish the last of its work. The longer the render runs, the more useful chunks the CPU can finish before holding things up.

That would also explain why older M1 chips with fewer GPU cores see more benefit from CPU+GPU: the performance gap between the two devices is smaller, minimizing the time the GPU spends stuck waiting on the CPU after finishing its own work.

I know it’s not a tile system anymore, so I’m not even sure how the work is divided between the two - samples, maybe? Anyway, I’d be curious to know if there’s anything currently implemented allowing a device to steal chunks of work back rather than sit idle. (Or maybe I’m completely wrong and this isn’t even the bottleneck at all with CPU+GPU rendering.)
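For what it’s worth, here’s a toy illustration of the scheduling idea being discussed. This is hypothetical - it is not how Cycles actually divides work between devices - but it shows why a static up-front split can leave the fast device idle, while pulling chunks from a shared pool keeps both busy:

```python
# Hypothetical sketch, NOT Cycles' real scheduler.
# Speeds are in "work units per second"; work is in units.

def render_time_static(total, fast, slow, slow_share):
    # A fixed share of the work is assigned to the slow device up front;
    # the render finishes only when the slower partition is done.
    return max(total * slow_share / slow, total * (1 - slow_share) / fast)

def render_time_dynamic(total, fast, slow, chunk):
    # Devices grab fixed-size chunks from a shared pool as they free up:
    # each chunk goes to whichever device would finish it sooner.
    t_fast = t_slow = 0.0
    remaining = total
    while remaining > 0:
        work = min(chunk, remaining)
        if t_fast + work / fast <= t_slow + work / slow:
            t_fast += work / fast
        else:
            t_slow += work / slow
        remaining -= work
    return max(t_fast, t_slow)

# GPU does 10 units/s, CPU 1 unit/s, 1000 units of work total.
static = render_time_static(1000, fast=10, slow=1, slow_share=0.5)
dynamic = render_time_dynamic(1000, fast=10, slow=1, chunk=10)
print(static, dynamic)  # → 500.0 91.0
```

With a naive 50/50 split the CPU alone takes 500 s while the GPU idles after 50 s; with dynamic chunk assignment the total drops to about 91 s, close to the ~91 s an ideal combined device would need. The remaining question in the thread - whether a device can steal back a chunk already handed out - is a further refinement on top of this.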

1 Like

Honestly, I don’t see any bottlenecks with the CPU on my “old” 8/8-core.
The shorter renders under 3 min seem to benefit from the CPU as much as the longer ones do.
Whereas on my desktop RTX, for example, adding my CPU (5900X) increases render time for quick renders, while for longer renders there’s a clear advantage - now with Cycles X even more than before.

M1 Max 64GB
BMW SCENE

BMW GPU+CPU:
HIGH POWER: 41.64 seconds
AUTO POWER: 41.33 seconds
LOW POWER: 50.33 seconds

BMW GPU ONLY:
HIGH POWER: 42.89 seconds
AUTO POWER: 43.20 seconds
LOW POWER: 42.66 seconds

I repeated the renders multiple times; these were the fastest results, but the times didn’t vary much per setting from render to render.

The only strange note is that Low Power GPU+CPU was repeatedly the slowest, perhaps because Low Power mode takes a penalty when running a workload that is, by nature, higher power draw?