Cycles Apple Metal device feedback

That is a really interesting side effect.

Whoooooa!!!
I just tested Blender 3.1.0 Alpha (5de109cc2d22) on my M1 Mac mini, and I’m seeing double the performance over 2.93!
I rendered a WIP scene of mine: in 2.93 it took 9:09 min to render at 250 samples,
and in 3.1 it took 4:04 min to render the same scene. I am seeing a difference in the output, so I’m going to test a regular Blender demo scene and see how it goes.
The viewport seems much more responsive as well!
Thanks so much team!!!

Here are the renders side by side (2.93 / 3.1):


Alrighty, so I tested one of the Blender demo scenes, TUGBOAT baby!
Aaannnddd, boom! Excellent improvement again!
2.93 @ 200 samples: 12:55 min
3.1 @ 200 samples: 5:01 min
That’s about 2.6x as fast (roughly a 157% speed improvement)!!!
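As a quick check of those numbers, here is a small Python sketch that converts the posted mm:ss times to seconds and compares them. 12:55 is 775 s and 5:01 is 301 s, so 3.1 comes out at about 2.57x as fast, i.e. roughly 157% faster; the WIP scene (9:09 vs 4:04) comes out at exactly 2.25x. The function names are just illustrative.

```python
def to_seconds(mm_ss: str) -> int:
    """Convert an 'mm:ss' render time (as posted above) to seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


def speedup(old: str, new: str) -> float:
    """How many times faster the new render is than the old one."""
    return to_seconds(old) / to_seconds(new)


tug = speedup("12:55", "5:01")  # Tugboat @ 200 samples, 2.93 vs 3.1
wip = speedup("9:09", "4:04")   # WIP scene @ 250 samples, 2.93 vs 3.1

# "x as fast" is the time ratio; "% faster" is that ratio minus one.
print(f"Tugboat: {tug:.2f}x as fast, {(tug - 1) * 100:.0f}% faster")
print(f"WIP:     {wip:.2f}x as fast, {(wip - 1) * 100:.0f}% faster")
```

The distinction matters when quoting results: a render that takes 39% of the old time is about 2.57x as fast, which is a 157% improvement rather than a 257% one.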

Here are the two renders side by side (2.93 / 3.1); the difference between them is smaller than with my WIP render. (I’d upload both images separately so you could toggle between them to see the difference, but as a new user I can only upload one image at a time.)


The latest build (Dec-17th) is crashing for me with the BMW file.

This model doesn’t render correctly with GPU rendering in the December 16th snapshot build:

Edit: Now that I’ve updated to the December 17th build, it crashes when rendering starts. Here’s a crash log; it appears a null pointer is somehow being passed to Metal’s compiler.

I have a 2021 16" MacBook Pro with a 10-core M1 Pro, 16-core GPU, 16 GB RAM and 1 TB SSD, currently running macOS 12.1.
I was testing the Blender 3.1 Apple Silicon build from 16 December. I’ve also tested the build from 17 December, but that one immediately crashes on any scene when rendering with the GPU.

I have noticed some strange behavior in how the Blender render window refreshes, specifically the status text line (Frame/Last/Time/Remaining/…) and the rendered image:

  • When rendering only on the CPU, the status text line and rendered image are updated for each sample.
  • When rendering only on the GPU, both the rendered image and the status text line are rarely updated. For example, it shows sample 1, a minute later the view updates and shows sample 7, and another minute later sample 18. For a different scene, it would show sample 1, then minutes later sample 78.
  • When rendering on both CPU and GPU, the status text line is updated on every sample, but the rendered image is only updated on every odd sample.

My expectation was that the status text line would be updated at least once per second, and the rendered image each time a new sample has been computed.

Here are some performance numbers for some scenes. Unless specified otherwise, I’ve left the settings at default for comparability.

Some things I’ve noticed:

  • The GPU utilization shown by the macOS Activity Monitor is only around 20% to 40%, while the CPU utilization gets close to 100%
  • CPU+GPU is usually fastest, except for the Blender 3.0 Splash Screen, where it is about 1.5 minutes (roughly 2.5%) slower than GPU only
  • The indicated Mem/Peak values are significantly higher for GPU compared to CPU
  • CPU+GPU requires more memory initially, so the Peak value is higher than the Mem value. This is not the case for CPU or GPU only.
  • The indicated Mem/Peak values for CPU+GPU are most likely wrong, because the memory utilization shown by Activity Monitor is significantly higher, similar to the GPU-only case

Blender Splash Screen 3.0 - Sprite Fright

  • CPU: Time: 02:01:17, Mem/Peak: 2550 MB
  • GPU: Time: 01:00:03, Mem/Peak: 4900 MB
  • CPU+GPU: Time: 01:01:34, Mem: 1640 MB, Peak: 1950 MB

Blender Splash Screen 2.93 LTS - Still Life (tiling disabled, OpenImageDenoise enabled)

  • CPU: Time: 00:09:03, Mem/Peak: 630 MB
  • GPU: Time: 00:07:34, Mem/Peak: 2010 MB
  • CPU+GPU: Time: 00:04:59, Mem: 115 MB, Peak: 595 MB

Blender Splash Screen 2.83 - PartyTug 6:00AM (switched to Cycles)

  • CPU: Time: 00:06:25, Mem/Peak: 773 MB
  • GPU: Time: 00:03:45, Mem/Peak: 2420 MB
  • CPU+GPU: Time: 00:02:51, Mem: 568 MB, Peak: 671 MB

Monster Under The Bed

  • CPU: Time: 00:10:36, Mem/Peak: 668 MB
  • GPU: Time: 00:06:01, Mem/Peak: 2090 MB
  • CPU+GPU: Time: 00:04:33, Mem: 543 MB, Peak: 563 MB

Lone Monk

  • CPU: Time: 01:09:08, Mem/Peak: 659 MB
  • GPU: Time: 00:29:09, Mem/Peak: 1990 MB
  • CPU+GPU: Time: 00:21:19, Mem: 424 MB, Peak: 548 MB

Car Demo

  • CPU: Time: 00:03:16, Mem/Peak: 140 MB
  • GPU: Time: 00:01:19, Mem/Peak: 1390 MB
  • CPU+GPU: Time: 00:01:06, Mem: 105 MB, Peak: 117 MB

Classroom

  • CPU: Time: 00:07:20, Mem/Peak: 630 MB
  • GPU: Time: 00:03:24, Mem/Peak: 2100 MB
  • CPU+GPU: Time: 00:02:44, Mem: 273 MB, Peak: 605 MB
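To make the benchmark list easier to compare, here is a small Python sketch that converts each time to seconds and reports how much faster GPU and CPU+GPU are than CPU alone. It assumes the times are hours:minutes:seconds (so Lone Monk on CPU reads as 1 h 9 min); that format is an assumption, not something stated in the post. The names `seconds`, `RESULTS` and `speedups` are hypothetical.

```python
def seconds(hms: str) -> int:
    """Convert an 'hh:mm:ss' string to seconds (format assumed, see above)."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


# (CPU, GPU, CPU+GPU) render times copied from the benchmark list.
RESULTS = {
    "Sprite Fright": ("02:01:17", "01:00:03", "01:01:34"),
    "Still Life": ("00:09:03", "00:07:34", "00:04:59"),
    "PartyTug 6:00AM": ("00:06:25", "00:03:45", "00:02:51"),
    "Monster Under The Bed": ("00:10:36", "00:06:01", "00:04:33"),
    "Lone Monk": ("01:09:08", "00:29:09", "00:21:19"),
    "Car Demo": ("00:03:16", "00:01:19", "00:01:06"),
    "Classroom": ("00:07:20", "00:03:24", "00:02:44"),
}


def speedups(results):
    """Return {scene: (GPU speedup vs CPU, CPU+GPU speedup vs CPU)}."""
    out = {}
    for scene, (cpu, gpu, both) in results.items():
        c, g, b = seconds(cpu), seconds(gpu), seconds(both)
        out[scene] = (c / g, c / b)
    return out


if __name__ == "__main__":
    for scene, (gpu_x, both_x) in speedups(RESULTS).items():
        # Flag scenes where adding the CPU made the render slower.
        note = "  <- CPU+GPU slower than GPU" if both_x < gpu_x else ""
        print(f"{scene:<22} GPU {gpu_x:.2f}x  CPU+GPU {both_x:.2f}x{note}")
```

Under that reading, Sprite Fright is the only scene where CPU+GPU loses to GPU only, by 91 s (about 1.5 minutes, or roughly 2.5%).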

Scheduling of samples happens differently on multi-core CPUs and GPUs, so you will observe phenomena like this. This “issue” of updates arriving at seemingly random intervals on the GPU also exists with Nvidia and AMD graphics cards on Windows and Linux. It is simply a by-product of Cycles scheduling work onto the GPU so as to make the most of the “cores” the GPU offers and give you the best performance.

The latest build (17 December) crashes on every render I try.
I’ve reverted to the 16 December build.

I’ve upgraded to Monterey 12.1 and have the 3.1 Alpha build from 02:12 today. Every time I pick any Metal combination and try to render an image, Blender crashes, as it does for Leonard. I’m running an 8 GB M1 Mac mini.
Running yesterday’s build works. Render times for my “Double Donut” test with particles and geometry are below.
CPU only - 2 mins
GPU - 1.5 mins
CPU & GPU - 1 min
That’s a nice speed-up. Thanks everyone!


The crash on macOS has been fixed and the latest build should work fine again.


Thank you for getting Metal going on the M1 Max, way to go!

@fdessen, interesting scores, pretty close to a 32-core M1 Max!
M1 Max, 32 GB, 32-core GPU, set to High Power mode:

  • CPU: 3:10
  • GPU: 0:42

Didn’t bother with both.

Been using the Blender 3.1 Alpha on my MacBook Pro 14" (10-core CPU, 16-core GPU, 16 GB RAM) with macOS 12.1 and noticed that when I start an F12 render, the RAM usage in Activity Monitor spikes very high for a second (yellow/red memory pressure) before going back to green for the rest of the render. Is there some sort of optimisation that will fix this? The split-second spike caused my Mac to write to swap, which shouldn’t be necessary: the rest of the render works absolutely fine, and the peak memory value shown while rendering is only 800 MB.

Also, I’m noticing a large difference in memory consumption between GPU and GPU+CPU rendering, although theoretically there shouldn’t be one, since in either case the memory lives in the same unified memory pool. For example, a project I’m working on at the moment uses 800 MB with GPU+CPU rendering but 2650 MB just by switching to GPU only.

Screenshot 2021-12-17 at 22.56.38

Brings me close to tears of joy :wink:

Thanks everyone for making this a reality. Version 3.1.0 2021-12-16 is already a solid performer for me (MBA M1).

Some observations are raising questions though.

With tile sizes smaller than the output image, only one tile renders at a time (as opposed to as many as there are threads), and rendering times get longer. Take the BMW scene:

  • Tile size 2160: 1 min 03 s
  • Tile size 512: 1 min 54 s
  • Tile size 64: 2 min 40 s

Also, CPU and GPU cores sit at 90% or so, but never at full throttle.

While I like the visual patterns that tile renderers make, low rendering times are still preferable :slight_smile:

How should I interpret the above results?


Slightly off topic, but what’s the best way to keep this Mac build up to date? As far as I can see, there’s no Blender Launcher for Mac.

Can you share the file? Would like to benchmark it against mine!

The meaning of tiles has changed in 3.0. See the Blender 3.0 Cycles release notes on the Blender Developer Wiki for a further explanation.
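To illustrate why smaller tiles can cost time, assume (consistent with the observation above that only one tile renders at a time) that 3.0 renders tiles one after another, each using the whole device. Then the number of tiles is the number of sequential passes. This is a minimal Python sketch; the 1920x1080 resolution and the `tile_count` helper are hypothetical, not the BMW scene's actual settings.

```python
import math


def tile_count(width: int, height: int, tile_size: int) -> int:
    """Number of square tiles needed to cover a width x height image.

    Edge tiles may be partial; a tile size at least as large as both
    dimensions yields a single tile (the whole image at once).
    """
    return math.ceil(width / tile_size) * math.ceil(height / tile_size)


# Hypothetical 1920x1080 output; substitute the scene's real resolution.
for size in (2160, 512, 64):
    print(f"tile size {size}: {tile_count(1920, 1080, size)} tiles")
```

Under that assumption, dropping from a tile size of 2160 to 64 turns one pass into 510 sequential ones, each with its own setup cost, which would be in line with the longer BMW times reported above.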


Good to know, thanks.

Thanks, learned about that, and made a minor correction in the text. :smiley:

Just wondering: is the performance we’re seeing in this Alpha going to improve in coming versions as optimization progresses?

I’m contemplating buying the 32-core M1 Max, and this is a deciding factor for me.

I’d guess so, as performance tuning is still on the to-do list; how much is the big question.
