After some fiddling I finally got OpenCL working on my new MacBook Air with the M1. It was basically only a matter of finding the right settings by trial and error; the current Blender code is already up to the task.
I have not tested much yet, but the famous BMW scene renders in about 3:40 minutes (GPU and CPU combined).
I totally understand the reasons behind the deactivation of OpenCL on macOS in Blender, so my question is:
Is it possible to make some small additions to the deactivated macOS parts, or even turn OpenCL on for the M1 by default as long as there is no official support yet? Maybe it is useful for other users too.
We won’t be re-enabling OpenCL support on macOS or doing any improvements to the code for it. It can work in simpler scenes but there are problems in others.
I understand, and I can confirm that not all demo files work.
The nasty part of these bugs is that the program does not fail gracefully. Instead, Apple’s OpenCL/Metal wrapper slowly grows in size in the background (>25 GB!!), and Blender freezes and has to be force-quit.
As far as I understand, Apple does an on-the-fly translation to Metal with MTLCompilerService, without a real OpenCL driver, and this does not work reliably for now.
Anyway, I was pleasantly surprised how well Blender works on the M1 in limited scenarios, and I will do further tests. Basically I only had to switch off NANOVDB and turn the OpenCL driver on again; the rest would have been small UI adaptations.
If I find something interesting, I would like to post it here.
The ugly memory leak only occurs when any of the three volume materials is connected to the volume input of the material output node.
None of the surface shaders causes a freeze, and only the hair material shows wrong results.
For denoising I have to use OpenImageDenoise; NLM gives artifacts.
The SSS performance of the GPU seems to be mediocre, but it works without noticeable errors (the monster_under_the_bed demo works nicely). The SSS kernel takes the longest to compile; each of the other kernels is built in under 5 seconds.
I only have the base 8 GB model, and of course big scenes bring the MacBook down, but it stays snappier while rendering than my i7-4790K (32 GB RAM) with an 8 GB Vega 56, which lags in OpenCL (Win10 and Big Sur ;).
@brecht
I understand that you will not happily re-enable OpenCL on the Mac, especially as long as Apple tags it as deprecated, but would you reevaluate the situation in the not-so-distant future?
I continued with more testing and benchmarking. Not a single crash, but different speed gains or losses with or without GPU.
Terminal output for bmw27 with OpenCL:
Fra:1 Mem:297.39M (Peak 315.39M) | Time:04:33.81 | Mem:638.78M, Peak:646.78M | Scene, RenderLayer | Finished
I0422 16:54:02.770961 301022528 blender_session.cpp:591] Total render time: 271.796
I0422 16:54:02.770993 301022528 blender_session.cpp:592] Render time (without synchronization): 268.257
This is GPU only, and if I read the Blender benchmark database correctly, this places the M1 between an NVIDIA 1050 and a 1050 Ti running CUDA.
For me this leads to the assumption that the OpenCL wrapper from Apple works pretty well. AFAIK the raw GPU performance of the M1 should be in the ballpark of these two NVIDIA GPUs, and the real-world result reflects this closely.
Given the similarities between OpenCL kernels and Metal Performance Shaders, I doubt that a hypothetical native Cycles Metal device would improve this performance by a lot.
I will add more information later, but I can already say that I found at least one scene that gives slight errors on the GPU, and volume rendering performance seems to be worse than on the CPU.
After more tests my enthusiasm has sunk significantly.
The problem is not that OpenCL works unreliably on Apple Silicon. Only one of my personal blend files showed very slight artifacts with hybrid rendering (minimal dark squares on the GPU), and I had not a single crash.
But the fun fact for me was that, after the initial test with the BMW scene (randomly chosen), I did not find a single scene where CPU-only was NOT faster than GPU or GPU+CPU.
I do not have the technical knowledge to draw a conclusion from this, i.e. whether it points to specific weak points of Apple’s GPU or to driver deficits. But for the moment, there is not really much to gain.
The good thing is that the overall performance of the arm64 build of Blender is really snappy, and the CPU rendering performance of the M1 is significantly better than that of my old Haswell 4790. It is very usable and stable, even with only 8 GB, for my relatively small scenes.
I will recheck the situation after the next Big Sur update.
Basically the current OpenCL kernel implementation is being removed entirely. Performance issues are part of the reason. The way forward will be a Metal backend on macOS. There’s nothing specific we can announce regarding that, but probably it is just a matter of time.
I still have no idea why this blend file runs so well with OpenCL on the M1, whereas for every one of my other blend files CPU-only is the fastest option.
Hi all, any more news on when GPU Cycles will be supported on M1 (or Intel!) Macs? I think there is a huge pent-up demand for this from Mac users, who will be very grateful if the Blender devs can make this happen going forward.
I do not really have anything substantially new, but here is what I have tried in the meantime.
My goal was to find the point where I would hit a wall while trying to port the CUDA driver to Metal.
The first step was pretty straightforward: adding properties for a Metal driver and getting a Metal device from the OS.
The next step, building an empty Metal kernel inside the Blender build system and loading it successfully on render, was a bit harder, but seems to work now.
Next on the list would have been to port the CUDA kernel (now GPU kernel) and see if it would compile. This is where my story ended for now. Blender already uses a lot of macros and Metal would need even more (e.g. for address space qualifiers, extra atomic types, …), which would pollute the generic driver parts a bit more. But the real showstopper for me was C++ lambdas, a feature that the newest Metal version (2.4) simply does not support. Porting cycles-x back to an older C++ standard is probably not an option, and code duplication only for Metal does not sound much better.
There may be more obstacles further down the way, but in my serial approach, this is all I know for now (=believe to know).
Please remember that I was for sure far away from a working version and maybe even on the wrong path. But I am optimistic that a person with more inside knowledge of Cycles (= not me) could be successful.
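To illustrate what is meant by macros for address space qualifiers, here is a minimal sketch of the idea. `EXAMPLE_GLOBAL` and `sum_buffer` are hypothetical names for this post, not the actual Cycles macros, and the sketch assumes the Metal compiler predefines `__METAL_VERSION__`. Only the host C++ path is shown compiling here; the Metal branch is an assumption about how such a header could be shared.

```cpp
// Assumption: the Metal compiler predefines __METAL_VERSION__, while host
// C++ compilers do not, so the qualifier expands to nothing on the CPU.
#ifdef __METAL_VERSION__
#  define EXAMPLE_GLOBAL device
#else
#  define EXAMPLE_GLOBAL
#endif

// Hypothetical shared helper: the pointer would refer to device memory on
// Metal and to ordinary memory in a CPU build.
inline float sum_buffer(const EXAMPLE_GLOBAL float *values, int count)
{
  float total = 0.0f;
  for (int i = 0; i < count; ++i) {
    total += values[i];
  }
  return total;
}
```

Every pointer parameter in shared kernel code needs a decoration like this, which is why the generic parts get noisier with each additional backend.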
Not bad for putting this together in your spare time.
By the way, have you heard the news that Apple is officially on board the Blender Foundation’s Dev Fund now? Michael Jones is working on a Metal backend for Cycles: T92212 Cycles Metal device
He also had trouble with the lambdas, but he says it can be solved with function objects.
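For readers who have not seen the pattern, here is a minimal sketch in plain C++ (not Metal Shading Language, and not taken from the Cycles source) of how a capturing lambda can be rewritten as a function object. All names are invented for illustration. The captured variable becomes a data member and the lambda body becomes `operator()`.

```cpp
#include <cstddef>

// Hypothetical stand-in for a kernel helper that takes a callback.
// Templating on the callable lets it accept a lambda or a function object.
template<typename Callback>
inline float sum_mapped(const float *values, std::size_t count, Callback callback)
{
  float total = 0.0f;
  for (std::size_t i = 0; i < count; ++i) {
    total += callback(values[i]);
  }
  return total;
}

// Lambda version: `scale` is captured by value.
inline float scaled_sum_lambda(const float *values, std::size_t count, float scale)
{
  return sum_mapped(values, count, [scale](float v) { return v * scale; });
}

// Lambda-free version: the capture is an explicit member and the body
// lives in operator().
struct ScaleBy {
  float scale;
  float operator()(float v) const
  {
    return v * scale;
  }
};

inline float scaled_sum_functor(const float *values, std::size_t count, float scale)
{
  return sum_mapped(values, count, ScaleBy{scale});
}
```

Both versions compute the same result; the function object is simply more verbose, and whether it carries over unchanged to Metal is not something this sketch shows.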
BTW, I am aware of function objects in Metal, but they have a slightly different API compared to usual C++ (if I am not mistaken), and while cycles-x is still a fast-moving target, this was the moment for me to throw in the towel in order to keep a sane amount of my spare time for other things.
I think we’ll see some results in the not-so-distant future.
To bridge the waiting time until the coming Metal driver is complete, I decided to do a final test with a new M1 Max (32 GPU cores, 32 GB RAM), which was kindly handed to me by my boss.
I did the same simple OpenCL “port” as before, using the latest commit before cycles-x was merged into master and simply turning on OpenCL in a hard-coded way. I had to inline one kernel function by hand, because the OpenCL compiler later complained about incompatible pointer types. The rest was vanilla Blender code. The comparison value is the master version from today.
So here are some numbers, that should not be taken out of context.
monster under bed:
master CPU 10:13.99
last_opencl GPU 23:31.98
last_opencl GPU + CPU 17:52.14
bmw:
master CPU 3:13.34
last_opencl GPU 1:07.19 (default)
last_opencl GPU + CPU 1:02.21 (96px) quite variable
classroom:
master CPU 7:31.87
last_opencl GPU 13:31.03 (256px)
last_opencl GPU + CPU 11:51.23 (256px)
last_opencl CPU 8:28.86 (pre-heated)
junk shop:
master CPU 1:07.63
last_opencl GPU opencl compile freeze, MTLCompilerService slowly growing
last_opencl GPU + CPU
last_opencl CPU 1:23.48 (pre-heated)
None of these measurements are intended to be compared to public benchmark databases. That would make no sense.
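To put the numbers in relation to each other, here is a small sketch that converts some of the timings listed above to seconds and computes the ratio of master CPU time to last_opencl GPU time. The helper names are made up for this example. Keep in mind that the two columns come from different Blender builds, so the ratio mixes hardware and code differences.

```cpp
#include <cassert>

// Convert minutes plus seconds into seconds.
constexpr double to_seconds(int minutes, double seconds)
{
  return minutes * 60.0 + seconds;
}

// A ratio above 1 means the candidate finished faster than the baseline.
inline double speedup(double baseline_seconds, double candidate_seconds)
{
  assert(baseline_seconds > 0.0 && candidate_seconds > 0.0);
  return baseline_seconds / candidate_seconds;
}

// Timings copied from the list above: master CPU vs last_opencl GPU.
inline double bmw_gpu_speedup()
{
  return speedup(to_seconds(3, 13.34), to_seconds(1, 7.19));
}

inline double monster_gpu_speedup()
{
  return speedup(to_seconds(10, 13.99), to_seconds(23, 31.98));
}

inline double classroom_gpu_speedup()
{
  return speedup(to_seconds(7, 31.87), to_seconds(13, 31.03));
}
```

With these inputs the BMW scene comes out at roughly 2.9, meaning the GPU run was almost three times faster than the CPU run, while monster under bed and classroom land well below 1, meaning the GPU run was slower.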
My few conclusions are:
It was the right decision to turn off OpenCL for macOS. Without further optimizations, the “old” Blender OpenCL code does not perform well in most of the cases that I could test.
Enabling CPU+GPU makes for a pretty good torture test of the cooling system, and the 14-inch MacBook is really loud after a few minutes. Even if it ran optimized code, such a MacBook is simply not built for heavy number crunching over extended periods of time. For my personal use, this will not become my main computer.
General performance of the M1 Max in Blender is very good and snappy. As long as it is not used as a 24/7 rendering workstation, you will probably not be disappointed.
The BMW benchmark, which again performs well, shows the potential, and I am really curious what we will see with Metal. At least I have some kind of baseline now.