Cycles AMD HIP device feedback

You are right, the source is here (all of ROCm?).
This is all very misleading from AMD.

To install HIP, and ROCm in general, on, say, Ubuntu, you currently need the installer provided by AMD, often referred to as “amdgpu-pro” (technically you should be able to compile ROCm yourself, but I am not aware of anybody doing this). The amdgpu-pro package bundles both open-source and proprietary pieces of software, and among them is ROCm+HIP. As far as I understood, installing the ROCm software does not need a kernel driver other than amdgpu.

Uninteresting dive into the details

P.S. I saw that the AMDKFD kernel driver was said to have been merged into AMDGPU around 2018, yet this repo still maintains a downstream Linux kernel with commits talking about AMDKFD oO… well, looks like the upstreaming pipeline is quite lagging: the base kernel is 4.13, from 2017! Quite a big rebase to be done… but since it’s a kernel module it shouldn’t matter much if the DKMS API is stable. Right.

It’s quite sad that most distributions have not started packaging it themselves; I checked today and ROCm is not even in the Debian RFP (request for package) list. I’ll maybe look into becoming a sponsored package maintainer :slight_smile: .
P.S. First step done: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1001712

3 Likes

ROCm is available for Arch Linux in the AUR, so you can easily compile it, and I did. You are correct, it detects my Vega and its features fine, but the issue is that the devs have disabled any GPU older than Navi in the Blender code. Even if I enable the Vega arch in the code and compile Blender 3.1 alpha from git with Vega binaries, Blender crashes when I enable HIP, because the devs haven’t worked on Linux support yet. It’s been over 2 months of focusing on Windows and Navi, so who knows when we’ll see HIP on Linux, especially when few people have Navi GPUs while the majority of AMD users have Polaris/Vega cards. The whole approach feels backwards, and it seems like nobody cares.
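For anyone who wants to reproduce the experiment, the change is roughly this, as a sketch only: I’m assuming the Blender 3.0-era build option name `CYCLES_HIP_BINARIES_ARCH` and the `gfx900`/`gfx906` ISA names for Vega 10/20 — verify both against your source tree:

```cmake
# Blender's CMake config lists only Navi (gfx10xx) targets by default,
# so no Vega HIP kernels are ever built. Adding the Vega ISA names makes
# the build produce binaries for those GPUs; it does not guarantee they
# work at runtime (and in my case they crash, as described above).
set(CYCLES_HIP_BINARIES_ARCH
    gfx900 gfx906            # Vega 10 / Vega 20 (added)
    gfx1010 gfx1030 gfx1031  # default Navi targets
    CACHE STRING "AMD HIP architectures to build binaries for" FORCE)
```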

1 Like

I remember that AMD’s pro driver does not have an Arch Linux version.

I’ve been using the OpenCL implementation from the AUR, ripped from the pro driver, together with the open-source kernel driver and Blender 2.8 just fine.

EDIT:
I guess eventually I’ll just end up investing in something like a 16-core CPU, because at this rate it will still be several times cheaper than getting a Navi 2, while Navi 3 is on the way, which we probably won’t see in retail at affordable prices anyway. AMD will sell a lot of them to happy miners though, for sure, who are also using ROCm on Linux btw.

The people at the rocm-arch GitHub are doing a great job. I’ve had luck with fresh installs of “hip-runtime-amd” but no luck with upgrades; it also takes a loooong time to compile just to find out it doesn’t work.

Eventually, when HIP is patched, to make Cycles X more accessible on Arch (no compilation required), an AUR package could be created that is basically “hip-runtime-amd” but, instead of pulling from GitHub and compiling, uses the precompiled binaries from Index of /rocm/apt/4.5.2/pool/main/h/hip-runtime-amd/ and its 10-ish dependencies, like “opencl-amd” does. Not sure if it’s the best way forward; I’ll message the “opencl-amd” AUR maintainers to see what they think.

1 Like

You seem to be equating what has been working during those 2 months with what developers have been spending their time on in those same 2 months; that’s not a correct assumption.

The thing here is that we depend on fixes in the driver/compiler, which have their own developer teams that deal with more than just Blender. There is nothing (that we know of) to be done on the Blender implementation to make these work.

4 Likes

I am sorry, but let me get this straight.
First, all non-Navi arches were disabled in the code because Blender was crashing during build. I have selectively enabled my Vega arch on Arch Linux and it built successfully.
Second, I enabled HIP and it compiled with binaries without errors, and I got the HIP option in the preferences.
Third, Blender crashes when HIP is enabled on Linux. What do you mean “nothing to be done”? Has anyone looked into the crash I’ve reported? Has it been reproduced? Has the root cause been found? Has it been reported to AMD, the ROCm team, the open-source or pro driver team, whichever is responsible? No, there is an apparent lack of care and communication. How is whoever is responsible for this vague “driver” you keep mentioning expected to guess what has to be changed or implemented for HIP to work on Linux, if Blender devs are clearly not doing any work on Linux HIP support at all at the moment?

That’s not the reason the Vega architectures were disabled; they were disabled because they fail at runtime. There would be no point in building and shipping binaries that fail when you try to use them.

Yes, that part works but it doesn’t mean rendering works.

There is an issue in the HIP runtime or somewhere in the AMD driver below that. AMD developers that worked on the Cycles HIP integration are aware of this, have reproduced it, and are working with the AMD HIP / driver teams to get this resolved.

You can follow Blender development in the open. But of course neither you nor I can follow internal AMD communications; all we can say is that it is being worked on for 3.1, as we did in the release notes and this post. It’s exactly the same when we work on such things with NVIDIA, Intel or Apple.

4 Likes

The situation of ROCm is messy in a lot of places, first of all at AMD, but also at Debian, from which packages trickle down to Ubuntu later in the pipeline.

But there are steps in the right direction.
At AMD they seem to have become aware that doing the packaging themselves was not the best idea, and that having three different GitHub orgs for ROCm was not the best idea either…

So they are starting to work with the official maintainers and reorganizing their codebase, with the goal of putting ROCm “one ‘apt install rocm’ away from users”.
There is a dedicated Debian ROCm team on the Debian-flavored GitLab: AMD Yes! ROCm Team · GitLab, and they can also be reached at [email protected].

1 Like

We are working on the runtime errors on the AMD side, since it involves changes to the driver and testing them by enabling HIP in Blender for Linux builds. As Brecht said, our internal development is not communicated as openly as Blender’s. As I’ve said before, we’re planning to have this enabled for Blender 3.1. Please be patient and trust that this is being handled.

4 Likes

Is it possible to mention what part of the open-source HIP stack caused the runtime error in the Linux driver stack?

I find this fascinating. To use a race as an analogy: bystanders have seen Linux stop just meters from the finish line, meanwhile Windows has come from behind and won the race. Linux coming second is fine, but then the bystanders are shocked as the race announcer states Linux may remain in place for months…

1 Like

We can be patient, though it’s not fun seeing my AMD Fine Wine turn into AMD Vinegar just like that.

2 Likes

Hello,
What about HIPSPV?

It looks interesting for wider GPU support with the universal and open SPIR-V:

Thank you.

Similar concerns as clspv that I explained here.

  • Early technology, likely to have bugs and problems handling more complex kernels
  • Unclear when hardware ray-tracing will work with this, if ever
  • Cannot expect much support from hardware vendors when straying from the API they recommend for writing production renderers (and use themselves for that purpose)
  • More abstraction layers (HIP → LLVM → Vulkan) means more chances for bugs and harder debugging, and solving certain limitations may require changes at all levels, which is very hard to coordinate

That being said, I would love to see a future where compiling GPU code is as simple as CPU code and it can all just be done with LLVM. But right now the GPU front/backends in LLVM seem to me to be at a level where you need an LLVM engineer contributing to the project to get things working, rather than it being a compiler you can just use.

1 Like

Thank you for the answer,
I hope we will see in the future an open-standard implementation for GPU compiling, in particular for those who like to support free and open source software (OS and GPU drivers included).

HIPSPV wouldn’t really add much here, other than maybe removing code duplication, if it worked everywhere.

BTW this is getting off topic, but I 100% agree about compiling GPU code via LLVM. I recently came across a project called “Taichi”, which lets you take Python code and compile it to run on Metal / Vulkan / CUDA. I think there are similar projects for Rust and other languages. Pretty cool. I used it to write a simple path tracer that runs in Blender, all in Python. Better handled in a separate thread on Blender Artists: An open source GPU renderer for Blender written in Python - #10 by bsavery - Blender and CG Discussions - Blender Artists Community

2 Likes

Another contender among the GPU programming platforms :woozy_face: that’s sicker than the JavaScript ecosystem!

Is Rust really the star of the future?

@brecht this is very early work, but here you can find a HIP .deb package that could in the future be integrated into debian:stable. Is HIP all that is needed for the Cycles compute kernels?

I am in the process of packaging the higher-level user-facing libraries too, such as rocRAND, rocBLAS and above (rocFFT, rocSPARSE, rocThrust…), but it will take even more time.
I first had to unbundle AMD’s LLVM fork and am currently trying to replicate qualification testing with the distribution’s LLVM… you can find more detail in this GitHub thread, or on the mailing list. I made a dummy Docker builder for the packages here.

P.S. Any news on AMD HIP bug fixing?

3 Likes

Probably that’s enough, but I’m not familiar with the details of this. We don’t use any of those higher-level libraries in any case, just libamdhip64.so.
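If it helps anyone doing the packaging work: a quick way to sanity-check that this one runtime library is loadable on a given system. Just a sketch using standard ctypes; libamdhip64.so is the library named above, everything else here (function name, message) is mine:

```python
import ctypes

def hip_runtime_available(libname="libamdhip64.so"):
    """Return True if the HIP runtime shared library can be dlopen'd."""
    try:
        # CDLL uses the normal dynamic-linker search path (ld.so.conf,
        # LD_LIBRARY_PATH), so this mirrors what Blender sees at startup.
        ctypes.CDLL(libname)
        return True
    except OSError:
        return False

print("HIP runtime available:", hip_runtime_available())
```

This deliberately checks only the one library Cycles needs, not the full ROCm stack, so it should work as a quick test even on a system where only a hip-runtime package is installed.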