Cycles AMD HIP device feedback

opencl-amd was updated to 22.20, which ships HIP 5.2. I recompiled blender-git 3.3 on Linux, but it still fails to render with materials, and Blender crashes on the Vega 64.

Previously I used scenes with different settings because I needed the rendered images fast.
This time I used the default settings, apart from changing the renderer to Cycles, lowering samples in some cases and leaving auto tiles enabled.
I tested with scenes that don’t use much VRAM, to avoid maxing it out and increasing rendering times unnecessarily.

                                  3.3a              2.93.9
scene                     cpu     gpu     gpu+cpu   gpu     gpu+cpu
BMW                       03:30   01:03   01:48     01:51   01:08
Classroom                 07:04   02:35   02:43     03:01   02:29
Wanderer                  01:14   00:37   00:51     00:34   00:32
Nishita Sky Demo          01:07   00:29   00:41     00:40   00:39
Lone Monk (256 samples)   03:47   01:01   01:38     01:44   01:32
PartyTug (100 samples)    02:59   01:25   02:14     01:41   01:11

My conclusions:
HIP is much faster than CPU when enough VRAM is available.
HIP is faster than OpenCL in most cases when comparing GPU only.
Mixing GPU and CPU under HIP is broken (slower than GPU alone in every scene I tested) and should be avoided for now.
HIP on GPU alone performs about the same as OpenCL with GPU+CPU and auto tiles, but it also works in the viewport.
Kernel compile times won’t be missed.
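
For anyone who wants to check those conclusions against the numbers, here is a small sketch that converts the times to seconds and prints the ratios. It assumes the columns are CPU, then GPU and GPU+CPU for 3.3a (HIP), then GPU and GPU+CPU for 2.93.9 (OpenCL); the names `TIMES`, `seconds` and `ratios` are made up for this example.

```python
def seconds(mmss):
    """Convert a 'mm:ss' string to a number of seconds."""
    minutes, secs = mmss.split(":")
    return int(minutes) * 60 + int(secs)


# scene: (cpu, hip_gpu, hip_gpu_cpu, opencl_gpu, opencl_gpu_cpu)
# Column meaning is an assumption based on the table layout.
TIMES = {
    "BMW": ("03:30", "01:03", "01:48", "01:51", "01:08"),
    "Classroom": ("07:04", "02:35", "02:43", "03:01", "02:29"),
    "Wanderer": ("01:14", "00:37", "00:51", "00:34", "00:32"),
    "Nishita Sky Demo": ("01:07", "00:29", "00:41", "00:40", "00:39"),
    "Lone Monk": ("03:47", "01:01", "01:38", "01:44", "01:32"),
    "PartyTug": ("02:59", "01:25", "02:14", "01:41", "01:11"),
}


def ratios(times=TIMES):
    """Return per-scene time ratios relative to HIP GPU-only rendering."""
    out = {}
    for scene, row in times.items():
        cpu, hip, hip_mix, ocl, _ocl_mix = (seconds(t) for t in row)
        out[scene] = {
            "cpu_vs_hip": cpu / hip,      # > 1 means HIP beats CPU
            "opencl_vs_hip": ocl / hip,   # > 1 means HIP beats OpenCL GPU
            "mix_vs_hip": hip_mix / hip,  # > 1 means GPU+CPU is slower
        }
    return out


if __name__ == "__main__":
    for scene, r in ratios().items():
        print(f"{scene:18s} cpu/HIP {r['cpu_vs_hip']:.2f}x  "
              f"OpenCL/HIP {r['opencl_vs_hip']:.2f}x  "
              f"HIP mixed/HIP {r['mix_vs_hip']:.2f}x")
```

Under that column reading, HIP on GPU comes out roughly 2x to 3.7x faster than CPU, beats OpenCL GPU-only in five of the six scenes (Wanderer is the exception), and GPU+CPU is slower than GPU alone every time.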


Re-built Blender 3.3 git alpha from latest source with updated mesa, clang and llvm and was able to render junk yard scene on Linux Vega 64 with proprietary opencl-amd in 30.16 seconds. However my test project with materials still fails to render.

Error: Invalid value in hipTexObjectCreate(&cmem->texobject, &resDesc, &texDesc, __null) (intern/cycles/device/hip/device_impl.cpp:1099)

Then it looks like this might actually be the same issue RDNA1 owners face.

I found out that trying to render an image or bake a texture after a viewport render has completed causes the system to crash.
It happens on scenes that need lots of memory, like junk shop.

@brunocb, bug reports should be made on the Blender developers’ bug triaging site: https://developer.blender.org/

@bsavery I feel bad for asking, but have there been any updates on HIP-RT? I think there was a brief mention of June/July for more info, but I totally understand these things take time and timelines get postponed (or canned!)

So when are you guys going to fix the RDNA1/Vega texture/material bug?

Completely reasonable, but there is an easy fix:

Change

${OS} support and stability
RDNA2 [yes]
RDNA [no]
VEGA [yes]

to:

${OS} support
RDNA2 [yes]
RDNA [yes]
VEGA [yes]
Polaris [?]

${OS} stability
RDNA2 [yes]
RDNA [no]
VEGA [yes]

That way users will know that Polaris is supported, but that it’s not stable, officially supported or guaranteed to work in Blender.


What sort of testing can be done by enthusiasts to help?

I’m sure AMD devs have access to every hardware combo…

But for anything else, I have plenty of W7100s, R9 290Xs, and Vega 64 and Frontier cards that I wanted to set up a little render farm with, now that they have been replaced in their main rigs by RDNA2 equivalents.

I can’t use the RDNA2 devices regularly… so with that being said, I have 4 machines with 4 of each card listed above, ready to test whatever.

Note re: Windows: The 22.6.1 drivers seem to work fine for legacy + new devices… but the NimeZ 22.6.1 Split Kernel DCH drivers work better when using multiple card architectures.

It seems that Blender working with HIP for Vega on Windows means testing can start for Polaris/Fiji (GFX8xx) and then GFX7xx cards like the R9 290X?

“AMD FineWine”


So now that prices are sort of normal again, I got an RX 6800 XT and still have trouble getting HIP to work on Linux with the most recent driver. The rocm-hip-runtime package is installed along with a bunch of other rocm-named packages, yet Blender still whines about requiring RDNA and the 22.10 driver (I have 22.20). I feel betrayed again…


What does the log say if you enable the Cycles debug log? Pass --debug-cycles on the command line.

I0726 01:43:09.822212 42214 device.cpp:32] HIPEW initialization succeeded
I0726 01:43:09.822238 42214 device.cpp:34] Found precompiled kernels
HIP hipInit: Invalid device

For the record, this is lspci’s output:

01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev c1)
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 73bf (rev c1)
03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device ab28
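
If it helps anyone triage the same thing, here is a rough sketch that picks the HIP-related lines out of `--debug-cycles` output. It only matches the three message shapes quoted in the log above; `summarize_hip_log` and `SAMPLE` are names invented for this example, and other Blender versions may word these messages differently.

```python
SAMPLE = """\
I0726 01:43:09.822212 42214 device.cpp:32] HIPEW initialization succeeded
I0726 01:43:09.822238 42214 device.cpp:34] Found precompiled kernels
HIP hipInit: Invalid device
"""


def summarize_hip_log(log_text):
    """Pick out the HIP-related lines from Cycles debug output."""
    summary = {"hipew_ok": False, "precompiled_kernels": False, "errors": []}
    for raw in log_text.splitlines():
        line = raw.strip()
        if "HIPEW initialization succeeded" in line:
            summary["hipew_ok"] = True
        elif "Found precompiled kernels" in line:
            summary["precompiled_kernels"] = True
        elif line.startswith("HIP ") and ": " in line:
            # e.g. "HIP hipInit: Invalid device" -> ("hipInit", "Invalid device")
            call, message = line[len("HIP "):].split(": ", 1)
            summary["errors"].append((call, message))
    return summary


if __name__ == "__main__":
    print(summarize_hip_log(SAMPLE))
```

On this log the HIP loader and the precompiled kernels are both found and only `hipInit` fails, which suggests the problem sits below Blender (driver or device access) rather than in the Blender build itself.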

To be clear, Polaris is not supported and not working at all for Cycles. What is unknown is if it will ever be supported, but I would guess it’s not likely. So I would rather not give false hope.

Thanks for the offer, but user testing isn’t really what we are missing to make improvements, at least not as far as I know. It’s really developer time to work on the implementation of the compilers/drivers and maybe the Cycles implementation.

Wild guess, maybe related to permissions?
https://docs.amd.com/bundle/ROCm_Installation_Guidev5.0/page/Prerequisite_Actions.html#d4376e641

There’s also some commands listed here to verify if the installation was successful:
https://docs.amd.com/bundle/ROCm_Installation_Guidev5.0/page/How_To_Install_ROCm.html#d126e5268

Correct, not likely to happen now.

Excellent guess, running Blender as root made it show up. It appears to be working. On heavy scenes the screen sometimes goes black for a second and then the whole OS except the mouse locks up, but it’s farther than I’ve gotten with any AMD+Cycles combination in the past 11 years…

Find out who owns /dev/kfd. Your user needs to be a member of that group.

I added my user to the “render” group and that made it work. Seems like an oversight; isn’t that something the AMD auto-installer should’ve done? Nobody’s going to know to do this.
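
For others hitting the same wall, the /dev/kfd check can be sketched in a few lines of Python. This is a minimal sketch: the helper names (`owning_group`, `user_in_group`, `report`) are invented here, it only looks at group membership and not at the device's permission bits, and the `usermod` hint is the usual approach rather than anything AMD-specific.

```python
import grp
import os


def owning_group(path):
    """Return (gid, group_name) for the group that owns `path`."""
    gid = os.stat(path).st_gid
    try:
        name = grp.getgrgid(gid).gr_name
    except KeyError:
        name = str(gid)  # gid has no entry in the group database
    return gid, name


def user_in_group(gid, user_gids):
    """True if `gid` is among the group ids the process runs with."""
    return gid in set(user_gids)


def report(path="/dev/kfd"):
    """Describe whether the current process belongs to the group owning `path`."""
    if not os.path.exists(path):
        return f"{path} not found (is the amdgpu kernel driver loaded?)"
    gid, name = owning_group(path)
    # Groups of the *current process*; a fresh login is needed after changes.
    gids = set(os.getgroups()) | {os.getegid()}
    if user_in_group(gid, gids):
        return f"{path} is owned by group '{name}' and this user is a member"
    return (f"{path} is owned by group '{name}' but this user is not a member; "
            f"try `sudo usermod -aG {name} $USER` and log in again")


if __name__ == "__main__":
    print(report())
```

Note that group changes only apply to new login sessions, so the script will keep reporting "not a member" until you log out and back in.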

I did some stress-testing on the viewport render. It looks like no matter the scene, if I go a little crazy with rotating the camera and moving things around, there inevitably comes a point where the GPU cannot keep up, the fans hit 100% and the UI locks up. Then the entire screen turns off for a second or two, the fans go back to idle RPM, and the screen comes back with a locked-up UI; the OS is unusable and I have to restart. Can anybody else replicate this?

Is there any update on the hipTexObjectCreate bug? It wasn’t fixed in ROCm 5.2, and it apparently also affects Vega, which makes HIP risky and annoying to use on two of the three supported architectures.
