Cycles AMD HIP device feedback

What sort of testing can be done by enthusiasts to help?

I’m sure AMD devs have access to every hardware combo…

But for anything else, I have plenty of W7100s, R9 290Xs, Vega 64s, and Frontier cards that I wanted to build a little render farm with, now that RDNA2 equivalents have replaced them in their main rigs.

I can’t use the RDNA2 devices regularly… so with that being said, I have 4 machines with 4 of each card listed above, ready to test whatever.

Note re: Windows: The 22.6.1 drivers seem to work fine for legacy and new devices, but the NimeZ 22.6.1 Split Kernel DCH drivers work better when using multiple card architectures.

Now that Blender works with HIP for Vega on Windows, does that mean testing can start for Polaris/Fiji (GFX8xxx) and then GFX7xx cards like the R9 290X?

“AMD FineWine”

So now that prices are sort of normal again, I got an RX 6800 XT and still have trouble getting HIP to work on Linux with the most recent driver. The rocm-hip-runtime is installed along with a bunch of other rocm-named packages, yet Blender still complains that it requires RDNA and the 22.10 driver (I have 22.20). I feel betrayed again…

What does the log show if you enable the Cycles debug log (--debug-cycles on the command line)?

I0726 01:43:09.822212 42214 device.cpp:32] HIPEW initialization succeeded
I0726 01:43:09.822238 42214 device.cpp:34] Found precompiled kernels
HIP hipInit: Invalid device

For the record, this is lspci’s output:

01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev c1)
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 73bf (rev c1)
03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device ab28
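
As a rough aid for reading --debug-cycles output, the HIP messages quoted in this thread can be told apart with a few string checks. This is only a minimal Python sketch: the function name and the category labels are made up for illustration and are not part of Blender, and the interpretations in the comments are just what this thread suggests.

```python
def classify_hip_log(lines):
    """Return a rough category for the HIP-related lines of a Cycles debug log.

    The labels are hypothetical and only cover messages quoted in this thread.
    """
    text = "\n".join(lines)
    # The HIP runtime library itself could not be opened.
    if "HIPEW initialization failed" in text:
        return "library-not-loaded"
    # The library loaded, but hipInit did not find a usable device.
    # In this thread that turned out to be a device permission problem.
    if "HIP hipInit: Invalid device" in text:
        return "runtime-loaded-no-usable-device"
    # The library loaded and no later error was logged.
    if "HIPEW initialization succeeded" in text:
        return "ok-so-far"
    return "no-hip-messages"


if __name__ == "__main__":
    import sys

    # Usage idea: blender --debug-cycles 2>&1 | python3 this_script.py
    print(classify_hip_log(sys.stdin.read().splitlines()))
```

The failure checks run before the success check on purpose, because a log can report a successful HIPEW initialization and still fail later at hipInit.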

To be clear, Polaris is not supported and not working at all for Cycles. What is unknown is if it will ever be supported, but I would guess it’s not likely. So I would rather not give false hope.

Thanks for the offer, but user testing isn’t really what we are missing to make improvements, at least not as far as I know. It’s really developer time to work on the implementation of the compilers/drivers and maybe the Cycles implementation.

Wild guess, maybe related to permissions?
https://docs.amd.com/bundle/ROCm_Installation_Guidev5.0/page/Prerequisite_Actions.html#d4376e641

There’s also some commands listed here to verify if the installation was successful:
https://docs.amd.com/bundle/ROCm_Installation_Guidev5.0/page/How_To_Install_ROCm.html#d126e5268

Correct, not likely to happen now.

Excellent guess: running Blender as root made it show up. It appears to be working. On heavy scenes the screen sometimes goes black for a second and then the whole OS except the mouse locks up, but it’s further than I’ve gotten with any AMD+Cycles combination in the past 11 years…

Find out who owns /dev/kfd. Your user needs to be a member of that group.

I added my user to the “render” group and that made it work. Seems like an oversight, isn’t that something the AMD auto-installer should’ve done? Nobody’s going to know to do this.
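
For anyone hitting the same thing, the check described above (who owns /dev/kfd, and is my user in that group) can be sketched in a few lines of Python. This is a hedged example, not an official tool: the function names are invented here, and it only looks at group ownership, not at ACLs or other access mechanisms.

```python
import grp
import os


def owning_group(path="/dev/kfd"):
    """Return the name of the group owning `path`, or None if it does not exist."""
    try:
        gid = os.stat(path).st_gid
    except FileNotFoundError:
        return None
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        # The gid has no entry in the group database; fall back to the number.
        return str(gid)


def current_user_groups():
    """Return the names of all groups the current process belongs to."""
    names = set()
    for gid in set(os.getgroups()) | {os.getegid()}:
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            names.add(str(gid))
    return names


def missing_group(path="/dev/kfd"):
    """Return the group the user still needs to join, or None if nothing is missing."""
    group = owning_group(path)
    if group is None or group in current_user_groups():
        return None
    return group


if __name__ == "__main__":
    group = missing_group()
    if group:
        print(f"Not in group '{group}'; try: sudo usermod -aG {group} $USER")
    else:
        print("Group membership looks fine, or /dev/kfd does not exist.")
```

A new group membership normally only takes effect after logging out and back in.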

I did some stress-testing on the viewport render. It looks like no matter the scene, if I go a little crazy with rotating the camera and moving things around, there inevitably comes a point where the GPU cannot keep up, the fans are at 100% and the UI locks up. Then the entire screen turns off for a second or two, the fans go back to idle RPM, and the screen turns back on with a locked-up UI. The OS becomes unusable and I have to restart. Can anybody else replicate this?

Is there any update on the hipTexObjectCreate bug? It wasn’t fixed in ROCm 5.2, and it apparently also affects Vega, which makes HIP risky and annoying to use on two of the three supported architectures.

I don’t have final confirmation from the driver team (I’ve been asking), but it will hopefully be fixed in ROCm 5.3.

However I do have to pedantically mention that only RDNA2 cards are “officially supported” from the AMD driver side (again something I don’t have direct control over). But of course we, the group enabling Blender support, are doing our best to enable Vega and RDNA.

Thank you very much! So maybe around September/October if all goes well then?

Also, to make this clear, I very much appreciate your work and commitment, and I wish the ROCm team was as committed and had a lot more resources - they clearly need them. That’s just me rambling though.

If your Linux OS uses the systemd journal, you can run the following command to check the kernel messages (-k) from the previous boot (-b-1) for errors:
sudo journalctl -k -b-1

The error should be near the bottom (use page down). In my case Blender normally produces something like this when it crashes the GPU:

kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -2!
kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=145976, emitted seq=145978
kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process blender pid 9433 thread blender:cs0 pid 9467

In my case Google reveals a widespread issue with no clear resolution; it could be Blender, Mesa/DRM, or the hardware. Hopefully you will have a more specific error.
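
To avoid paging through the whole kernel log, the amdgpu error lines can be filtered out with a small script. The following is a minimal Python sketch: the regular expression is modelled only on the four lines quoted above, so other amdgpu messages may use a different layout and be missed, and the function name is invented for this example.

```python
import re
import subprocess

# Matches lines such as:
#   kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, ...
AMDGPU_ERROR = re.compile(r"\[drm:(?P<func>\w+) \[amdgpu\]\] \*ERROR\* (?P<msg>.*)")


def amdgpu_errors(lines):
    """Return (kernel_function, message) pairs for amdgpu *ERROR* lines."""
    found = []
    for line in lines:
        match = AMDGPU_ERROR.search(line)
        if match:
            found.append((match.group("func"), match.group("msg").strip()))
    return found


if __name__ == "__main__":
    # Kernel messages from the previous boot; may need sudo or membership
    # in a group that is allowed to read the journal.
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        capture_output=True,
        text=True,
    )
    for func, msg in amdgpu_errors(result.stdout.splitlines()):
        print(f"{func}: {msg}")
```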

So I have a new install of Ubuntu 22.04 and amdgpu-install_22.20.50200-1_all.deb (I wiped the hd and started over.) Both blender-3.3.0-beta+v33.c49717a82473-linux.x86_64-release and blender-3.4.0-alpha+master.a98102e32eca-linux.x86_64-release give the following response when using --debug-cycles:
I0728 18:57:57.237565 10717 device.cpp:56] HIPEW initialization failed: Error opening HIP dynamic library

Anything I can do to troubleshoot this before I give up?
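
One thing that can be checked for this message is whether the HIP runtime library is present on the system at all. Below is a minimal Python sketch under some loud assumptions: the library is ROCm’s libamdhip64.so, the two /opt/rocm paths are common install locations but not necessarily the ones Blender searches, and the function name is invented for this example.

```python
import ctypes.util
import os

# Assumption: common ROCm install locations for the HIP runtime.
# Blender may look elsewhere; treat these as illustrative only.
CANDIDATE_PATHS = (
    "/opt/rocm/hip/lib/libamdhip64.so",
    "/opt/rocm/lib/libamdhip64.so",
)


def find_hip_library(candidates=CANDIDATE_PATHS):
    """Return a path or name for the HIP runtime library, or None if not found."""
    for path in candidates:
        if os.path.exists(path):
            return path
    # Fall back to the system's normal shared-library lookup.
    return ctypes.util.find_library("amdhip64")


if __name__ == "__main__":
    location = find_hip_library()
    if location:
        print(f"Found HIP runtime: {location}")
    else:
        print("libamdhip64 not found; the HIP runtime package may be missing.")
```

If nothing is found, the amdgpu-install package alone may not have pulled in the HIP runtime, so it is worth checking which ROCm packages are actually installed.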

There is good news: E-Cycles and K-Cycles support HIP now. Here is a simple rendering speed comparison.
gpu:6800xt
cycles:32s
ecycles:17s (The results are visually different from the cycles results)
kcycles:22s



The AMD HIP feedback thread probably isn’t the best place to make performance comparisons between Cycles, E-Cycles, and K-Cycles. It’s probably best to make those posts elsewhere.

I assume you used one of the E-Cycles performance presets? Or it was automatically applied by accident?

I just thought I would explain what the differences appear to be to me.

  1. The number of light bounces appears to be reduced or the “fast GI approximation” setting has been turned on with a low number of bounces. In some scenes this will have a small impact, in others it’s noticeable.
  2. The scrambling distance multiplier has been decreased too much for this sample count and is introducing artifacts. It should be noted that a decreased scrambling distance multiplier typically results in an increase in performance when rendering on the GPU.

Out of curiosity I checked how HIP on Linux looks on OpenData, and either OpenData has some issues, or the Radeon PRO W6800 had a massive performance regression between 3.1 and 3.2.

On the surface this looks like a Windows vs. Linux thing, but there is one Windows benchmark from 3.2 with the same regression as Linux.

I got the exact same thing! @Nik @L_S Can I ask what GPUs you both are using?
I’m using an RX6900XT.
I opened the following blender bug report:
https://developer.blender.org/T100353

I mentioned a few posts above that I’m using an RX6800XT.

I can confirm that the UI locks up for me too in the case you describe, in the viewport with two Blender instances, but what happened for me earlier was in a single instance. Not only that, but it wasn’t just Blender’s UI that locked up. It was the whole OS, rendering the PC unusable without a hard reset. It doesn’t happen with every file, just the heavier ones from the examples collection on the Blender website.