Cycles AMD HIP device feedback

Thank you very much! So maybe around September/ October if all goes well then?

Also, to make this clear, I very much appreciate your work and commitment, and I wish the ROCm team was as committed and had a lot more resources - they clearly need them. That’s just me rambling though.

If your Linux OS uses journalctl, you can run the following command to check the kernel messages (-k) from the previous boot (-b-1) for errors:
sudo journalctl -k -b-1

The error should be near the bottom (use Page Down). In my case, Blender normally produces something like this when it crashes the GPU:

kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -2!
kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=145976, emitted seq=145978
kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process blender pid 9433 thread blender:cs0 pid 9467

In my case, Google reveals a widespread issue with no clear resolution; it could be Blender, mesa/drm, or the hardware. Hopefully you will have a more specific error.
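For longer logs, the relevant lines can be pulled out with a script. The following is a minimal Python sketch that assumes the log lines look like the excerpt above; the function name `summarize_amdgpu_errors` and both regular expressions are illustrative and may need adjusting for other kernel versions.

```python
import re

# Matches lines shaped like the excerpt above, e.g.
#   kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, ...
ERROR_RE = re.compile(r"\[drm:(?P<func>\w+) \[amdgpu\]\] \*ERROR\* (?P<msg>.*)")
# Matches the "process <name> pid <number>" part of a "Process information" line.
PROCESS_RE = re.compile(r"process (?P<name>\S+) pid (?P<pid>\d+)")


def summarize_amdgpu_errors(log_text):
    """Return (errors, processes) found in kernel log text.

    errors is a list of (kernel_function, message) tuples in log order;
    processes is a set of (process_name, pid) tuples named by the driver.
    """
    errors = []
    processes = set()
    for line in log_text.splitlines():
        match = ERROR_RE.search(line)
        if not match:
            continue  # not an amdgpu *ERROR* line
        errors.append((match.group("func"), match.group("msg")))
        proc = PROCESS_RE.search(match.group("msg"))
        if proc:
            processes.add((proc.group("name"), int(proc.group("pid"))))
    return errors, processes
```

One way to use it is to save the output of `sudo journalctl -k -b-1` to a file and pass the file contents to the function; the returned process set shows which program the driver blamed for the timeout.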

So I have a new install of Ubuntu 22.04 and amdgpu-install_22.20.50200-1_all.deb (I wiped the hd and started over.) Both blender-3.3.0-beta+v33.c49717a82473-linux.x86_64-release and blender-3.4.0-alpha+master.a98102e32eca-linux.x86_64-release give the following response when using --debug-cycles:
I0728 18:57:57.237565 10717 device.cpp:56] HIPEW initialization failed: Error opening HIP dynamic library

Anything I can do to troubleshoot this before I give up?
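One quick check is whether the HIP runtime library can be loaded at all outside Blender. The sketch below is a hedged example, not Blender's actual lookup logic: `libamdhip64.so` is the usual name of the HIP runtime on Linux, but the exact names and paths Blender tries are an assumption here, as is the helper name `probe_libraries`.

```python
import ctypes

# Assumption: plausible names/paths for the HIP runtime on Linux. Adjust these
# to wherever your ROCm install actually placed the library.
CANDIDATES = [
    "libamdhip64.so",
    "/opt/rocm/lib/libamdhip64.so",
    "/opt/rocm/hip/lib/libamdhip64.so",
]


def probe_libraries(names):
    """Try to load each shared library with the system loader.

    Returns a dict mapping each name to None on success, or to the
    loader's error message on failure.
    """
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            results[name] = None
        except OSError as exc:
            results[name] = str(exc)
    return results
```

Calling `probe_libraries(CANDIDATES)` and printing the result shows, for each candidate, either `None` or the loader's own error text, which usually says whether the file is missing or one of its dependencies failed to load.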

There is good news: E-Cycles and K-Cycles support HIP now. Here is a simple rendering speed comparison.
gpu:6800xt
cycles:32s
ecycles:17s (The results are visually different from the cycles results)
kcycles:22s



The AMD HIP feedback thread probably isn’t the best place to make performance comparisons between Cycles, E-Cycles, and K-Cycles. It’s probably best to make those posts elsewhere.

I assume you used one of the E-Cycles performance presets? Or it was automatically applied by accident?

I just thought I would explain what the differences appear to be to me.

  1. The number of light bounces appears to be reduced or the “fast GI approximation” setting has been turned on with a low number of bounces. In some scenes this will have a small impact, in others it’s noticeable.
  2. The scrambling distance multiplier has been decreased too much for this sample count and is introducing artifacts. It should be noted that a decreased scrambling distance multiplier typically results in an increase in performance when rendering on the GPU.

Out of curiosity I checked how HIP on Linux looks on OpenData, and either OpenData has some issues, or the Radeon PRO W6800 had a massive performance regression between 3.1 and 3.2.

On the surface this looks like a Windows-Linux thing, but there is one Windows benchmark from 3.2 with the same regression as Linux.

I got the exact same thing! @Nik @L_S Can I ask what GPUs you both are using?
I’m using an RX 6900 XT.
I opened the following blender bug report:
https://developer.blender.org/T100353

I mentioned a few posts above that I’m using an RX 6800 XT.

I can confirm that the UI locks up for me too in the case you describe, in the viewport with two Blender instances, but what happened for me earlier was in a single instance. Not only that, but it wasn’t just Blender’s UI that locked up - it was the whole OS, rendering the PC unusable without a hard reset. It doesn’t happen with every file, just the heavier ones from the examples collection on the Blender website.

I use a 6800. Thanks for raising the bug report; hopefully it gets resolved. However, you will probably have to work out whether it’s Blender or amd/mesa/drm causing the fault, then produce a patch that resolves the issue, then get in the ear of a developer to have it merged. Good luck!

I was able to get HIP working with blender-3.4.0-alpha+master.a98102e32eca-linux.x86_64-release by following the instructions here. Getting some crashes in the shader editor at the moment:
Memory access fault by GPU node-1 (Agent handle: 0x7fd6f8b7f700) on address 0x41700000. Reason: Page not present or supervisor privilege.

Otherwise has worked pretty well over the last few days.

I just made a patch that enables Polaris the same way Vega is enabled in Blender 3.4.0 alpha for Linux,

as Linux can run ROCm 5.2.1 (22.20.1) flawlessly using this env var:

ROC_ENABLE_PRE_VEGA=1

and HIP works quite well on those cards (outside Blender).
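For anyone who wants to set the variable for a single program rather than the whole session, here is a minimal Python sketch. The helper names `env_with_pre_vega` and `launch` are hypothetical, and the command passed to `launch` is whatever you want to run; only the variable name and value come from the post above.

```python
import os
import subprocess


def env_with_pre_vega(base_env=None):
    """Return a copy of the environment with ROC_ENABLE_PRE_VEGA=1 set.

    The input mapping (or os.environ when omitted) is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ROC_ENABLE_PRE_VEGA"] = "1"
    return env


def launch(command):
    """Run a command (list of arguments) with the variable set for that process only."""
    return subprocess.run(command, env=env_with_pre_vega(), check=False)
```

For example, `launch(["./blender", "--debug-cycles"])` would start a Blender binary from the current directory (the path is an assumption) with the variable applied to that process and its children only.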

More info here: https://www.reddit.com/r/blender/comments/wrrq6l/so_close_yet_so_far/

Yes, the renders are broken, but this is a proof of concept,
and it is also a Blender fault rather than a driver fault.

YES, it can be done, but the dev team just doesn’t want to.

I’m not sure how you came to this conclusion, a problem in the AMD compiler or driver seems much more likely. The same Cycles kernel code works for other architectures. Just because some other HIP applications work doesn’t mean there are no bugs in the driver.

I wouldn’t jump to that conclusion.

The Blender dev team has many, many things on their plate and they need to prioritize what they do with their available time. Right now, with the available manpower, they haven’t been able to enable Polaris because that would mean figuring out, writing and debugging a new device.

I’m sure they want to support as many cards as possible, especially old ones, since that helps them fulfill Blender’s vision that

“Everyone should be free to create 3D CG content, with free technical and creative production means and free access to markets.”

But the truth is they can’t right now. If there’s a community-provided solution, they might include it (I can’t speak for them, but I suppose so).

We don’t have to jump to conclusions; AMD and Blender have both stated a few posts up that it’s unlikely they will support Polaris.

Opendata shows some interesting history on the progression of Blender hardware. Keep in mind:

  • Opendata no longer allows 2.9 OpenCL tests
  • OpenCL tests aren’t exclusive to AMD hardware
  • Some Metal tests contain AMD hardware
  • Opendata lags Blender release by days or months
  • Command used to filter data, e.g. METAL users for July 2022:
    grep '"created_at": "2022-07' ./opendata.jsonl | grep -c '"device_type": "METAL"'
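The same count can be done by parsing each line as JSON instead of matching raw text, which avoids depending on exact spacing. This is a rough Python sketch: it assumes only that `opendata.jsonl` holds one JSON object per line and that the `created_at` and `device_type` keys appear somewhere inside it, since the real nesting is not shown here. The function names are illustrative.

```python
import json


def contains_pair(node, key, predicate):
    """True if any dict nested inside node has `key` with a value passing predicate."""
    if isinstance(node, dict):
        if key in node and predicate(node[key]):
            return True
        return any(contains_pair(v, key, predicate) for v in node.values())
    if isinstance(node, list):
        return any(contains_pair(v, key, predicate) for v in node)
    return False


def count_entries(lines, month_prefix, device_type):
    """Count JSON lines created in the given month that mention the device type."""
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        in_month = contains_pair(
            record, "created_at",
            lambda v: isinstance(v, str) and v.startswith(month_prefix),
        )
        on_device = contains_pair(record, "device_type", lambda v: v == device_type)
        if in_month and on_device:
            count += 1
    return count
```

Usage would look like `with open("opendata.jsonl") as f: print(count_entries(f, "2022-07", "METAL"))`. Like the grep pipeline, it counts one hit per line regardless of how many devices a single entry lists.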

It would appear AMD’s HIP implementation didn’t meet expectations and they lost user share compared to the old OpenCL implementation. I think the following items have contributed:

  • AMD being unable to provide hardware ray-tracing support in Blender for 641 days and counting since the RDNA2 release.
  • Polaris and Vega remaining unsupported 258 days and counting after Blender 3.0+ release
  • RDNA currently having bugged texture support 258 days and counting after Blender 3.0+ release.
  • AMD GPU hardware not being competitive in cost per performance when compared to other hardware vendors.
  • HIP and ROCm being relatively new software.
  • HIP and ROCm being generally disliked by the software community (probably applies to most software).

Vega support was added at the end of June, so that can be checked off that list.

Do you have a Vega and have you tested it? I have tested that Vega 64 support you claim to be there, and it’s a broken pile of garbage. Still getting the same error and there’s still no ETA on it getting fixed.

I have a Vega II but have not tested it yet. Still waiting for ROCm to land in the Debian repo.

Sir, if there is still no driver and you can’t use it, then it is not supported, is it?

I can compile ROCm from source, but I don’t have the time or the will to do it.

I did compile ROCm from the AUR like ten times over the past 11 months, and I don’t have the time or the will to do it any more. The result is always the same.
