Hi,
We are definitely interested in running Blender tests and benchmarks on a variety of hardware. Currently we cover Apple/Metal and Linux/NVIDIA, but we plan to expand to AMD and Intel, and to broaden the NVIDIA coverage. It does take a bit of time on our side, but you can see the status and progress here: #46 - Buildbot: GPU tests and benchmarking - blender-projects-platform - Blender Projects
This probably answers your first question.
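For anyone who wants to experiment with something similar downstream, here is a rough sketch of what a single render regression check looks like in practice. This is not Blender's actual test framework (the real render tests live in the Blender source tree and are run through CMake/ctest); the build path, scene, reference image and threshold below are placeholders:

```python
# Minimal sketch of a render regression check, not Blender's real test suite.
# All paths and the comparison threshold are placeholders.
import subprocess
import sys
from PIL import Image, ImageChops

BLENDER = "/opt/blender/blender"           # official build under test (placeholder)
SCENE = "tests/monkey.blend"               # hypothetical test scene
OUTPUT_PREFIX = "/tmp/render_"             # Blender appends the frame number and extension
REFERENCE = "tests/monkey_reference.png"   # previously approved render

def render_frame():
    # Render frame 1 headless; Blender's CLI arguments are processed in order,
    # so output path and format are set before the render is triggered.
    subprocess.run(
        [BLENDER, "--background", SCENE,
         "--render-output", OUTPUT_PREFIX,
         "--render-format", "PNG",
         "--render-frame", "1"],
        check=True,
    )
    return OUTPUT_PREFIX + "0001.png"

def images_match(path_a, path_b, threshold=0.01):
    # Naive mean per-channel difference; a real harness would use a perceptual
    # metric and per-test thresholds to tolerate small device-specific
    # differences (e.g. texture filtering).
    a = Image.open(path_a).convert("RGB")
    b = Image.open(path_b).convert("RGB")
    if a.size != b.size:
        return False
    diff = ImageChops.difference(a, b)
    total = sum(sum(pixel) for pixel in diff.getdata())
    mean_error = total / (3 * 255 * a.size[0] * a.size[1])
    return mean_error <= threshold

if __name__ == "__main__":
    rendered = render_frame()
    sys.exit(0 if images_match(rendered, REFERENCE) else 1)
```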
Having more people running tests might improve the quality and stability of final releases, but if it's done by a separate project or team, I am not sure what would be more practical than treating the results as regular reports submitted by users.
What versions of Blender should be tested - other than the ones officially packaged by Debian?
We are mainly focusing on bugs that can be reproduced in the official builds. We try our best to work with downstream people, but we rely on them to do the initial investigation to ensure the issue is in Blender and not in some difference in the build/packaging.
And to ensure issues are fixed before a release is made, the testing should cover the Beta stage of Blender and later.
The second question is about test coverage. What should be the final goal - the full suite, GPU only (Cycles, EEVEE, Workbench), or Cycles only?
In an ideal world all render engines should be tested, and don't forget the GPU compositor.
But there are all sorts of possible issues: texture filtering might differ between hardware, or there might be known GPU-specific failures in corner cases which we cannot easily blocklist in the testing framework. So the reports need to be read with some care.
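To make the blocklisting point concrete, a downstream harness could maintain its own list of expected device-specific failures and filter them out before reporting anything. A minimal sketch (the device and test name patterns are purely illustrative, not actual known failures):

```python
# Hypothetical blocklist of known device-specific failures, so a downstream
# harness can skip them instead of flooding the tracker with known issues.
# The device and test name patterns below are placeholders, not real failures.
import fnmatch

KNOWN_FAILURES = {
    "AMD*":   ["cycles_motion_blur_*"],
    "Intel*": ["eevee_volumetrics_*"],
}

def is_known_failure(device_name: str, test_name: str) -> bool:
    for device_pattern, test_patterns in KNOWN_FAILURES.items():
        if fnmatch.fnmatch(device_name, device_pattern):
            if any(fnmatch.fnmatch(test_name, pattern) for pattern in test_patterns):
                return True
    return False

# A runner would log these as "expected failure" rather than filing a report.
if is_known_failure("AMD Radeon RX 7900 XTX", "cycles_motion_blur_curves"):
    print("known device-specific failure, not reporting")
```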
Are there any glaringly obvious obstacles that could prevent setting up automated Blender tests outside of the Blender Institute? Has this been done before?
It depends on what you mean by that. Blender is used by some vendors to test driver and SDK updates, and even when implementing new features, and we have minimal involvement in it. But they are also closely participating in Blender development.
So the obstacles, the additional load on our team, etc. depend on how exactly you envision this testing being implemented. If the process is implemented in a similar way, then there are probably not that many obstacles.
However, if it is implemented in a way where some automated farm generates tens of reports about hardware we don't have access to, or uses non-official Blender builds, or runs in an environment we don't have control over (drivers and SDKs do have bugs, and we are not really interested in maintaining all permutations of their versions), then it is quite a big obstacle, and we'd probably say no thanks to this.
In a hypothetical scenario where such a CI is operational, how should regressions, crashes and bugs be triaged and reported? For example, if there is a bug deep in the HIP runtime - should it be filed directly in the official ROCm GitHub repository, should it be filed here, or in both places?
If it's a Blender-side bug, or there is something actionable we can do to ease the life of downstream developers, it should be submitted to our bug tracker.
If the bug is in a compiler version we don't use, or a bug in the runtime, there wouldn't be a very clear reason to have reports on our side (they wouldn't be actionable for us). But communicating such findings in our communication channels could be a good idea.