Cycles AMD HIP device feedback

What about ROCm 5.1.3 breaking HIP kernel compilation for all AMD GPUs with Blender-git 3.3 alpha?

Since we’re already at Bcon4, just a few days away from the scheduled release, and with no update from Sayak, looks like RDNA support is gonna slip again?

Edit: Or are we actually running into a bug in HIP itself? AMD seems to be preparing another release: I see an activity spike on the repos, which usually happens right before an update.

What makes you think RDNA isn’t supported by Cycles HIP? Both RDNA and RDNA2 GPUs are officially enabled in Cycles HIP right now.

Yeah, but it’s still broken on Linux: T97591 Cycles HIP error with image textures on Linux and RDNA1

It’s a driver side issue. So I don’t think any changes on the Blender side right now will help.

Not only that but AMD confirmed that RDNA will not be supported for ROCm so don’t get your hopes up. I enjoyed having this GPU in these crazy times but I’ve completely abandoned the idea of doing anything else other than gaming with it.

Some scenes do work just fine though, which means there’s clearly a workaround at least. But I don’t understand why they don’t cause Blender to crash. Junk Shop renders just fine, and looking at the log, I see many of the same hipTexObjectCreate calls that normally cause immediate segfaults, except they execute without issue in this particular scene.

@Luciddream I work for AMD :wink: The driver team currently supports only RDNA2 for Blender with HIP. However, we’re doing what we can to make sure RDNA is enabled in Blender.

@wsippel I’m not sure what makes you say “clearly there is a workaround”. It’s just that some scenes don’t trigger the issue. It does seem to be in the hipTexObjectCreate call; the driver team is looking into why it only happens on the RDNA cards.

Well, I dug around some more to figure out why Junk Shop works. There appears to be a workaround, it’s just not a very convenient workaround: As far as I can tell, what’s causing the crash is textures with an odd horizontal resolution (odd vertical resolution is fine though). I made a bunch of test textures in different resolutions. 2048x2048, 4096x4096, 2048x4096, 768x4096 and 1024x1023 all worked, 2047x2048 crashed.

That’s really interesting! Good find

Small correction, it’s not about even or odd horizontal resolution, the horizontal resolution has to be a multiple of 128. Tested 128, 256, 384, 512, 768, 1024, 1536, 1664, 2048 and 4096, all worked. Any horizontal resolution I’ve tested below 128 crashed, as did 192 for example (128+64), or anything else that wasn’t a multiple of 128. The vertical resolution can be anything.
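
For anyone who wants to check their own textures against this, here is a minimal Python sketch of the rule as reported above. The 128-pixel figure is only the empirical pattern from these tests on RDNA1, not something taken from AMD documentation, and the names `ALIGNMENT`, `is_safe_width` and `next_safe_width` are made up for illustration.

```python
# Sketch of the empirical rule from the tests above: on RDNA1 + Linux, only
# textures whose horizontal resolution is a multiple of 128 rendered without
# crashing. The constant is an observation, not a documented hardware limit.
ALIGNMENT = 128


def is_safe_width(width: int) -> bool:
    """Return True if the width matches the pattern that did not crash."""
    return width > 0 and width % ALIGNMENT == 0


def next_safe_width(width: int) -> int:
    """Round a positive width up to the nearest multiple of ALIGNMENT."""
    if width <= 0:
        raise ValueError("width must be positive")
    # Ceiling division without floats, then scale back up.
    return -(-width // ALIGNMENT) * ALIGNMENT


if __name__ == "__main__":
    # Widths taken from the tests reported in this thread.
    for w in (128, 192, 1024, 1664, 2047):
        print(w, is_safe_width(w), next_safe_width(w))
```

Resizing or padding a texture to `next_safe_width(...)` would only be a stopgap until the driver side is fixed; the vertical resolution did not matter in any of the tests.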

Thanks I’ll add that info. Will greatly help us!

Just a heads up: Blender 3.2 no longer appears to work on Arch with just the opencl-amd package. It seems to need the rocm-llvm package to build a kernel the first time it runs (took 35 sec on a 5900X).

If you use Arch and want to use repo.radeon, install opencl-amd and opencl-amd-dev. The only downside is the 10 GB install size, though I think amdgpu-install users still have a 4 GB install size?

@Luciddream
What do you think about separating the OpenCL and HIP AUR packages for those who don’t want to compile? It would also align slightly better with the amdgpu-install packages.

| AUR package | Purpose | AUR dependencies | repo.radeon dependencies |
|---|---|---|---|
| opencl-hip-amd | common to OpenCL and HIP | current stuff from opencl-amd | rocm-language-runtime, rocm-core, rocm-device-libs, comgr, hsa-rocr, hsa-rocr-dev, hsakmt-roct-dev |
| opencl-amd | unique OpenCL stuff | opencl-hip-amd | rocm-opencl-runtime, rocm-opencl, rocm-ocl-icd |
| hip-amd | unique HIP stuff | opencl-hip-amd | rocm-hip-runtime, rocminfo, rocm-llvm, hip-runtime-amd |

No idea what to do with the other 6GB of files from opencl-amd-dev once you remove rocm-llvm.

Can I install ROCm on the open-source driver? And which one performs better: that, or the dedicated driver from the official website?

@L_S I’m open to discussing it, but let’s move this discussion to the AUR package page. That said, I doubt we can find a middle ground. Every user will have different needs for their GPU. The only easy way to cover all cases is to include the whole amdgpu stack (which is what opencl-amd and opencl-amd-dev do).

However, it would be helpful to understand why Blender 3.2 needs rocm-llvm when the previous versions didn’t, so we can make a better decision.

I think amdgpu-install users still have a 4GB install size?

I’m not at my PC now but I think amdgpu-install is about 16GB+. Maybe someone using Ubuntu and amdgpu stack can verify.

It’s a problem with Arch’s Blender package I believe. I ran into the same issue, the official binaries from blender.org worked just fine with opencl-amd. I guess the official package ships with precompiled kernels while the Arch package doesn’t.

Thanks for the info, you are 100% correct! I removed opencl-amd-dev, and the officially packaged Blender 3.2 renders fine in Cycles without kernel compilation (I also deleted the kernel that the Arch-packaged Blender had built).

Article by Michael Larabel :

Any news on Vega support and broken HIP kernels on Arch in 3.3 Alpha?

HIP on Vega on Linux hasn’t been addressed yet, but as noted in the meeting notes, Vega (and Vega II, aka Radeon VII) support is looking good for 3.3 on Windows. We’ll push a change in a week or two.
