Cycles AMD HIP device feedback

Hi Brian.

Right now I don’t have any clear clues about what’s wrong with gfx803.

I have to say that testing gfx803 is quite painful because of the lack of debugging features. I spent almost a month testing gfx803 with ROCm-5.2.3, and it always reported a Mem Fault error. Just as I was about to accept that and give up, ROCm-5.3 was released and the mem fault disappeared. Thank goodness, I finally got a kernel_gfx803.fatbin that can render scene_cube_surface.xml properly.

What I modified was replacing all noinline with inline, and yes, the binary image then turned out about 10 times larger than normal, as did the compile time.
It seems like something was fixed in the kernel or in LLVM for ROCm-5.3, but it is really difficult to find out what.
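
For illustration, here is a minimal, compilable sketch of the kind of change I made; the macro below is a simplified stand-in, not the actual Cycles device-function qualifiers:

```cpp
// Hedged sketch of the workaround, not actual Cycles code.
// Before (conceptually): __attribute__((noinline)) to limit code size.
// After: force inlining, which avoided the gfx803 Mem Fault at the cost
// of a roughly 10x larger binary and much longer compile times.
#define DEVICE_FUNC __attribute__((always_inline)) inline

DEVICE_FUNC float shade(float x)
{
  return x * 0.5f;  // Placeholder body; the real kernel functions are large.
}

int main()
{
  return shade(2.0f) > 0.0f ? 0 : 1;
}
```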

It would be really appreciated if there were documents or tools for debugging old cards like gfx803 (Polaris). Or could AMD provide training on debugging amdgpu assembly code?

I will do some tests next week.

2 Likes

I tested blender-3.3.1 with ROCm-5.3.0 and an RX 580 (gfx803).
It can render successfully on Ubuntu-20.04.5.

I have uploaded a pre-built binary to GitHub. Anyone who is interested can give it a try.

7 Likes

Nice job @Xu-Huisheng! The ROCm team isn’t officially supporting Polaris (see the AMD ROCm Hardware and Software Support Document - AMD Community), and up to this point it hasn’t worked in Blender, which is why we haven’t enabled it. It would be interesting to try this on Windows as well.

All that being said, I think this is great to have as an option for users to compile themselves.

2 Likes

@bsavery
It would be more convenient if there were a parameter to control whether gfx8+ or gfx9+ (the default) can be selected in Cycles.
Right now, the enabled version range is hardcoded in util.h, so I have to compile the whole of Blender and distribute a roughly 200 MB package.
If gfx8+ could be enabled with some environment variable, I could just upload kernel_gfx803.fatbin to other people.
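
A minimal sketch of the kind of check I mean; the CYCLES_HIP_ENABLE_GFX8 variable and the helper function are hypothetical names, not existing Blender code:

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper: allow gfx9+ by default, and gfx8 only when the
// made-up CYCLES_HIP_ENABLE_GFX8 environment variable is set to "1".
static bool hip_arch_enabled(const std::string &arch)
{
  // Parse the numeric part of names like "gfx803" or "gfx1030".
  // (Simplified: names such as "gfx90a" would need more careful parsing.)
  const int version = std::atoi(arch.c_str() + 3);
  if (version >= 900) {
    return true;  // The current default: gfx9 and newer.
  }
  const char *env = std::getenv("CYCLES_HIP_ENABLE_GFX8");
  return version >= 800 && env && std::strcmp(env, "1") == 0;
}

int main()
{
  std::printf("gfx803 enabled: %d\n", hip_arch_enabled("gfx803") ? 1 : 0);
  return 0;
}
```

With something like that, I could ship only kernel_gfx803.fatbin, and testers would set the variable instead of installing a 200 MB build.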

Actually, I don’t have a Windows development environment, so I cannot build a Windows Blender version for gfx803 and test it. If there were an option to enable gfx8+ without rebuilding the whole of Blender, I could do some testing there too.

2 Likes

I’m OK with it if it’s hidden behind an env variable, but it’s a bit up to @brecht, so I’ll defer to his thoughts here.

1 Like

ROCm 5.2.3-1 landed in the Debian unstable repo, so I gave it a try.

AMD documentation states: “NOTE: This release of ROCm is validated with the AMDGPU release v22.20.1.”

That should be enough, as Blender requires at least 22.10.
Unfortunately, Blender is not detecting the GPU (a Vega II) so far.

I don’t know if this is because Debian unstable has a 6.0 kernel and ROCm does not like that, or if it is something else. I don’t expect ROCm to work out of the box on an unstable and unsupported Linux distro, but I can provide all the necessary system info if needed to help with enabling Blender on Debian with packaged ROCm drivers.

2 Likes

We could change the logic so that if a binary exists for the architecture, it uses that without checking if the architecture is supported. I don’t think it needs an environment variable.
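
A minimal sketch of that idea, with an illustrative path layout and function name rather than Blender’s actual code:

```cpp
#include <filesystem>
#include <string>

// Illustrative sketch: trust a shipped kernel binary if it exists on disk,
// instead of consulting a hardcoded list of supported architectures.
static bool hip_have_kernel_binary(const std::string &kernel_dir,
                                   const std::string &arch)
{
  // e.g. <kernel_dir>/kernel_gfx803.fatbin, the file mentioned earlier.
  const std::filesystem::path fatbin =
      std::filesystem::path(kernel_dir) / ("kernel_" + arch + ".fatbin");
  return std::filesystem::exists(fatbin);
}

int main()
{
  return hip_have_kernel_binary("lib", "gfx803") ? 0 : 1;
}
```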

3 Likes

Has the checking logic been changed?

I came across this … OneAPI has CYCLES_ONEAPI_ALL_DEVICES=1.
Is there anything like this for HIP?

For HIP, we’ve only ever supported the specific architectures that we ship binaries for; that hasn’t changed.

For OneAPI, this all-devices option is possible because there is runtime compilation for different architectures, which we don’t have for HIP at the moment.

1 Like

I meant this change: “We could change the logic so that if a binary exists for the architecture, it uses that without checking if the architecture is supported.”

No, it has not been changed.

1 Like

I recently found these issues with a Vega 64:

• Using CPU + GPU takes up to 5x longer to render (compared to GPU only, i.e. the Vega 64).

• Enabling the viewport denoiser makes the viewport very laggy and rendering very slow, on both GPU and CPU.

Currently using Windows 10 with the latest drivers (22 Q4).
CPU: Ryzen 7 2700
GPU: RX Vega 64

I didn’t submit any reports because I don’t know if it is my system or actually a bug, but I have tested it with both the 22 Q3 and 22 Q4 drivers, and both give the same results.

I also didn’t include any file because you can test it by just dragging in an asset with some image textures on it.

Added some tests (on Blender 3.3.1 LTS):

With CPU+GPU it takes around 00:04:36 to render

With GPU only it takes around 00:01:33

Kindly have a look into it, @brecht @bsavery.

2 Likes

There are some known performance issues with CPU + GPU rendering, but I would not expect it to be 5x slower; that would be worth investigating.

Viewport denoising is relatively slow on AMD GPUs because it uses OpenImageDenoise on the CPU. GPU-accelerated viewport denoising is only available with OptiX on NVIDIA GPUs. GPU acceleration for OpenImageDenoise is being worked on by Intel.

The CPU and GPU can’t truly mix rendering, only render separate blocks, and the CPU’s priority is higher than the GPU’s, so the CPU+GPU option amounts to CPU rendering.

Since it doesn’t look like HIP will ever be possible on older cards like Hawaii GPUs and such, is there any chance that the AMD-flavoured development going into HIP might spawn a spin-off effort with Vulkan?

I posted a while back on bsavery’s GitHub repo for rprblender that whatever was done to RPR made it render better and faster on my R9 390 than anything prior, including Cycles OpenCL. Is it possible that some of that magic could be implemented alongside HIP as an alternative so Cycles can be used again? Or is the sauce just too different, despite AMD code being involved with HIP?

1 Like

It is possible for a developer to go in and add a Vulkan rendering backend for Cycles; however, I believe this is unlikely to happen or to be accepted into the Cycles code base. See Brecht’s response to my question about implementing Vulkan here:

1 Like

I realize this is sacrilege in the AMD HIP thread, but:

At BCon22 I saw a presentation by an Intel guy. The version of Blender on Intel’s OneAPI he showed off could run on Intel Arc, NVIDIA (via a OneAPI CUDA backend), or AMD (via a OneAPI OpenCL backend).

I’m not sure if all those backends are publicly available yet, and I’m also not sure if it buys you anything on older hardware, because OpenCL support alone is not enough; the GPU needs enough registers, local memory, etc.

But still it might be interesting to watch.

/ducks and runs

1 Like

Not likely.

So the way that was working is by piping the C++ through the HIP compiler (or the CUDA compiler in the case of NVIDIA) and translating the relevant library calls. It wouldn’t really give any advantages or hardware-compatibility fixes that weren’t already in HIP.

Where it is interesting is on the developer side, where you could have one unified backend for CUDA/HIP/OneAPI, but I’m not sure it would have a direct effect for users.

As far as I understood it, OneAPI had an OpenCL backend. OpenCL predates HIP, no? I used OpenCL on AMD cards before I had ever heard of HIP, but maybe I’m just uninformed.

But probably those older cards don’t have the needed hardware capabilities anyway.

I don’t know enough about OneAPI to comment on whether there’s an OpenCL backend, but there was no OpenCL involved in the demo they were showing at BCon, at least on the AMD device.