Cycles AMD HIP device feedback

Because HIP isn’t working well at all right now. Move to Blender 2.92 and see if OpenCL works better for your hardware.

But it was written that HIP supports RX 6000 series GPUs. OpenCL works well, but only in older versions, not in Cycles X. HIP crashes and doesn’t work at all.


Linux testers, TL;DR: the latest Blender 3.2 nightly builds should now work with the ROCm 5.1 driver.

Long story: there was a simultaneous change on the Blender side that affected something in our compiler. We’re working to get that fixed and hoping to get Vega cards working for 3.3.

On the HIP RT (hardware ray tracing) side, as readers of this thread saw, we released the library. Look for more info and a definite date for its integration into Blender in June / July.


It’s still crashing for me on ROCm 5.1.1 (opencl-amd) with a 5700 XT. I can only render a cube, and that takes about 20 seconds.

Edit: I assume you mean it should work with the upcoming ROCm release, so crashing right now is normal.

Brian, I was wondering if you could clarify some things about the Linux implementation of HIP, both for users and to help with writing the documentation about Linux support.

Let’s assume a user has downloaded a version of Blender for Linux compiled with HIP. And they have an RDNA2 graphics card that’s supported. What do they need installed for Cycles HIP to work?

In the commit message for official Linux support it says

This requires the 22.10 / ROCm 5.1 driver.
Source: rB179100c02126

I assume this means you need at least AMD Radeon Software 22.10?
What about AMD Radeon PRO Software? Does any version of that driver support Cycles HIP on Linux? Or will a future release support it?
Is there anything else I’m missing?

Am I misunderstanding something? Do you need both the AMD Radeon Software 22.10 and ROCm 5.1 packages installed for Cycles HIP to work?


Just tested the 3.2.0 alpha (179100c02126). It crashed the display driver and X the first time but worked the second time… I didn’t try a third.

System Details
OS - 5.17.1-arch1-1
Drivers - Mesa 22.0.1 with proprietary ROCm 5.1.1 from AUR opencl-amd package
GPU - RX 6800 (gfx1030)
Scene - Classroom
Tile Size - 2048
Samples - 300

3 renders with HIP on 3.2.0 alpha (179100c02126) took 00:46.04, 00:46.12, and 00:46.14.
3 renders with OpenCL on 2.93.9 (31712ce77a6a) took 01:35.81, 01:35.51, and 01:35.70.
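
For anyone who wants to compare their own runs the same way, here is a minimal sketch that averages the times above and computes the ratio between the two backends. The helper name `to_seconds` is made up for this example, and it assumes the times are in Blender's MM:SS.ss format.

```python
from statistics import mean


def to_seconds(stamp: str) -> float:
    """Convert a render time such as '01:35.81' (MM:SS.ss) to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


# Times reported in this post (Classroom, 300 samples, RX 6800).
hip_times = ["00:46.04", "00:46.12", "00:46.14"]
opencl_times = ["01:35.81", "01:35.51", "01:35.70"]

hip_avg = mean(to_seconds(t) for t in hip_times)
opencl_avg = mean(to_seconds(t) for t in opencl_times)
speedup = opencl_avg / hip_avg

print(f"HIP average:    {hip_avg:.2f} s")
print(f"OpenCL average: {opencl_avg:.2f} s")
print(f"HIP is about {speedup:.2f}x faster than OpenCL")
```

With these numbers HIP on 3.2.0 alpha comes out at roughly 2.1x the speed of OpenCL on 2.93.9, though the two Blender versions differ in more than just the GPU backend.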

If you want to compare apples to oranges, I compared against the old 3.0 benchmarks from Techgage. Running Classroom at 2560×1440 with 256 tiles, I got about 77 sec with HIP on Linux, which is fairly similar to the 85 sec for HIP on Windows considering 3.0 is nearly 4 months old at this point.

Congratulations to AMD and Blender. AMD Linux users will soon benefit from Cycles X as Windows users have. Hopefully HIP RT can be delivered in a timely manner, resulting in competitive RDNA2 GPUs.


Did a quick test using ROCm 5.1.1 from rocm-arch, with some of my own scenes and the BMW scene. It doesn’t work: address boundary error while preparing textures. Probably caused by the bug bsavery mentioned. An untextured test scene I quickly threw together using Curtis Holt’s material test object and his procedural candy shader did work: 9’13" CPU, 2’46" GPU. Quite an improvement!


Very nice, it’s finally working on Linux. The only funny thing is that adding the CPU doesn’t improve rendering performance, despite it being used at 99% for most of the render time.

On Windows, my 6800 XT with the clock boosted to 2600 MHz needs 1 minute 32 seconds to complete 300 samples, while a stock 6800 on Linux takes 46 s. The Linux result is almost what I would expect from Windows after enabling ray tracing.

The 6800 XT is theoretically at least 20 percent faster than the 6800, so it should take about 46 / 1.2 ≈ 38 s on Linux. That would make Linux about 2.4 times faster than Windows (92 s). For reference, CPU rendering on Linux is 25-30 percent faster, so dividing by 1.3 still leaves a factor of about 1.8. In other words, shouldn’t Windows be roughly twice as fast as it is now to be at a reasonable speed?

Just tested the alpha (179100c02126) on Ubuntu 22.04 LTS. Viewport rendering is fine but as @L_S has said, an actual render crashes the display driver and X - even with just the default cube scene.

I’m not sure if this is the right place to give feedback. And this has probably been mentioned before. Sorry, I didn’t read all 225 comments.

I’m now running GPU rendering on my iMac Pro with a Vega 64. I see there’s an option to render with GPU or with GPU and CPU in the Metal preferences. I thought adding the CPU would make a big difference, but for render time it’s negligible. As for memory, if I look at the peak, GPU only took twice as much memory. I don’t understand why.

I was rendering the lone monk demo file.

GPU only: 1271.67M, peak 1271.68M
CPU and GPU: 479.01M, peak 603.59M

This 6800XT got about 40 seconds with HIP, in classroom.

I think Windows and Linux HIP are the same or similar in rendering speed.

I tested in the Windows environment many times. This scene at 300 samples really does take 1 minute 32 seconds on the GPU, and 4 minutes 9 seconds on the CPU (5900X). If Linux can complete it in 40 seconds, that means it is about 2.3x faster.


Why is Linux so much faster than Windows?

Looking at this ratio, even HIP on Windows may not beat OpenCL on Linux.

Funnybob, the information found below is still relevant for you. However, for future reference, feedback on Metal should go to the Metal feedback thread: Cycles Apple Metal device feedback - #340 by S.I

Cycles X in its current form does a less than ideal job of scheduling work between multiple devices of varying performance, in scenes whose complexity varies across the frame. As such, enabling CPU + GPU usually doesn’t give people the performance increase they expect.

Investigation is underway to fix this. But until that fix is implemented, that is how Cycles-X works at the moment.


Can we render the animation with separate frames on each device? Even if it’s only 1/5 faster, it saves 1/5 of the time. The reality is that even if you open two instances of Blender, the GPU render in the first one stalls when the second one starts CPU rendering.

I think that is to be expected. I don’t know much about HIP or Blender’s use of the GPU, but I do have experience with doing GPU calculations in general.

Even if you choose GPU rendering, the CPU is still very busy for part of the render, preparing all the data for the GPU. So if the CPU has to be shared with another process, you lose a lot of GPU speed as well, because the GPU idles while waiting for the CPU to prepare the next batch.

Maybe you could try to lower the priority of the second (CPU-only) instance so that it only uses the ‘left-over’ CPU power?

(But this is getting rather off-topic, so I’ll stop here.)
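
The priority suggestion above could be tried on Linux along these lines. This is only a sketch: the function names are hypothetical, it assumes `blender` and `nice` are on the PATH, and it assumes the .blend file is already set to render on the CPU. The `-b` (background) and `-a` (render animation) options are Blender's standard command-line flags.

```python
import subprocess
from typing import List


def low_priority_command(blend_file: str, blender: str = "blender",
                         niceness: int = 19) -> List[str]:
    """Build a command that renders an animation at low CPU priority.

    `nice -n 19` gives the process the lowest scheduling priority, so it
    mostly uses CPU time that the GPU-feeding instance leaves idle.
    """
    return ["nice", "-n", str(niceness), blender, "-b", blend_file, "-a"]


def launch_low_priority(blend_file: str) -> subprocess.Popen:
    """Start the CPU-only render in the background and return its handle."""
    return subprocess.Popen(low_priority_command(blend_file))


# Show the command that would be run for a hypothetical file name.
print(" ".join(low_priority_command("cpu_frames.blend")))
```

Whether this actually stops the GPU render from stalling will depend on how much CPU time the first instance needs, so it is worth timing both setups.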


Did a few more tests: Blender always crashes immediately if I try to render anything with an image texture. And it’s not caused by the open source ROCm platform as I initially suspected; it fails the exact same way with the official ROCm 5.1.1 binaries. I also tried multiple kernel versions (5.17.4 zen and xanmod, 5.15.35), same result:

:4:rocblit.cpp :460 : 1532640456 us: 6847 : [tid:0x7f980f9eb640] HSA Asycn Copy Rect wait_event=0x0, completion_signal=0x7f984ccfd080
:4:rocvirtual.cpp :525 : 1532640728 us: 6847 : [tid:0x7f980f9eb640] Host wait on completion_signal=0x7f984ccfd080
:3:rocvirtual.hpp :62 : 1532640733 us: 6847 : [tid:0x7f980f9eb640] Host active wait for Signal = (0x7f984ccfd080) for -1 ns
:4:command.cpp :280 : 1532640979 us: 6847 : [tid:0x7f980f9eb640] queue marker to command queue: 0x7f983ccfd400
:4:command.cpp :339 : 1532640983 us: 6847 : [tid:0x7f980f9eb640] command is enqueued: 0x7f983cc4da40
:4:command.cpp :170 : 1532640987 us: 6847 : [tid:0x7f980f9eb640] Command 0x7f983ccca880 complete
:4:command.cpp :168 : 1532640991 us: 6847 : [tid:0x7f980f9eb640] Command 0x7f983cc4da40 complete (Wall: 1532640990, CPU: 0, GPU: 0 us)
:4:command.cpp :245 : 1532640995 us: 6847 : [tid:0x7f980f9eb640] waiting for event 0x7f983ccca880 to complete, current status 0
:4:command.cpp :259 : 1532640998 us: 6847 : [tid:0x7f980f9eb640] event 0x7f983ccca880 wait completed
:3:hip_memory.cpp :3057: 1532641003 us: 6847 : [tid:0x7f980f9eb640] hipDrvMemcpy2DUnaligned: Returned hipSuccess :
:3:hip_texture.cpp :1453: 1532641018 us: 6847 : [tid:0x7f980f9eb640] hipTexObjectCreate ( 0x7f9,83c,c5d,528, 0x7f9,80f,9a5,140, 0x7f9,80f,9a5,0d0, char array: )
:4:rocdevice.cpp :2034: 1532641140 us: 6847 : [tid:0x7f980f9eb640] Allocate hsa device memory 0x7f96b7400000, size 0x169000
:3:rocdevice.cpp :2073: 1532641144 us: 6847 : [tid:0x7f980f9eb640] device=0x7f9847ba2000, freeMem_ = 0xfed20000
:4:rocdevice.cpp :1899: 1532641202 us: 6847 : [tid:0x7f980f9eb640] Allocate hsa host memory 0x7f984ccec000, size 0x110
Writing: /tmp/bmw27_gpu.crash.txt
fish: Job 1, ‘AMD_LOG_LEVEL=4 ./blender’ terminated by signal SIGSEGV (Address boundary error)

And the backtrace:

Blender 3.2.0, Commit date: 2022-04-23 13:09, Hash c486da0238bd

backtrace

./blender(BLI_system_backtrace+0x20) [0xb605b00]
./blender() [0x117952a]
/usr/lib/libc.so.6(+0x42560) [0x7f9872eb7560]
/opt/rocm/hip/lib/libamdhip64.so(+0x1e8b3d) [0x7f97f83e8b3d]
/opt/rocm/hip/lib/libamdhip64.so(hipTexObjectCreate+0x8e1) [0x7f97f83f1901]
./blender(_ZN3ccl9HIPDevice9tex_allocERNS_14device_textureE+0x6b5) [0x2d4d235]
./blender(_ZN3ccl12ImageManager17device_load_imageEPNS_6DeviceEPNS_5SceneEiPNS_8ProgressE+0x392) [0x3464122]
./blender() [0x84a86e5]
./blender() [0x16759b5]
./blender() [0x1675c6b]
./blender() [0x1662987]
./blender() [0x166f6a0]
./blender() [0x16716dc]
./blender() [0x16718d9]
/usr/lib/libc.so.6(+0x8d5c2) [0x7f9872f025c2]
/usr/lib/libc.so.6(clone+0x44) [0x7f9872f87584]
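
For anyone comparing crash reports, here is a rough sketch that pulls the frames with a symbol name out of a backtrace in this format. The regular expression and the `named_frames` helper are my own for illustration, not part of Blender or ROCm, and only the first few frames from above are included as sample input.

```python
import re
from typing import List, Tuple

# Matches lines such as: ./blender(symbol+0x20) [0xb605b00]
# The symbol and the +offset part may each be missing.
FRAME_RE = re.compile(
    r"^(?P<module>[^\s(]+)"
    r"\((?P<symbol>[^+)]*)(?:\+(?P<offset>0x[0-9a-fA-F]+))?\)"
    r"\s+\[(?P<address>0x[0-9a-fA-F]+)\]$"
)

SAMPLE = """\
./blender(BLI_system_backtrace+0x20) [0xb605b00]
./blender() [0x117952a]
/usr/lib/libc.so.6(+0x42560) [0x7f9872eb7560]
/opt/rocm/hip/lib/libamdhip64.so(+0x1e8b3d) [0x7f97f83e8b3d]
/opt/rocm/hip/lib/libamdhip64.so(hipTexObjectCreate+0x8e1) [0x7f97f83f1901]
./blender(_ZN3ccl9HIPDevice9tex_allocERNS_14device_textureE+0x6b5) [0x2d4d235]
"""


def named_frames(backtrace: str) -> List[Tuple[str, str]]:
    """Return (module, symbol) pairs for frames that carry a symbol name."""
    frames = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match and match.group("symbol"):
            frames.append((match.group("module"), match.group("symbol")))
    return frames


for module, symbol in named_frames(SAMPLE):
    print(f"{module}: {symbol}")
```

The mangled C++ names can be passed through a demangler such as `c++filt`; the last one here should read as `ccl::HIPDevice::tex_alloc(ccl::device_texture&)`, which matches the crash happening inside `hipTexObjectCreate` while Cycles allocates a texture.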