Cycles AMD HIP device feedback

lcas · May 1, 2023, 9:36am

Thanks, the 5.0 link is from the wiki. Still no Windows compile, right? Like, no unofficial workaround where I can just go somewhere like this and poof, it works:
https://radeon-pro.github.io/RadeonProRenderDocs/en/hiprt/about.html

I don’t think that is the same thing, just related to it somehow with raytracing stuff that is being worked on? Windows compile might be happening sometime this year I hope:

Alaska · May 1, 2023, 9:58am

I believe an AMD engineer said that HIP/ROCm for Windows would release in the next few months.
And yes HIP-RT is a library to extend HIP with some ray tracing specific features. And Cycles has support for it. I believe it’s just disabled at the moment since no public facing AMD GPU driver supports it properly. At least that was my understanding from roughly a week ago.

silex · May 1, 2023, 4:14pm

I’ve expanded on the performance comparison between Windows and Linux posted by @TurtleDev and @L_S. With 3.5 out and more cards tested, it still looks like Linux has worse performance by a solid 10%, and in some cases almost 18%.

If we exclude outliers like Vega/VII and the W6800, which first regressed by 300 percent and then recovered, then on average Linux regressed by 4.7%, Windows improved by 14.2%, and Linux is slower than Windows by an average of 8%.
If we focus on RDNA2 only, the same three numbers are: 5.2%, 19.1%, 12.9%.
RDNA3 looks much better balanced OS-wise, but this is just one SKU.

Blender v. perf. Δ = ((newest score per OS * 100) / oldest score per OS) - 100
lead/loss to Windows = ((most recent Linux score * 100) / most recent Windows score) - 100
The numbers cover all Blender versions from 3.0 to 3.5.
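
As a rough illustration of the two formulas above, here is a minimal Python sketch. The function name `percent_change` and all scores are made up for the example; they are not taken from the benchmark data behind the table.

```python
def percent_change(new_score, reference_score):
    """((new_score * 100) / reference_score) - 100, as in the formulas above."""
    if reference_score == 0:
        raise ValueError("reference score must be non-zero")
    return (new_score * 100.0) / reference_score - 100.0


# Hypothetical scores for one GPU (illustration only, not real benchmark data).
oldest_linux = 1000.0    # oldest Blender version tested on Linux
newest_linux = 912.0     # newest Blender version tested on Linux
newest_windows = 1100.0  # newest Blender version tested on Windows

# "Blender v. perf. delta": newest vs. oldest score on the same OS.
version_delta = percent_change(newest_linux, oldest_linux)

# "lead/loss to Windows": most recent Linux score vs. most recent Windows score.
vs_windows = percent_change(newest_linux, newest_windows)

print(f"Blender v. perf. delta (Linux): {version_delta:+.1f}%")
print(f"lead/loss to Windows:           {vs_windows:+.1f}%")
```

A negative result means a regression in the first column, or Linux trailing Windows in the second.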

GPU Model	OS	Blender v. perf. Δ	lead/loss to Windows
5600X	Linux	-8.8%	+17.3%
5600X	Windows	+4.25%	-
Vega	Linux	+42.7%	+7.5%
Vega	Windows	+35.0%	-
6600	Linux	-4.8%	-9.9%
6600	Windows	+3.4%	-
W6600	Linux	-	-15.84%
W6600	Windows	+22.6%	-
5700XT	Linux	-4.4%	-9.2%
5700XT	Windows	+10%	-
6600XT	Linux	-6.7%	-11.3%
6600XT	Windows	+21.4%	-
VII	Linux	-9.9%	-8.6%
VII	Windows	+79%	-
6700XT	Linux	-6.6%	-11.8%
6700XT	Windows	+13.4%	-
W6800	Linux	+314%	-8.3%
W6800	Windows	+24.5%	-
6750XT	Linux	-9.5%	-14.5%
6750XT	Windows	+21.7%	-
6800	Linux	-7.3%	-9.4%
6800	Windows	+27.7%	-
6800XT	Linux	-6.7%	-17.7%
6800XT	Windows	+21.4%	-
6900XT	Linux	+2.5%	-12.5%
6900XT	Windows	+19.7%	-
6950XT	Linux	-7.9%	-13.4%
6950XT	Windows	+21.1%	-
7900XT	Linux	-	+2.7%
7900XT	Windows	-2.7%	-
7900XTX	Linux	-0.9%	+1.1%
7900XTX	Windows	+0.9%	-

2905710881 · May 2, 2023, 10:16am

ROCm on Windows has only just been officially announced (even counting last year’s experimental HIP support, it has existed for little more than a year), so why is Windows faster than Linux, which has had years of optimization? With CUDA and OpenCL, Linux is the faster one.

Nubnubbud · May 3, 2023, 1:07pm

hey, a new page just went up on AMD’s site. There’s nowhere I can find on the site to access or search for it, but it’s pretty easy to reveal via a targeted Google search. There’s a direct download link for the drivers that will supposedly add Blender support on Windows.

It’s not version 23 like the main one on the site, so it might be a branch of the previous version, but is this the driver we’ve been waiting for to enable public testing of HIP RT? there is exactly one (1) “highlight” for this version, according to AMD, and it’s

“Support for HIP RT in Blender™ 3.6 Beta”

and my next question is, when should we expect to see OpenData reflecting the availability of HIP RT? I’ve stressed it before, but we do need that relatively quickly, so we can begin to gauge the effectiveness of the new render tech, and sorta hopefully give a certain team green some competition in the professional space, for the benefit of all our wallets on every team.

ThomasDinges · May 3, 2023, 2:05pm

OpenData won’t be updated before the final 3.6 release at the end of June.

Nubnubbud · May 3, 2023, 2:57pm

I see. well, I guess if someone really wanted to know, they’d have one of the newer cards and use a headless render to calculate it on a benchmark file.
eager to see if this is the real driver, though.

Alaska · May 4, 2023, 12:08am

I believe some compiler bugs were found that might need to be fixed before HIP RT is enabled for the public. It may actually already be fixed by now, but I’m not 100% sure on that.

Mikolaj_Neronowicz · May 4, 2023, 8:46am

From my perspective, as a person who does not program, the situation looks very bad, especially on Linux systems. I am disgusted by how slowly AMD is addressing this issue. I am saddened, as a customer who bought their hardware, that the company can’t easily solve the driver issue for my system. HIP in Blender works but has problems, and I have to be careful about running the viewport with overlays enabled. In the latest 3.6 alpha, HIP detects my card, but enabling HIP RT is not possible. Also, HIP itself does not work and does not generate an image. This is some kind of unfunny comedy.

2905710881 · May 4, 2023, 2:37pm

Why can’t hiprt generate images?

galivasya · May 4, 2023, 4:44pm

Looks like the beta driver is broken

ThomasDinges · May 4, 2023, 6:50pm

From the Cycles meeting on Tuesday:

AMD HIP-RT code was merged, but is not yet enabled due to issues found in testing. Brian will send an updated HIP-RT SDK, and mention the right drivers to use for testing the pull request. Brecht will then make a new build and test.

AMD ROCm 5.5 was released, which should enable us to re-enable HIP on Linux by upgrading to this compiler version. Brecht will test it. This driver release should also fix viewport crashes with RDNA2 graphics card. It may take a bit for Linux distributions to upgrade to this version.

So please have some patience and wait until this is officially enabled and ready for testing.

2905710881 · May 4, 2023, 10:01pm

The latest build of 105388 is now available for download. The speed improvement ranges from a minimum of only 10% to a maximum of 50%. I won’t post more screenshots.

Alaska · May 5, 2023, 5:12am

These speed increases are in line with what AMD expects of the current implementation, and what they observed in testing.

See this chart from #105538 - Cycles: HIP-RT for AMD hardware ray-tracing - blender - Blender Projects

Total render time ranges from 7.14% slower to 32.14% faster in their testing.
Pure rendering speed ranges from 3.10% faster to 44.64% faster.

Mikolaj_Neronowicz · May 5, 2023, 6:59am

It says that the benchmark test results (sample/second) and total render time are measured on the W6800. That is, they used the Radeon™ Pro W6800.

Alaska · May 5, 2023, 7:00am

Sorry, I missed that. Thanks for pointing it out.

Mikolaj_Neronowicz · May 5, 2023, 7:04am

Ok. Actually I was a bit harsh and unkind. Keeping my fingers crossed for AMD and Blender! It’s nice to break the monopoly of one manufacturer. I think good support for Blender will be a game changer for Radeons.

Nubnubbud · May 8, 2023, 8:11pm

it’s pretty obvious that the implementation will need some hefty optimizations to match nvidia optix, if it’s even possible with the hardware implementation. The acceleration is as follows:

OPTIX
monster: 52%
junkshop: 62%
classroom: 65%

HIPRT
monster: 17%
junkshop: 27%
classroom: 22%

it’s a very small dataset, but something of note for now, is that something about junkshop is relatively easier for HIPRT to render than OPTIX at first glance… there are a couple SSS materials, but honestly to make a more detailed breakdown I’d need to make a material and feature test suite to identify strengths and weaknesses between drivers.
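
To put a number on the junkshop observation, here is a small Python sketch that divides each HIPRT figure by the matching OPTIX figure from the list above. The variable names are made up for illustration, and with only three scenes this is a sanity check on the quoted percentages, not a benchmark.

```python
# Acceleration percentages quoted in the post above.
optix = {"monster": 52, "junkshop": 62, "classroom": 65}
hiprt = {"monster": 17, "junkshop": 27, "classroom": 22}

# HIPRT gain expressed as a fraction of the OPTIX gain, per scene.
relative_gain = {scene: hiprt[scene] / optix[scene] for scene in optix}

# Print scenes from the highest relative gain to the lowest.
for scene, ratio in sorted(relative_gain.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{scene}: HIPRT gain is {ratio:.0%} of the OPTIX gain")

best_scene = max(relative_gain, key=relative_gain.get)
```

On these three data points junkshop comes out highest, which matches the first-glance impression.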

Alaska · May 9, 2023, 12:36am

Cycles has a feature that can break down where processing time is being spent.

This can be done by opening Blender from the command line with the launch arguments:
--debug-cycles --verbose 4

For example:

/path/to/blender --debug-cycles --verbose 4

If you then render an image, at the end you will get some stats printed in the terminal about the render and where processing was spent. These stats look like this:

I0509 12:22:47.046001 34484 session.cpp:464] Total render time: 6.07198
I0509 12:22:47.046001 34484 session.cpp:465] Render time (without synchronization): 5.43004
I0509 12:22:47.095000 34484 queue.cpp:39] GPU queue stats:
I0509 12:22:47.095000 34484 queue.cpp:43]      1.03760s: integrator_shade_surface integrator_sorted_paths_array prefix_sum
I0509 12:22:47.095000 34484 queue.cpp:43]      0.96824s: integrator_intersect_closest
I0509 12:22:47.095000 34484 queue.cpp:43]      0.66656s: integrator_shade_background integrator_queued_paths_array
I0509 12:22:47.095000 34484 queue.cpp:43]      0.65319s: integrator_shade_background
I0509 12:22:47.095000 34484 queue.cpp:43]      0.44145s: integrator_init_from_camera
I0509 12:22:47.095000 34484 queue.cpp:43]      0.39213s: integrator_intersect_closest integrator_queued_paths_array
I0509 12:22:47.095000 34484 queue.cpp:43]      0.30637s: integrator_init_from_camera integrator_compact_states
I0509 12:22:47.095000 34484 queue.cpp:43]      0.28085s: integrator_intersect_shadow
I0509 12:22:47.095000 34484 queue.cpp:43]      0.20484s: integrator_terminated_paths_array integrator_compact_paths_array
I0509 12:22:47.095000 34484 queue.cpp:43]      0.11728s: film_convert_combined_half_rgba
I0509 12:22:47.095000 34484 queue.cpp:43]      0.11277s: integrator_shade_light integrator_queued_paths_array
I0509 12:22:47.095000 34484 queue.cpp:43]      0.09890s: integrator_shade_shadow
I0509 12:22:47.096000 34484 queue.cpp:43]      0.05114s: integrator_shade_shadow integrator_queued_shadow_paths_array
I0509 12:22:47.096000 34484 queue.cpp:43]      0.01223s: integrator_terminated_shadow_paths_array integrator_compact_shadow_paths_array
I0509 12:22:47.096000 34484 queue.cpp:43]      0.01203s: integrator_init_from_camera integrator_reset
I0509 12:22:47.096000 34484 queue.cpp:43]      0.00167s: integrator_shade_surface integrator_sorted_paths_array integrator_compact_shadow_states prefix_sum
I0509 12:22:47.096000 34484 queue.cpp:43]      0.00119s: integrator_intersect_shadow integrator_queued_shadow_paths_array

Based on these stats, you can see that integrator_shade_surface is taking up the largest amount of processing time in this scene.

Note: These stats are sorted in order of most processing time to least processing time.

What you can do with this information is you can compare HIP to HIP-RT and see how they differ (most of the difference will be with intersection kernels). With this information you can actually figure out the “speed boost” the RT accelerators offer over just doing ray tracing on the general purpose compute cores.

Along with that you can compare CUDA to OptiX and see how they differ (once again, most of the difference will be with intersection kernels).

And lastly you can compare the relative distribution of work between various parts of the render between AMD and Nvidia with various architectures (e.g. 20% of work is spent in X feature on Y GPU, while Z GPU only spent 10%), allowing you to pick out things and come to conclusions like “GPU X from Manufacturer A seems to have a hard time rendering sub-surface scattering materials when compared to GPU Y from Manufacturer B in THIS SPECIFIC SCENE”.
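
The comparisons described above could be sketched in Python roughly as follows. This assumes the terminal output looks like the sample stats earlier in this post; `parse_queue_stats`, `intersection_share` and `SAMPLE` are hypothetical names for this example, not part of Blender or Cycles. Grouping each line under the first kernel it lists is also an assumption about how to read the log.

```python
import re
from collections import defaultdict

# Matches queue stat lines such as:
#   I0509 12:22:47.095000 34484 queue.cpp:43]      0.96824s: integrator_intersect_closest
QUEUE_LINE = re.compile(r"queue\.cpp:\d+\]\s+(?P<seconds>\d+\.\d+)s:\s+(?P<kernels>.+)$")


def parse_queue_stats(log_text):
    """Sum seconds per kernel, keyed by the first kernel listed on each line (assumption)."""
    totals = defaultdict(float)
    for line in log_text.splitlines():
        match = QUEUE_LINE.search(line)
        if match:
            first_kernel = match.group("kernels").split()[0]
            totals[first_kernel] += float(match.group("seconds"))
    return dict(totals)


def intersection_share(totals):
    """Fraction of the summed queue time spent in integrator_intersect_* kernels."""
    total = sum(totals.values())
    if total == 0:
        return 0.0
    intersect = sum(
        seconds for name, seconds in totals.items()
        if name.startswith("integrator_intersect_")
    )
    return intersect / total


# A few lines copied from the sample output above.
SAMPLE = """\
I0509 12:22:47.095000 34484 queue.cpp:39] GPU queue stats:
I0509 12:22:47.095000 34484 queue.cpp:43]      1.03760s: integrator_shade_surface integrator_sorted_paths_array prefix_sum
I0509 12:22:47.095000 34484 queue.cpp:43]      0.96824s: integrator_intersect_closest
I0509 12:22:47.095000 34484 queue.cpp:43]      0.39213s: integrator_intersect_closest integrator_queued_paths_array
I0509 12:22:47.095000 34484 queue.cpp:43]      0.28085s: integrator_intersect_shadow
"""

totals = parse_queue_stats(SAMPLE)
share = intersection_share(totals)
print(f"intersection kernels: {share:.1%} of parsed queue time")
```

Saving the terminal output of one render per backend (HIP and HIP-RT, or CUDA and OptiX) and running both through `parse_queue_stats` gives two dictionaries that can be compared kernel by kernel.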

AMD and Nvidia have different GPU architectures, cache layouts, memory speeds, ray accelerator designs, BVH layouts when using OptiX and HIP-RT, and more. There’s a lot that’s different between them. They have their own strengths and weaknesses, and it may not be as simple as “X material always renders faster on AMD”. It may be more complex and situational.

It may even be hard for AMD to get close to showing the same benefits with HIP-RT as Nvidia shows with OptiX with current generation hardware due to major hardware differences. But this is just speculation.

2905710881 · May 9, 2023, 1:52pm

I understand that the 6000 series does not have hardware bvh and I have been waiting for the 7000 series to test it.
