Cycles AMD HIP device feedback

Maybe the OpenCL backend was used to run on CPU? I don’t remember exactly.
I was referring to this slide: oneAPI backend: Cycles on Intel GPUs - YouTube

edit: now that I look at it again, I see that it was indeed used to run on CPU and older Intel GPUs. Still, older AMD cards do support OpenCL, so it might be interesting for people having that hardware…

But this is getting off-topic, so I’ll shut up about it :wink:

1 Like

Before I go down the rabbit hole of trying to build around my graphics card, is there a hard, insurmountable obstacle I’m not aware of? It’s GFX702, so… ROCm support officially died in 2.0, I believe, but IIRC Hawaii GPUs still kind of worked until 3.5 with patches. I need to re-read whatever my brain kept about a CUDA warp being 32 threads and an AMD wavefront being 64, and the workaround for that, I think. warpSize or something.
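For anyone following along, here is a rough sketch of why the 32 vs 64 difference matters. It is plain Python arithmetic, not HIP or CUDA code, and the helper names are made up for illustration. The assumption is a kernel that hardcodes a 32-bit lane mask instead of reading the built-in warpSize.

```python
# Illustrative arithmetic only; this is not HIP or CUDA code.
CUDA_WARP_SIZE = 32        # threads per warp on NVIDIA hardware
GCN_WAVEFRONT_SIZE = 64    # threads per wavefront on GCN hardware such as Hawaii


def groups_per_block(block_threads: int, group_size: int) -> int:
    """Warps or wavefronts needed to cover one block (hypothetical helper)."""
    return -(-block_threads // group_size)  # ceiling division


def full_lane_mask(group_size: int) -> int:
    """Bitmask with one bit set per lane (hypothetical helper)."""
    return (1 << group_size) - 1


if __name__ == "__main__":
    block = 256
    print("warps per block:     ", groups_per_block(block, CUDA_WARP_SIZE))
    print("wavefronts per block:", groups_per_block(block, GCN_WAVEFRONT_SIZE))

    # A mask hardcoded for 32 lanes leaves the upper lanes of a 64-wide wavefront uncovered.
    hardcoded = 0xFFFFFFFF
    missed = full_lane_mask(GCN_WAVEFRONT_SIZE) & ~hardcoded
    print("lanes missed by a 32-bit mask:", bin(missed).count("1"))
```

So code that assumes 32 lanes silently ignores half of each wavefront, which is why sizing masks and loops from warpSize is the usual workaround.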

So make a brew with a version of LLVM, HIP-Clang, CUDA and shazbot that magically works, then build Blender and Cycles from scratch, re-enabling the GFX7xx’s. If I can’t find whatever magical sauce people used to get Hawaiis working in 3.5.0, and can’t use the hipify tools and such, then that means I need to use the deprecated HCC, yeah? This will be fun for a high school dropout. I didn’t see anything in the target triples in the LLVM docs that specifically said this won’t work. ProRender works like a slap of hot damn these days, but I feel like I need to back my 390 like it’s a kid being bullied out of a little league baseball game.

Might be easier to break into AMD headquarters and thief me up a RDNA card holding up a wobbly table somewhere.

If I pull this off, do you think AMD would sponsor my hardware for the next three years of my interactive media design diploma out of pity? Perhaps disgust? Let this abomination begin, however it turns out, because of AMD’s documentation standards.

From the AMD GPU GCN3 ISA blurb:
“AMD GCN3 ISA Architecture document describes the environment, organization, and program state of AMD GCN Generation 3 devices which includes Radeon R9 family of devices. It details the instruction set and the microcode formats native to this family of processors that are accessible to programmers and compilers.”

That’s not confusing at all, what with half of R9 being GCN2. Cycles requires a minimum of CUDA compute capability 3.0, so… as long as I meet the CUDA 3.0 equivalent in HIP code it should work, sort of, right?

Wish me luck.

2 Likes

Quick Feedback:

  • Not having access to the HIP libraries to compile with them openly is… making it really hard to troubleshoot or make custom builds with it.

When will these libraries be placed in the lib dependencies? Is there another open access point to these libs?

1 Like

Edit: Dumb.

If there’s no HIP compiler for Windows (AFAIK), can we use hiprtc? I was dicking around with ProRender and realized I’ve got amd_comgr.dll, which is all hiprtc needs, independent of actually having the HIP SDK?

From GitHub on hiprtc.dll:
" * This library can be used on systems without HIP install nor AMD GPU driver installed at all (offline compilation). Therefore it does not depend on any HIP runtime library"
" * But it does depend on COMGr. We may try to statically link COMGr into hipRTC to avoid any ambiguity."

So can a person on Windows use these tools to get from point A to B? Using hiprtc to make/try to make a fatbin for their unsupported architecture then try to walk backwards to make it compile and run properly?

1 Like

First of all, congratulations on the 3.4 speed-up: the 6800 XT now finishes in 30s (actually overclocked to 2560). In addition, a friend of mine who reviewed the 7900 XT told me it can only complete the render in 25s. 84/72 = 1.16, so is the doubled FP32 performance not being used? :smiling_face_with_tear:
Of course I don’t mean this in a derogatory way; it’s just that with this level of improvement it’s hard for me to think about upgrading. I only got the 6800 XT because, early on, the bitcoin craze made GPUs like the 3080 impossible to buy, so I grabbed what I could. As things stand, the 7900 XT should be very good for games, though I’m not sure of the specific performance. My friend signed a non-disclosure agreement, so he can’t tell me any details outside of Cycles.

Blender can only support and optimize for released hardware. I suppose the 7900 series will get a boost after AMD releases the corresponding drivers. And when hardware raytracing is enabled in the future, there will be another huge boost, I guess.

1 Like

Since the 7000 series goes up to 192 ROPs (192/128 = 1.5), the doubled FP32 should increase the actual computing speed by at least 50 percent to make sense. If it ran at 1.5x the RDNA2 execution efficiency, that should also be 30/((84/72)*1.5) = 17s.
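The napkin math in the posts above can be written out as a small sketch. The numbers are the ones quoted in the thread; reading 84/72 as a ratio of compute-unit counts is an assumption, and the function name is made up for illustration.

```python
def projected_seconds(base_seconds: float, unit_ratio: float, per_unit_gain: float) -> float:
    """Render time if speed scales with unit count times per-unit gain (hypothetical model)."""
    return base_seconds / (unit_ratio * per_unit_gain)


base = 30.0          # 6800 XT render time quoted in the thread, in seconds
observed = 25.0      # 7900 XT render time quoted in the thread, in seconds
unit_ratio = 84 / 72 # ratio used in the thread, assumed to be compute units

projected = projected_seconds(base, unit_ratio, 1.5)
observed_speedup = base / observed

if __name__ == "__main__":
    print(f"unit ratio:        {unit_ratio:.2f}")
    print(f"projected time:    {projected:.1f}s")
    print(f"observed speed-up: {observed_speedup:.2f}x")
```

Under this model the observed 25s is only about what the extra units alone would give, which is the gap the posts are asking about.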

Maybe it is restricted by the driver? An old driver can’t know how much power future hardware has.

1 Like

Just caught a 7900 XT benchmark on opendata, I don’t know what to think about that. I know it’s only one bench but 1/3rd of the 4080 performance? I am trying to wrap my head around this, less performance than a 3070?
So… with the XTX having a slight bump in hardware spec, it’s still going to be outpaced by leagues? Even if it hits 4,000 it’s still under 1/2 the performance of the 4080, a card nobody seems to care for because of its bad price-to-performance ratio.

I’d better move my attention from the equipment to the technology, and when the technology arrives to focus on better equipment. :sob: :sob: :sob: :sob:

1 Like

As far as I’m aware, HIP performance in its current state is to be compared to NVIDIA CUDA, since it is not RT-accelerated. Then, from the only benchmark available, the RX 7900 XT seems about on par with the RTX 3090.

Possibly the dual-issue SIMD units need driver work or software enablement to be actually used?

1 Like

No, don’t do that. That’s a really terrible and completely awful way to try and bridge a performance gap. You’re taking away way more than just RT from the lime flavoured GPU and if it has to lose core functionality so the AMD counterpart can be on par then that’s regression for the entire industry. If Michael Phelps swims faster than you because he’s got those genetic webbed toes you can’t chop his feet off before a swim just to even things out.

Doing what you are doing is working on the assumption that the strawberry team will pull a rabbit out of their hat that equals a mature API that’s bordering on 12 years of development.
I’m sorry but I need to keep going back to the fact that you feel that crippling one card to try and achieve some sort of performance parity makes any sense. Are ARC cards using RT? No they are not.

So why is a $289 video card, the A750, hanging out in between $2k+ workstation cards?
Intel’s first real foray into dGPU territory, and they stand up and surpass 90% of the hardware from a company that’s been at this for longer than I can remember? It’s not raytracing right now.

I am going to get my hand smacked, but performance parity was sold as a “Team green being greedy with the CUDA” narrative for a very long time. It was the greenie meanies that made life miserable. That doesn’t really hold any water beyond the fact that development-wise there is disparity; I will give the strawberry lads that.

Look, I am trying to be supportive because nobody wants one dominant flavour in the mix; things will become stagnant in a hurry. HIP Cycles existing is why the older dorks like myself can’t use raytracing and why the newer cards can’t either. OpenCL raytracing works just fine in Luxcore, not that I blame Cycles devs for throwing it in the trash, but 3.0 is allegedly retconning the 2nd act of the film. Prorender looks great these days and throws down with ML and raytracing. With Cycles? Nobody gets any pie because of HIP. It shoulda been left in the waffle iron a little longer.

The most frustrating part of it all is those two raytracing platforms I just chooched about?
Blender headhunted the main dev of Luxcore, Blender has AMD devs, or maybe it’s just Brian left but either way, he devs the heck out of Prorender. So I can imagine Luxcore fella and Brian work on Cycles a lot, Luxcore development has stopped for the most part and I assume that Prorender could be even more amazeballs (sorta) if time didn’t have to get siphoned off for non-Prorender things…

So, Cycles HIP existing:

  • Ate OpenCL for aged AMD devices.
  • Doesn’t provide RT for modern AMD devices.
  • Kneecapped dev for an engine that did use OpenCL and did RT.
  • Slows down development on a HIP engine that does use RT.

I still believe in the red barons, which is why I am dicking around with Orochi in the hopes of doing something cool. But going back to my card being from 200 AD, getting anything to work on it over in Archland is a wacky-hacky adventure. Not segfaulting is progress.

3 Likes

There is no Luxcore developer working on Cycles, not sure where that idea came from.

And for AMD, just like other companies, they have developers working on their own renderers, working on compilers and drivers, and working together with developers of various applications. Your description makes it seem as if it’s a one man operation or something.

Cycles in various cases is being used by hardware vendors to test new APIs and compilers which can then stabilize and get used by other renderers.

3 Likes

Oh my mistake, language barrier. The person in question left a project that was for Blender. So the statement sounded like he left the project for Blender. Left for greener pastures phrasing.

Oh? You got more than one AMD developer? Right on, bsavery is usually the name/speaker I see the most. RT coming soon then?

That’s good for Cycles, in that it’s involved in various cases being used by hardware vendors to test new APIs and compilers which can then stabilize and get used by other renderers. I’m happy for Cycles for being such a renaissance renderer.
Do you think after those hardware vendors are done testing new APIs and compilers which can then stabilize and get used by other renderers that it will be less of a Paradigm Shifting Synergistic Agile Application for Out of the Box Value Added Alignment Leveraging Platform™ and more of a user focused open source tool to foster creative individuals without locking them out due to their socioeconomic status?

FOSS is boss but if you want honest feedback on Cycles AMD HIP, it’s not. HIP created an economic barrier for users that discriminates against those who cannot meet the requirement set out by AMD for HIP after removing their alternative. The lesser option which would have been the OpenCL implementation was then axed. It’s like kicking a person out into the cold, but then knocking them down and stealing their shoes just for good measure.

I know I’m singing the same tune to the same deaf ears, so I’ll try not to take further time away from your fostering of your product and any networking with studios and such you do, and I’ll go back to banging stones together in the mud hoping to appease Clang.

1 Like

We don’t have N dedicated developers from hardware vendors like that. These developers typically support multiple applications. So they might spend some weeks helping with Cycles integration of a feature, or fix a bug or make an improvement in the compiler or driver, and then move onto something else.

See the last meeting notes for updates on HIP-RT; an initial implementation is being reviewed by me. And the goal is of course for that to be an API other renderers use as well.

We decided to drop OpenCL because it was holding back Cycles development: driver issues and other limitations were making it very hard to add new features. If we were still using OpenCL we’d struggle to support these older architectures as well, when there is no official AMD support for them anymore.

Already now when adding for example many lights sampling I’ve had to spend time doing workarounds. But at least with HIP we can ensure that the one compiler version we use works, rather than dealing with a range of OpenCL driver versions and implementations that each have their own bugs and limitations.

4 Likes

The Blender Benchmark is Blender 3.3 based, it needs to be updated to Blender 3.4 to support 7900XT. So that “benchmark result” is using CPU.

@brecht when can this be updated?

I think it was updated? It’s showing results from 3.4.0:

Ah ok great. Thanks…

(it wasn’t in the update log on the page)

Ah yes, I think those are updates for the benchmark launcher, which are usually not needed for new Blender releases.

No, it’s the GPU. I double checked versions.

Unless those NVIDIA GeForce RTX 4090 users have some real firepower processors, in the 3.4 grouping.

I had to double check, then triple check because there was no way that made any sense but it does, sort of. Compared to a 6900 XT it’s a great gain.
For 3.4
AMD Radeon RX 6600 XT 1183 @ 5 bench
AMD Radeon RX 6700 XT 1646 @ 3 *
AMD Radeon RX 6750 XT 1615 @ 1 *
AMD Radeon RX 6900 XT 2142 @ 1 *
AMD Radeon RX 7900 XT 3461 @ 1 *

Assmathing it on a napkin, that score makes sense. The 7900 doubles the transistor count and the compute follows. The score roughly falls within that. In terms of other hardware it’s a bump in shading units, TMUs, ROPs and RT cores, but just a bump… i.e. 5376 shading units to the 6900’s 5120. 4 extra CUs.
So my disbelief aside, and it not being a CPU aside, that score does make sense, and given where its predecessors are I don’t buy the arguments that drivers or anything are holding it back. Always room for more, but it’s right in line with what’s been on the table already.
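The same napkin, typed out as a sketch. The scores and shading unit counts are the ones listed in this post; nothing else is assumed beyond dividing them.

```python
# Scores for Blender 3.4 as listed in the post above.
scores = {
    "RX 6600 XT": 1183,
    "RX 6700 XT": 1646,
    "RX 6750 XT": 1615,
    "RX 6900 XT": 2142,
    "RX 7900 XT": 3461,
}

# Shading unit counts quoted in the post.
shading_units = {"RX 6900 XT": 5120, "RX 7900 XT": 5376}

score_ratio = scores["RX 7900 XT"] / scores["RX 6900 XT"]
unit_ratio = shading_units["RX 7900 XT"] / shading_units["RX 6900 XT"]

if __name__ == "__main__":
    print(f"score gain over 6900 XT:        {score_ratio:.2f}x")
    print(f"shading unit gain over 6900 XT: {unit_ratio:.2f}x")
```

So the score goes up by roughly 1.6x while the shading unit count only goes up by 5 percent, meaning most of the gain has to come from what each unit does rather than from having more of them.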

It probably games like a son of a gun but that doesn’t help in this arena. It’s sitting beside a 3070, which can be bought new for $500~

It’s more powerful than the 3070 in a world where it’s doing the raytracing and all the other fancy algorithm-wrapped-in-a-flour-tortilla stuff that is OptiX, but it ain’t.

Prorender does baller stuff with ML, the denoise float16 is your money melon right there. 5x the jambalaya in a 7900 XT, and yeah this stuff is halfarse accurate but close enough.

RX 7900 XT - FP16 (half) performance 103.0 TFLOPS (2:1)
RTX 3070 - FP16 (half) performance 20.31 TFLOPS (1:1)
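Where the "5x" comes from, as a quick sketch using only the two FP16 figures quoted above. These are paper throughput numbers, so the ratio says nothing about what a given denoiser actually achieves.

```python
# FP16 (half precision) throughput figures quoted in the post, in TFLOPS.
fp16_tflops = {
    "RX 7900 XT": 103.0,
    "RTX 3070": 20.31,
}

fp16_ratio = fp16_tflops["RX 7900 XT"] / fp16_tflops["RTX 3070"]

if __name__ == "__main__":
    print(f"7900 XT vs 3070, FP16 on paper: {fp16_ratio:.2f}x")
```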

denoise_c3_ldr.pb
denoise_c3_ldr_float16.onnx
denoise_c9_ldr.pb
denoise_c9_ldr_f16.onnx
srgan-03x2x32-273866.pb
esrgan-05x3x32-278391.pb
taau_low_res.pb
taa_upscale_2x_part.pb
upscale2x_c3_rt_f16.onnx
upscale2x_fast.pb

It goes vroom vroom because of this sort of NOTPtiX method.