My winter goal is to implement AMD Orochi in Cycles

Just like it says on the tin. I was getting frustrated with the workarounds and jury-rigging needed to get anywhere with HIP on Windows when the Googler pointed me towards a comment from @bsavery about Orochi. I’ve got nothing but time until next fall, when college starts for my interactive media development program, and a stack of surgeries to chew through, so I’m homebound regardless. Guidance on good practices is always welcome.

I hope to replace all the GPU backends in Cycles with Orochi, which will hopefully cut the necessary compute code to about 1/4 of its current size and allow for (hopefully) much simpler development, provided hardware vendors don’t go offroad with their development. oneAPI and Metal are not yet implemented in Orochi, so that’s an obstacle on top of an obstacle. It would be nice to use Orochi to get OpenCL going again for those with legacy hardware, if possible.

My hope is to really push for hardware agnosticism and simplicity (somewhat). I think it would make an interesting project if nothing else, and technically the code to do this is (sort of) already in Blender: if you’ve read the Orochi documentation, it almost seems like you could just run a script to find every instance of “hip”, change it to “oro”, and 90% of dev is taken care of.

From the GitHub repo:

HIP

#include <hip/hip_runtime.h>

hipInit( 0 );
hipDevice device;
hipDeviceGet( &device, 0 );
hipCtx ctx;
hipCtxCreate( &ctx, 0, device );

After pushing it through the oro-fice it becomes,

Orochi CUDA/HIP

#include <Orochi/Orochi.h>

oroInitialize( ORO_API_HIP, 0 );
oroInit( 0 );
oroDevice device;
oroDeviceGet( &device, 0 );
oroCtx ctx;
oroCtxCreate( &ctx, 0, device );

Which, as they say on the repo, makes that code usable on red or green flavored GPUs. Not all cases are going to be that simple, but in essence this beautiful layer barely changed pre-existing HIP-targeted code, then traded one small addition for the entirety of what would be needed for CUDA. If/when oneAPI, Metal and BrokenCL are added, that’s a bananas amount of code not to have to worry about anymore. This also has the added benefit of not having to care about the HIP SDK not being around for Windows users and the lack of a straightforward means to compile HIP on Windows. Unless one decided to just appear out of thin air recently.
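To make the “find hip, change it to oro” idea concrete, here is a minimal C++ sketch of such a rename pass. The helper name `hip_to_oro` is made up for illustration and is not part of Orochi or Blender; it only assumes that the Orochi name is the HIP name with the prefix swapped, as in the repo example above.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of Orochi): rewrites identifiers such as
// hipInit or hipCtxCreate to oroInit / oroCtxCreate.
std::string hip_to_oro(const std::string &source)
{
  // \b avoids matching "hip" inside other words (e.g. "ship"), and the
  // required capital letter skips paths like <hip/hip_runtime.h>.
  static const std::regex hip_symbol(R"(\bhip([A-Z][A-Za-z0-9_]*))");
  return std::regex_replace(source, hip_symbol, "oro$1");
}
```

Feeding it `hipCtxCreate( &ctx, 0, device );` gives `oroCtxCreate( &ctx, 0, device );`. It deliberately leaves the `#include` line alone and does not add the `oroInitialize` call, so those two steps (and any name that doesn’t map one to one) would still need a manual pass.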

If anyone has been playing with Orochi and has some pearls to drop, I’m all for it.


That’s noble. Hopefully it frees up devtime in the future. Is there no performance impact?

Very good.

Orochi would only help deduplicate host-side CUDA and HIP code. It’s not going to help with OptiX, oneAPI, Metal, OpenCL, or any kernel-side code. For that, a very different API would be needed.

I’m not sure yet that this is actually something we want to use in Cycles; the benefit looks relatively minor.

This also has no impact on the HIP SDK requirement. We need that for compiling kernels; the host-side code already compiles without it.


“This also has no impact on the HIP SDK requirement. We need that for compiling kernels, the host side code already compiles without it.”

Orochi doesn’t require the HIP SDK for anything it does. It uses HIP-RoCCLR, then gets what it needs with comgr.dll, amdhip64.dll, hipew and hiprtc:

“This library doesn’t require you to link to CUDA (for the driver APIs) nor HIP (for both driver and runtime APIs) at build-time. This provides the benefit that you don’t need to install HIP SDK on your machine or CUDA SDK in case you’re not using the runtime APIs. To run an application compiled with Orochi, you need to install a driver of your choice with the corresponding .dll/.so files based on the GPU(s) available. Orochi will automatically link with the corresponding shared library at runtime.”

It does what it do. That’s why I got excited. I’m sure there are major considerations that will hit me right square in the melon, but for a passion project I can deal with that.
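The “link at runtime” part of that quote is ordinary dynamic loading. Below is a rough POSIX-only sketch of the idea, not Orochi’s actual code: the function names are invented, and the library file names (`libamdhip64.so`, `libcuda.so.1`) are assumptions about a typical Linux driver install. On Windows the equivalent would go through `LoadLibrary` with DLLs such as the `amdhip64.dll` mentioned above.

```cpp
#include <dlfcn.h>
#include <string>

// Returns true if the named shared library can be loaded at run time.
bool can_load(const char *name)
{
  void *handle = dlopen(name, RTLD_LAZY | RTLD_LOCAL);
  if (handle == nullptr) {
    return false;
  }
  dlclose(handle);
  return true;
}

// Hypothetical probe: report which driver library is present, if any.
// The file names are assumed typical install names, not taken from Orochi.
std::string pick_backend()
{
  if (can_load("libamdhip64.so")) {
    return "hip";
  }
  if (can_load("libcuda.so.1")) {
    return "cuda";
  }
  return "none";
}
```

Nothing here needs an SDK at build time; the program compiles anywhere and only finds out which driver exists when it runs. On older glibc versions it may need to be linked with `-ldl`.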

Microsoft has their Antares reacharound-workaround for whipping up backends targeting c-rocm_win64, c-cuda_win64, etc. It’s a gas. No SDK? No problem, it uses Windows Subsystem for Linux to do the dirty work.

ab_utils::Process({"wsl.exe", "sh", "-cx", "\"/opt/rocm/bin/hipcc " + wsl_path + " --amdgpu-

What’s true for Orochi is also true for cuew and hipew which we are currently using. It bundles those two and provides an abstraction over them which can be convenient, but doesn’t add any new capabilities.
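As a rough picture of what “an abstraction over them” means, here is a toy C++ sketch: one function table, filled per backend, with host code written once against the table. Every name in it is hypothetical, and the two backends are stand-ins that return fixed values so the sketch runs without a GPU driver; a real wrapper would fill the pointers from the vendor driver library.

```cpp
// Hypothetical function table; a real wrapper would fill these pointers
// from the vendor driver library instead of the stand-ins below.
struct GpuApi {
  const char *name;
  int (*init)(unsigned flags);
  int (*device_count)(int *count);
};

// Stand-in backends so the sketch runs without any GPU driver installed.
static int fake_init(unsigned) { return 0; }
static int fake_cuda_count(int *count) { *count = 1; return 0; }
static int fake_hip_count(int *count) { *count = 2; return 0; }

static const GpuApi kCuda = {"cuda", fake_init, fake_cuda_count};
static const GpuApi kHip = {"hip", fake_init, fake_hip_count};

enum class Backend { Cuda, Hip };

const GpuApi &select_api(Backend backend)
{
  return backend == Backend::Cuda ? kCuda : kHip;
}

// Host code written once against the table, whichever backend is selected.
// Returns the device count, or -1 if any call reports an error.
int count_devices(Backend backend)
{
  const GpuApi &api = select_api(backend);
  int count = 0;
  if (api.init(0) != 0 || api.device_count(&count) != 0) {
    return -1;
  }
  return count;
}
```

The table saves duplicated call sites on the host, but it does not add anything the underlying libraries could not already do, which is the point being made above.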

Nothing about Orochi uses OpenCL though. I’m not sure how you’re proposing that using Orochi would enable any hardware HIP doesn’t support, unless you’re going to write an OpenCL backend for Orochi?

But the thought of eliminating redundant Cycles backend code DOES have merit, and of course would need approval from the Blender devs. At BCon, the Intel guys were demonstrating running all flavors of GPUs on Linux through the oneAPI backend…


cuew and hipew are two separate entities, and you still need both CUDA and HIP code paths to have them dance together, whereas Orochi can remove the need for the double. I know that isn’t the groundbreaking revelation I’m making it out to be; the reason I’m excited is that it’s open to having OptiX/Metal/oneAPI added.

Shamelessly stolen quotation:

“Orochi is designed to add more backends if needed. Adding intel support makes sense to complete the project. We are happy to work with anyone who’s interested in adding it although we don’t have any plan to add it at the moment.”

Which is sort of what Intel can somewhat do with oneAPI at the moment, hence oneAPI. I started down this road wanting to do this with oneAPI:
https://intel.github.io/llvm-docs/GetStartedGuide.html#known-issues-and-limitations

I mean, I’m not trying to steer nobody’s ship. It’s more of a personal passion project as a hardware-locked-out individual, due to AMD’s approach to legacy hardware being the development equivalent of taking Old Yeller out behind a barn to play fetch with buckshot. I can imagine it’s not as performance oriented as a straightpipe from A to B given the dynamic nature, but I sure like the idea presented of taking all backends and (for the most part) making it one single plate of spaghetti to maintain. When (if) those backends are ever present. Things nearly went sideways with this Orochi business when AMD decided hiprtc wasn’t cool enough to eat at the lunch table anymore.

In the end though, there are 100 different silly projects I could make a mess out of, like seeing how much of Blender can be made to run in a browser, rendering with WebGPU.


That would be the “if possible” part. I’m in a modular mindset after dorking around with xformers for a while. All missing backends have to be made for Orochi though. Like Mr. Harada was saying, oneAPI makes sense as part of a complete project. The whole 8 headed dragon thing really has me on a kick for a one path pipe that is developed as just that. Modular down the tube.

Yeah, that comment you made about the BCon thing is why I started down the oneAPI road, but then I turned around when I looked into this Orochi thing you mentioned. Then on the same day I see this: microsoft/antares on GitHub, “an automatic engine for multi-platform kernel generation and optimization. Supporting CPU, CUDA, ROCm, DirectX12, GraphCore, SYCL for CPU/GPU, OpenCL for AMD/NVIDIA, Android CPU/GPU backends.” I know it’s limited, but damn it sure makes everything look nonchalant: “Oh, you need a DX12 shader? For that platform? OK, set this backend, here you go.”

I also kind of like the idea that Orochi can be slipped on like a sock onto anything that has HIP hanging off of it and enable all that it can (or can one day) do because of how the -hip +oro thing works. Not that it doesn’t require more effort than just replacing three letters, but you get what I mean. It’s like having one wall socket and a kitchen full of waffle irons with all sorts of foreign plugs: just plug them all into Orochi. OK, I need to sleep.