Unable to have two GPUs (RX 5700 XT & RX 580) recognized by Blender simultaneously

Wonder if there is an update to this @rlamorato ?
I have the exact same issue: either the RX 480 (Polaris) or the Vega 64 (Vega) shows up, depending on which OpenCL drivers are installed. Having both installed does not make both available in Blender, just one or the other. Manually apt-installing the OpenCL packages or using --opencl=legacy,pal just prioritizes one card (the Vega) in the end.
clinfo also shows only one platform at a time, so it could be a driver bug.

Nah, I wasn’t able to get this resolved. After countless tries with various workarounds found on the net, I just gave up and used my RX 5700 XT alone. I’m still waiting for new drivers to come out so I can try again, though. I also can’t pinpoint whether it’s strictly a driver issue or a Blender one. Perhaps we should report it or something?

I went over it yesterday while sorting out the OpenCL issue.
I eventually resorted to just extracting the OpenCL libraries (dpkg-deb -x, plus an ld.so.conf entry) to test them. 18.20 was the last OpenCL driver version I found where it worked, with both Polaris (legacy) and Vega (pal) OpenCL working together in Cycles. I suggest you try the 18.20 OpenCL libraries. I don’t use the amdgpu-pro drivers for anything other than OpenCL.
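The extraction approach above can be sketched as a small dry-run helper. This is only an illustration: it prints the shell commands rather than running them, and the function name, the conf file name, and the directory arguments are assumptions, not something taken from the driver packages. Only `dpkg-deb -x`, `/etc/ld.so.conf.d` and `ldconfig` are the standard tools mentioned in the post.

```python
import shlex


def extraction_plan(deb_files, extract_dir, lib_dir,
                    conf_name="amdgpu-pro-opencl.conf"):
    """Return the shell commands for a manual OpenCL library install.

    Nothing is executed here; the caller can review the commands first.
    conf_name is a hypothetical file name for the ld.so.conf.d entry.
    """
    # dpkg-deb -x unpacks the package contents without installing it.
    steps = [
        f"dpkg-deb -x {shlex.quote(deb)} {shlex.quote(extract_dir)}"
        for deb in deb_files
    ]
    # Tell the dynamic linker where the extracted libraries live.
    conf_path = f"/etc/ld.so.conf.d/{conf_name}"
    steps.append(f"echo {shlex.quote(lib_dir)} | sudo tee {conf_path}")
    # Rebuild the linker cache so the new directory is picked up.
    steps.append("sudo ldconfig")
    return steps
```

Calling `extraction_plan(["opencl-a.deb"], "/opt/opencl-test", "/opt/opencl-test/lib")` (placeholder names) returns three commands to check and run by hand. The library directory inside the extracted tree depends on the package layout, so it is passed in rather than guessed.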

With 18.20, clinfo showed two platforms and Blender detected both. With 18.30, even though the clinfo output was correct, Blender only detected one or the other. This is where multi-platform support breaks.

@JeroenBakker This is a genuine issue with AMD and OpenCL, and it will affect users’ ability to be most productive with Blender. I happened to stumble across this post while looking for a solution, since there is literally nothing out there on what I just shared. There does appear to be an issue with multiple OpenCL platforms on AMD. Note my comments on 18.20 and 18.30 in relation to Blender (clinfo detects both but Blender sees one) as a place to start.

All the best.


I tried your method, and the 18.20 version does recognize the 2 cards; however, my RX 5700 XT is listed as unknown.
Trying to render results in a segfault with these printed on the terminal:

OpenCL: Error creating command queue
OpenCL error: CL_INVALID_COMMAND_QUEUE in clEnqueueWriteBuffer(cqCommandQueue, CL_MEM_PTR(mem.device_pointer), CL_TRUE, 0, mem.memory_size(), zero, 0, NULL, NULL) (/home/workstation/.local/src/blender/master/intern/cycles/device/opencl/opencl_split.cpp:1053)
OpenCL: failed to initialize device.
Writing: /home/workstation/.cache/blender/blender.crash.txt
Segmentation fault (core dumped)

18.20 is way before NAVI. You could attempt to tinker with other versions; however, I suspect the code is broken from 18.30 onwards.
The issue is that the drivers are broken for AMD multi-platform OpenCL, and that’s really all there is to it. I found that pulling out the pal drivers to keep legacy support was part of the issue. For example, I tried combining the 18.20 legacy drivers with the 19.30 pal drivers, but something overrides any legacy drivers. They are separate platforms after all, so something is seriously broken with these drivers.
Consider it just buggy OpenCL drivers. It’s a shame, and I don’t know what else to suggest. These are closed drivers after all. But at least you confirmed the issues on Linux with AMD OpenCL platforms working together. I’ll be stuck on 18.20 for some time, I think.

Verdict: Broken.


Indeed.

I too tried that route, but it seems after replacing 18.20’s libOpenCL.so with 19.30’s so that my RX5700XT could be recognized, the other card disappears from blender.

I’d love for this to be reported to AMD as well as something’s clearly broken. Or at least blender devs could take a closer look at this.

Just an update. Extensions function suffix must be unique to be identified by the opencl loader. Both platforms now show up with a simple edit.

Number of platforms                               2
  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.1 AMD-APP (2906.7)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 
  Platform Host timer resolution                  1ns
  Platform Extensions function suffix             AMD

  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.1 AMD-APP (2906.7)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 
  Platform Host timer resolution                  1ns
  Platform Extensions function suffix             XXX
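To check for the suffix collision described in this post, the clinfo text can be scanned for the "Platform Extensions function suffix" lines. The following is a minimal sketch, assuming clinfo output in the layout pasted above; the function names are made up for illustration. It is a diagnostic aid only and says nothing about how the ICD loader itself compares suffixes.

```python
from collections import Counter

SUFFIX_LABEL = "Platform Extensions function suffix"


def platform_suffixes(clinfo_text: str) -> list[str]:
    """Return the suffix value of every platform, in order of appearance."""
    suffixes = []
    for line in clinfo_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith(SUFFIX_LABEL):
            # Keep trailing whitespace: "AMD " and "AMD" are different values.
            suffixes.append(stripped[len(SUFFIX_LABEL):].lstrip())
    return suffixes


def duplicated_suffixes(clinfo_text: str) -> list[str]:
    """Return the suffix values that more than one platform reports."""
    counts = Counter(platform_suffixes(clinfo_text))
    return sorted(s for s, n in counts.items() if n > 1)
```

The text can come from a saved file or from `subprocess.run(["clinfo"], capture_output=True, text=True).stdout`. Printing the values with `repr()` makes a trailing space visible, which matters because output pasted into a forum usually loses it.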

Both platforms now show up with a binary patch applied to the 19.30 legacy library. It makes a simple change to the Platform Extensions function suffix (just appending a space to the string) to fool the OpenCL loader.
To apply:
patch < libamdocl-orca64-SUFFIX-FIX.patch.xml libamdocl-orca64.so

libamdocl-orca64-SUFFIX-FIX.patch.xml (6.9 MB)

It is now up to AMD to address this officially in their drivers. It could well be an issue with OpenCL 2.x (but that is beyond my interest now).
Feel free to download it and see if your NAVI is 1) recognized by clinfo and 2) available within Blender.

I did a successful bmw27 render with my VEGA + RX480 with latest 19.30 OpenCL drivers/libs with this patch applied as above and it is faster by ~10secs than the 18.20 version (so that is that).

Number of platforms                               2
  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.1 AMD-APP (2906.7)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 
  Platform Host timer resolution                  1ns
  Platform Extensions function suffix             AMD

  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.1 AMD-APP (2906.7)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 
  Platform Host timer resolution                  1ns
  Platform Extensions function suffix             AMD

You can now see 2 platforms (legacy and pal). These are the duplicate platforms that the loader would usually reject.
All the best.


You da man! Great job tracking down the root cause, I finally have the 2 working. Do you mind telling me how you did the patch? In case this doesn’t get resolved in the next driver release. Thank you.


It’s a simple fix, but I first had to look up what OpenCL expects to be unique. Then I tracked down where in the binary the suffix is looked up. Let’s hope a real workaround or an update to the actual drivers is done (or is even possible) in the future.
Note the space after AMD in the patched suffix. I guess it assumes whitespace stripping when it’s actually used ;). (You cannot change the suffix to anything other than AMD. I spent ages trying that.)
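The post does not say which tool was used to locate the string, so the following is only a generic, read-only sketch of how candidate offsets for a standalone "AMD" C string could be listed in a library file. The function names are hypothetical, and which match (if any) is the suffix the loader reads is not something this can tell you.

```python
from pathlib import Path


def find_offsets(data: bytes, needle: bytes) -> list[int]:
    """Return every offset at which needle occurs in data."""
    offsets = []
    start = 0
    while True:
        idx = data.find(needle, start)
        if idx == -1:
            return offsets
        offsets.append(idx)
        start = idx + 1


def candidate_string_offsets(library_path: str, text: bytes = b"AMD") -> list[int]:
    """List offsets of NUL-delimited copies of text in the file.

    Requiring a NUL byte on both sides skips longer strings that merely
    contain the text. The file is only read, never modified.
    """
    data = Path(library_path).read_bytes()
    # +1 skips the leading NUL so the offset points at the string itself.
    return [i + 1 for i in find_offsets(data, b"\x00" + text + b"\x00")]
```

Comparing the offsets between driver versions in a hex editor is one way to see why a patch made for 19.30 would not apply to another build. Any actual edit should be made on a copy of the library.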

Amazing, I never would’ve thought to look where you did. I for one hope AMD releases an official fix for this, as it seems that’s a little bit above my technical know-how. :) Thank you so much for digging into this.

P.S. Happy New Year!


Still broken in amdgpu-pro-19.50-967956

Thanks @joules for your work! I managed to get Ubuntu to detect both my RX 5500 XT and RX 590, although my RX 5500 XT is not very well supported under 19.30. Is there any way you could implement the patch for the 19.50 driver? I tried to apply the patch to 19.50 but I got:
Hunk #1 FAILED at 61521.
1 out of 1 hunk FAILED – saving rejects to file libamdocl-orca64.so.rej

Thanks!

Note that while the patch makes the two cards visible in Blender, I’ve never had any luck getting the two to render together. One tile will be stuck forever and/or Blender will hang, if not the whole system, requiring a hard reset.

@rlamorato did you manage to patch it under 19.50?

Unfortunately no. I tried looking at the thing using a hex editor, but I didn’t know what I was looking for exactly so I stopped kidding myself. XD

@joules would you be able to explain to us a little more about how to make a patch for 19.50?
Thanks a lot!

@rlamorato I thought it was working for you? This patch only applies against 19.30.
Have to check. Tomorrow. Can you run clinfo and paste it?

It did work somehow… I was able to render using the 2 GPUs with the default scene, but with anything more complex the render will hang on the first two tiles, or all the tiles will render except for one, or the render doesn’t even start. I haven’t done any extensive tests to know what causes the problems exactly. All I know is that clinfo detects the two cards, but rendering with both is a little problematic.

This is on 19.30 with the patch you supplied.

clinfo.xml (17.6 KB)

Well I have good news and bad news.

@rlamorato Not really sure. Please try the latest 19.50 and clinfo; however, see below for how I set up OpenCL.

The good news is I no longer need this hack.

The bad news is I’m having trouble tracking down why.

I went ahead and downloaded and installed 19.50. Straight away clinfo showed 2 platforms and both cards, and Blender recognized both. I went back to 19.30 and got the same thing.
But I’ve done a lot of things in the last few months:

  • Swapped PCIe slots between cards.
  • Changed the primary display PCI slot in the BIOS.
  • Messed about in the BIOS a lot.
  • Installed Windows 10.
  • Flashed the vBIOS on my Vega to the latest (from their forum), but reverted back to the previous vBIOS and still had no issues.
  • Plus a few more things (kxstudio, low latency kernel).

I’ve spent a couple of hours trying to figure it out, without pulling my cards, and had no luck. It could be any of these, or none.
We could have been very fringe cases here, and why this hack worked is now a mystery, I’m afraid.
As for the latest NAVI cards, I don’t have one to test.
My setup for OpenCL and amdgpu:

  • Linux joules-X8DAH 5.3.0-24-lowlatency #26~18.04.2-Ubuntu SMP PREEMPT Tue Nov 26 14:39:35 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Repo: deb http://ppa.launchpad.net/oibaf/graphics-drivers/ubuntu bionic main
  • Manually extract the opencl-*.deb packages using dpkg -x and place the libraries in /opt/amdgpu-pro-19.30-opencl/lib, add an entry for that directory in ld.so.conf.d, then run ldconfig
  • Make sure there are proper entries for the libs in /etc/OpenCL/vendors
  • I use the OpenCL loader (libOpenCL.so.1.0.0) that comes with the OS, the one in package ocl-icd-libopencl1, not the one supplied with the pro drivers
  • I now recommend having a copy of Windows 10, to at least compare issues between OSes
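The /etc/OpenCL/vendors step in the list above can be sanity-checked with a short script. This is a sketch under the usual Linux ICD convention that each .icd file holds the library name or path on its first line; the function names are made up for illustration, and bare library names are not resolved here because those depend on the dynamic linker configuration.

```python
from pathlib import Path


def read_icd_entries(vendors_dir: str = "/etc/OpenCL/vendors") -> dict[str, str]:
    """Map each .icd file name to the library it points at."""
    entries = {}
    for icd in sorted(Path(vendors_dir).glob("*.icd")):
        lines = icd.read_text().splitlines()
        # An empty file is recorded as "" so it stands out in the result.
        entries[icd.name] = lines[0].strip() if lines else ""
    return entries


def missing_absolute_libraries(entries: dict[str, str]) -> list[str]:
    """Return absolute library paths that do not exist on disk.

    Bare names such as "libfoo.so" are skipped: they are found through
    ld.so.conf.d and the ldconfig cache rather than a fixed path.
    """
    return [
        lib for lib in entries.values()
        if lib.startswith("/") and not Path(lib).exists()
    ]
```

Running `read_icd_entries()` on a machine with both the legacy and pal libraries installed should list one entry per platform. An entry pointing at a path that `missing_absolute_libraries` reports is one possible reason for a platform not showing up in clinfo.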

Not sure what else to suggest, other than I’ll be back here if I come across any issues in relation to this.
Good luck.