state_buffer_size() returning 0 on GCN1 cards with amdgpu-pro

So I tried to investigate why exactly Cycles does not work on GCN1 cards.

With the latest drivers I can get Blender to compile the kernels successfully. This does, however, require setting the GPU_FORCE_64BIT_PTR environment variable to 1; without it, the amdgpu-pro OpenCL driver does not return any memory info.

Upon loading all kernels Blender crashes with a divide-by-zero in DeviceSplitKernel::max_elements_for_max_buffer_size(), since the call to state_buffer_size(kg, data, 1024) returns 0.
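To make the failure mode concrete, here is a minimal stand-alone sketch of a guarded version of that computation. It is not the Cycles code: the helper name try_max_elements and its parameters are made up for illustration, and it assumes the element count is the maximum buffer size divided by a per-element size, with the per-element size derived from the size reported for 1024 threads.

```cpp
#include <cstdint>

// Hypothetical helper, not the actual Cycles implementation.
// Assumption: per-element size = reported_size / probe_threads, and the
// result is max_buffer_size / per-element size.
// Returns false instead of dividing by zero when the reported size is 0
// (or smaller than probe_threads, which also rounds down to 0).
bool try_max_elements(uint64_t max_buffer_size,
                      uint64_t reported_size,
                      uint64_t probe_threads,
                      uint64_t *max_elements)
{
    if (probe_threads == 0) {
        return false;
    }

    const uint64_t size_per_element = reported_size / probe_threads;
    if (size_per_element == 0) {
        // This is the case that would otherwise trigger the divide-by-zero.
        return false;
    }

    *max_elements = max_buffer_size / size_per_element;
    return true;
}
```

A guard like this would only turn the crash into a reportable error; it does not explain why the device reports 0 in the first place.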

state_buffer_size() loads the kernel_state_buffer_size.cl kernel onto the device and then returns the size as set by the kernel (*size = split_data_buffer_size((KernelGlobals*)kg, num_threads);).

Does anyone have any ideas why it would return 0 under GCN1?

I tried changing the value set by the kernel to 2048 (*size = 2048;) just to check. To my surprise, the host code

size_buffer.copy_from_device(0, 1, 1);
size_t size = size_buffer[0];

which is supposed to get the size back from the device, still returns 0.

Could this be a bug with copy_from_device/device_copy_from/OpenCLDevice::mem_copy_from under GCN1?
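One way to narrow this down might be a sentinel check: fill the host-side value with a recognisable pattern before the readback, so that "the copy never touched host memory" can be told apart from "the copy delivered a zero". The sketch below is self-contained and does not use the Cycles API; check_readback, ReadbackResult and the copy_back callback are hypothetical names, and the callback merely stands in for whatever performs the device-to-host copy.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical result codes for the sentinel test.
enum class ReadbackResult {
    HostBufferUntouched,  // sentinel survived: the copy did not write to host memory
    CopiedZero,           // the copy wrote to host memory, but the value is 0
    Ok                    // a non-zero value arrived
};

// `copy_back` is an assumed stand-in for the real device-to-host copy.
// It receives a pointer to the host value it is expected to overwrite.
ReadbackResult check_readback(const std::function<void(uint64_t *)> &copy_back,
                              uint64_t *value_out)
{
    const uint64_t sentinel = 0xDEADBEEFCAFEF00DULL;

    // Seed the host value so an untouched buffer is distinguishable from 0.
    uint64_t host_value = sentinel;
    copy_back(&host_value);
    *value_out = host_value;

    if (host_value == sentinel) {
        return ReadbackResult::HostBufferUntouched;
    }
    if (host_value == 0) {
        return ReadbackResult::CopiedZero;
    }
    return ReadbackResult::Ok;
}
```

If the sentinel survives, the copy path would be the first suspect. If a zero arrives, the same trick could be repeated on the device side by uploading a sentinel before launching the kernel, to see whether the kernel writes to the buffer at all.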

I would be glad to debug this further if anyone could give ideas as to where to look at next.

You could try replacing uint64_t with uint, in case the driver does not support that properly for some reason. We don’t use it in many other places.

Other than that, I guess just try to pinpoint exactly where in the code the value becomes zero: simplify the code as much as possible until you find the point where it breaks.