Screen-space viewport picking (GPU-based raycast alternative)

Apologies in advance for a potentially long-winded post that explores the limitations of raycasting in Blender, and a high level technical proposal for a solution.

At my studio we have written a lot of internal scripts and addons, and many use raycasting to interact with the objects and elements in the scene. This is an area where I have always felt Blender falls short of other DCC APIs: there are multiple raycast methods available to us, and each has its own unique limitations that we end up writing hacky workarounds for. I have spent a lot of time working with raycasting in bpy, so here are a handful of my complaints (before I get to the proposed solution):

BVH raycast:

  • Very fast, but requires a BVH tree to work, which is extremely slow to build on dense geometry and has to be built for each object you want to hit (so a viewport-wide raycast is generally out of the question in a production environment like ours).
  • Hits geometry that is invisible (backfaces when they are being culled, hidden faces), forcing ‘penetration re-casting’, which adds recursive logic and complexity.
  • Cannot raycast to non-mesh objects, or to meshes that have no faces.
  • When picking from a viewport you have to take the near/far camera clipping planes into consideration to avoid hitting faces that are culled by clipping (it is possible, for example, to raycast from the camera and accidentally hit a face that is clipped by the near plane if you did not offset your starting point to accommodate the near plane).
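To illustrate the near-plane issue: the ray origin can be pushed past the near clip plane before casting, so the ray cannot hit faces the viewport has already clipped away. A minimal sketch in plain Python; in a real addon you would derive the view origin and direction from the region's view matrix (e.g. via `bpy_extras.view3d_utils`), and `ray_from_view` is just an illustrative name:

```python
def ray_from_view(view_origin, view_dir, clip_near):
    """Offset the ray origin past the near clip plane so the cast
    cannot hit faces that the viewport has already clipped away."""
    length = sum(c * c for c in view_dir) ** 0.5
    unit = tuple(c / length for c in view_dir)  # normalize the direction
    origin = tuple(o + clip_near * d for o, d in zip(view_origin, unit))
    return origin, unit

# Camera at (0, 0, 10) looking down -Z with a near clip of 0.1:
# the ray now starts at z = 9.9 instead of the camera origin.
origin, direction = ray_from_view((0.0, 0.0, 10.0), (0.0, 0.0, -1.0), 0.1)
```

The same offset works for both BVH and scene raycasts; the only subtlety is that the offset must be along the normalized view direction, not the raw vector.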

Scene raycast:

  • Very fast.
  • Can’t raycast to unevaluated geometry, forcing you to disable modifiers, raycast, then re-enable them.
  • Doesn’t work in edit mode without first pushing edit-mode changes to the mesh data (which is very slow).
  • Hits objects that are not part of the local view, forcing you to hide the objects you do not want to hit (likely everything not in local view).
  • Cannot raycast to non-mesh objects, or to meshes that have no faces.
  • When picking from a viewport you have to take the near/far camera clipping planes into consideration to avoid hitting objects that are culled by clipping.

Recently I had some downtime and started thinking about alternatives that circumvent these limitations. I may have come up with an idea worth experimenting with, so I thought I would post it here for discussion.

There are many uses for raycasting, but by far the most common is viewport picking: take screen-space coordinates and return the object or element directly under them. And in most (if not all) of these situations you want ‘what you see is what you get’ picking, meaning things that are not visible should not be available for picking.

My proposal is a GPU-based screen-space picking solution: map each object or element to a unique color and write that color to an offscreen texture, then use the screen coordinates to look up the texture and return the object or element visible at that pixel based on the color found there. My gut tells me this would be extremely fast, since you’re essentially just using a flat shader to write a solid color to an offscreen texture. Obviously this adds a potentially large texture to VRAM depending on screen resolution, but I honestly think a smaller buffer could be used (half the viewport size, maybe less). A drop in the bucket, really…
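The core of such a picking buffer is a reversible mapping between an object/element index and a pixel color. A minimal sketch (the function names are illustrative, not any existing API) packing a 24-bit index into an 8-bit RGB triple and back:

```python
def index_to_color(index):
    """Pack a 24-bit object/element index into an 8-bit RGB triple
    that a flat shader could write into the offscreen picking buffer."""
    if not 0 <= index < 1 << 24:
        raise ValueError("index does not fit in 24 bits")
    return (index & 0xFF, (index >> 8) & 0xFF, (index >> 16) & 0xFF)

def color_to_index(color):
    """Recover the index from the RGB value read back at the picked pixel."""
    r, g, b = color
    return r | (g << 8) | (b << 16)

# The mapping round-trips exactly, so a single pixel read maps back
# to exactly one object or element.
assert color_to_index(index_to_color(123456)) == 123456
```

One design caveat: the buffer must be rendered without anti-aliasing or blending, otherwise interpolated colors at object borders decode to the wrong index.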

One argument might be: why not eliminate the need for this kind of solution by finding out why raycasting is slow and making it faster, or by introducing new options that let you circumvent the various downsides? For me it comes back to that general use case: 90% of the time, when a Python developer needs to raycast they are interested in viewport picking. As it currently exists, raycasting from one arbitrary point to another works great, but it’s far too generic for what most people actually use it for, and it introduces a lot of technical burden for studios that need robust viewport picking in their pipelines.

Based on what I have seen being worked on in the Python gpu module, I think most (if not all) of this could theoretically be done in Python, and I would be up for prototyping this proposal at some point if it sounds like an idea with merit.

Any thoughts? Potential gotchas I might not have thought of here?




I’m not sure I understand your post; what I’m suggesting really doesn’t have anything to do with Eevee. I looked through the task you mentioned and don’t see anything relevant to what I proposed. Maybe I am missing something?

That does not make sense, sorry.

“VDB simulation” isn’t really a thing. OpenVDB is a volume storage format; simulators and solvers have nothing to do with VDB beyond (possibly) storing their results in that format.
Also, GPUs are good at repetitive tasks but not so good at handling large amounts of heterogeneous data. For example, GPU-based liquid solvers are only slightly faster than CPU-based ones, and only in some cases.
GPUs also have a big limitation: memory. Having 24 GB of VRAM, or even 48, means you are very limited in which scenes you can work with, and if you fall back to system memory you pay a big performance penalty.

GPUs are good for some tasks and CPUs are good for others; it all depends on how well a task actually scales. Have you thought about why you need a CPU for your operating system instead of driving it with a GPU alone?

Finally, the GPU-based selection proposed here is related to the method used to select objects in the viewport. AFAIK it currently uses depth picking, and the proposed system is more like a mask system (if I understood it correctly); both run on the GPU, just with a different approach :slight_smile:

What I’m suggesting isn’t about selection per se; it’s about picking/sampling for the Python API. Often when developing an addon you need to sample something under the mouse or at a screen position and return the object or element at those coordinates. Currently the only way to do this is to use one of the existing raycast methods (which have the numerous problems listed in my first post).

What you’re describing is similar to how the selection system in Blender works. Writing colors is how it used to work quite a while ago, now by default it’s using occlusion queries and is better at selecting subpixel objects.

Access to that system could be exposed in the Python API. There could be an API call that returns a list of objects (or mesh elements) + screen space distances within a given region of a viewport.
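If such an API existed, it might behave something like this pure-Python mock, where a 2D buffer of per-pixel IDs stands in for the GPU select buffer (the function name and semantics here are hypothetical, not an existing Blender API):

```python
def pick_in_region(id_buffer, center, radius):
    """Return {object_id: nearest screen-space distance} for every ID
    found within `radius` pixels of `center` in a 2D ID buffer.
    `id_buffer` is a list of rows; 0 means nothing was drawn there."""
    cx, cy = center
    hits = {}
    for y, row in enumerate(id_buffer):
        for x, obj_id in enumerate(row):
            if obj_id == 0:
                continue
            dist = ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5
            if dist <= radius and dist < hits.get(obj_id, float("inf")):
                hits[obj_id] = dist
    return hits

buffer = [
    [0, 0, 0, 0],
    [0, 7, 0, 0],
    [0, 0, 0, 9],
]
# With the cursor at (1, 1) and a 2-pixel radius, only object 7 is in reach.
print(pick_in_region(buffer, (1, 1), 2))  # → {7: 0.0}
```

Returning distances (rather than a single hit) would let addon authors implement their own tie-breaking, e.g. preferring vertices over faces within a small pick radius.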


That would be amazing. We could get rid of so much boilerplate and edge case handling in our pipeline if this was made available.


I think it would be great if this could be achieved.
The add-on I’m developing uses Python + NumPy to handle viewport picking. That process is very slow and has become a bottleneck for the add-on.
If this Python API were provided, my add-on would be faster and more stable.


I’m very much interested in this too :slight_smile:

Do I understand correctly that “screen space distances” essentially means the depth-buffer data for each pixel? This could be really useful for some advanced use-cases (estimating normals, “batch” raycasting of many screen rays at once, etc.).
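As an example of the normal-estimation use case: a rough view-space normal can be reconstructed from neighbouring depth samples by crossing the two screen-space tangents. A sketch under the simplifying assumptions of an orthographic view and unit pixel spacing (real perspective viewports would need the depth values unprojected first):

```python
def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def estimate_normal(d_left, d_right, d_down, d_up):
    """Estimate a unit view-space normal at a pixel from its four
    neighbouring depth samples, using central differences."""
    dzdx = (d_right - d_left) / 2.0  # depth gradient along screen x
    dzdy = (d_up - d_down) / 2.0     # depth gradient along screen y
    n = cross((1.0, 0.0, dzdx), (0.0, 1.0, dzdy))  # tangent_x × tangent_y
    length = sum(c * c for c in n) ** 0.5
    return tuple(c / length for c in n)

# A flat, screen-facing surface (constant depth) yields (0, 0, 1).
print(estimate_normal(5.0, 5.0, 5.0, 5.0))  # → (0.0, 0.0, 1.0)
```

With batched reads of the depth buffer this generalizes to many pixels at once, which is the “batch raycasting” idea mentioned above.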

By the way, would it also be possible to provide a custom view matrix to such an API? (e.g. so that fast raycasting against non-meshes could be used for non-viewport rays too)


For anyone who wants to contribute an implementation of this, the relevant internal API is DRW_select_buffer_*.

This does not include the depth buffer; I meant an XY screen-space coordinate. There are other APIs to get the depth buffer, so it would be possible to associate the two, but internally these are different things. For example, the Auto Depth feature uses the depth buffer but does not care which object the depth values come from.

I imagine this could be supported for offscreen 3D viewport rendering with a different view matrix as well, though of course that’s more expensive than using an existing 3D viewport with all its buffers already set up. And at some point, if you are raycasting from many directions, it becomes more efficient to use a BVH.