Embree for GPU — Weekly Reports

In this topic I’ll post weekly reports on this project.

Here’s a link to the Proposal, and to the branch soc-2019-embree-gpu


Week 1 – 27/05

  • Add a GUI option to enable / disable this feature
  • Basic implementation with Embree rtcBuildBVH:
    • Supports only basic features:
      • Renders basic scenes without a huge slow-down
      • No support for instancing, STBVH, or curves

Plan for next week:


Last time I checked, BVH Builder did not support motion blur, so you may need to go with BVH Access.

Week 2 – 03/06

Improve converter to support :

  • motion blur
  • instancing

Still no support for curves

Render time should be almost the same for basic scenes (including scenes with particles).

Plan for next week:

  • Move code to use BVH Access, so we can later support complex features (like STBVH, more complex spatial split)

Week 3 – 10/06

  • Ability to include Embree’s headers and access most internal functions (only tested under Linux)
  • Add an extractor that supports basic access to Embree’s internal structures
  • A function that builds a Blender BVH4 tree from Embree’s internal tree

These functions are not yet used for rendering: the tree is built, but not used.

Plan for next week :

  • Convert the BVH4 to a BVH2 (which is the only type that can be used on GPU)
  • Store all the data in the packed structure
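To illustrate what a BVH4 to BVH2 conversion involves, here is a minimal sketch. It is not the code from the branch: the `Node` class, the helper names, and the greedy "merge the pair with the smallest combined surface area" heuristic are all assumptions made for the example. The idea is only that each 4-wide node gets extra intermediate binary nodes until every inner node has two children.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Node:
    # Hypothetical node layout, for illustration only.
    lo: tuple                                     # min corner (x, y, z)
    hi: tuple                                     # max corner (x, y, z)
    children: list = field(default_factory=list)  # empty list means leaf


def union(a, b):
    """Bounds enclosing both nodes."""
    lo = tuple(min(p, q) for p, q in zip(a.lo, b.lo))
    hi = tuple(max(p, q) for p, q in zip(a.hi, b.hi))
    return lo, hi


def area(lo, hi):
    """Surface area of an axis-aligned box."""
    dx, dy, dz = (h - l for l, h in zip(lo, hi))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def to_bvh2(node):
    """Return an equivalent tree where every inner node has two children."""
    if not node.children:
        return node
    kids = [to_bvh2(c) for c in node.children]
    while len(kids) > 2:
        # Assumed heuristic: merge the pair whose combined box is smallest.
        i, j = min(combinations(range(len(kids)), 2),
                   key=lambda p: area(*union(kids[p[0]], kids[p[1]])))
        lo, hi = union(kids[i], kids[j])
        merged = Node(lo, hi, [kids[i], kids[j]])
        kids = [k for n, k in enumerate(kids) if n not in (i, j)] + [merged]
    if len(kids) == 1:
        return kids[0]  # a single child needs no inner node
    return Node(node.lo, node.hi, kids)


def max_arity(node):
    return max([len(node.children)] + [max_arity(c) for c in node.children])


def count_leaves(node):
    if not node.children:
        return 1
    return sum(count_leaves(c) for c in node.children)


# One 4-wide node holding four unit boxes placed along the x axis.
leaves = [Node((x, 0, 0), (x + 1, 1, 1)) for x in (0, 1, 10, 11)]
root = Node((0, 0, 0), (12, 1, 1), leaves)
bvh2 = to_bvh2(root)
```

In this toy scene the two nearby pairs of boxes end up under separate intermediate nodes. Each intermediate node adds a traversal step, which is one reason the quality of this conversion matters for render time.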

Question from a simple-minded Blender user: will this noticeably speed up Cycles rendering speed?


Week 4 – 17/06

  • Fixed the build on Windows
  • Embree BVH Access is now working on GPU
  • Updated the UI to select between all modes:
    • Internal builder (works on CPU & GPU)
    • Embree (complete usage), works only on CPU
    • Embree BVH Builder (works on CPU, but best on GPU), supports motion blur and instancing
    • Embree BVH Access (works on CPU, but best on GPU), only supports basic meshes

Plan for next week :

  • Improve performance: BVH Access is still a bit slower than the internal builder, but there are multiple ways to fix that:
    • Use specific Embree features (like STBVH)
    • Improve the conversion from Embree to Blender (there is still a lot of room for improvement in the conversion from BVH4 to BVH2)

@MetinSeven Currently, the rendering time is almost identical. The main benefit will be visible on specific scenes once Embree’s features have been implemented (for example, STBVH shines when nearby objects are moving in the same direction). In those cases, the rendering time should improve considerably.


Week 5 - 24/06

  • Add support for instanced meshes (motion blur)
  • Performance tests with a scene that had 1,728,306 vertices
    • Internal builder: 2min08 (used a BVH2 with a SAH cost of 39.3588)
    • Embree: 2min56 (returned a BVH4 with a SAH cost of 47.5124, which is then reduced to a BVH2 with a cost of 63.2898)
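For readers unfamiliar with the SAH figures quoted above, here is a small sketch of how a surface area heuristic cost can be computed for a tree. The tuple layout and the cost constants `c_trav` and `c_isect` are assumptions for the example; the exact formula and constants behind the numbers reported here are not stated in this thread.

```python
def surface_area(lo, hi):
    """Surface area of an axis-aligned box given its min and max corners."""
    dx, dy, dz = (h - l for l, h in zip(lo, hi))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def sah_cost(node, root_area=None, c_trav=1.0, c_isect=1.0):
    """Expected cost of a tree under the surface area heuristic.

    A node is a tuple (lo, hi, children, num_prims); this layout is
    hypothetical. The chance of a ray visiting a node is approximated
    by its surface area divided by the root's surface area.
    """
    lo, hi, children, num_prims = node
    a = surface_area(lo, hi)
    if root_area is None:
        root_area = a
    p = a / root_area
    if not children:
        return p * c_isect * num_prims  # leaf: pay for primitive tests
    # Inner node: pay one traversal step, plus the cost of the subtrees.
    return p * c_trav + sum(sah_cost(c, root_area, c_trav, c_isect)
                            for c in children)


# Two leaves (1 and 2 primitives) under one root.
left = ((0, 0, 0), (1, 1, 1), [], 1)
right = ((1, 0, 0), (2, 1, 1), [], 2)
tree = ((0, 0, 0), (2, 1, 1), [left, right], 0)
cost = sah_cost(tree)
```

Under this definition every extra inner node adds a traversal term, so splitting a BVH4 node into several BVH2 nodes tends to raise the cost, which is consistent with the BVH2 cost being higher than the BVH4 cost in the numbers above.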

Plan for next weeks :

  • Improve the code to reduce duplication and plan future support for other primitive types (like curves)
  • Improve the conversion from Embree to BVH4
  • Improve the conversion from BVH4 to BVH2

Week 6 - 01/07

  • Performance improvement
    • Internal builder: 2min08 (used a BVH2 with a SAH cost of 39.3588)
    • Embree: 2min10 (returned a BVH4 with a SAH cost of 44.2138, which is then reduced to a BVH2 with a cost of 48.1298)
  • Support for motion blur / instanced meshes (it should support everything except curves).
  • Preparation for STBVH
    • Time limits are gathered and added to the tree, but are not yet copied to the device (GPU).

Plan for next week :

  • Choose where to store those time limits on the GPU (inside the nodes themselves or in a side array)
  • Update traversal code to follow these limits
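As a rough sketch of the "side array" option, the time limits can live in an array parallel to the node array, and the traversal skips any node whose time range does not contain the ray time. The array names and layout here are hypothetical, and the usual ray/box test is left out to keep the example short; this is not the traversal code from the branch.

```python
def traverse(children, time_range, ray_time, root=0):
    """Return the leaf indices reachable at ray_time.

    children[i]   -- (left, right) node indices, or None for a leaf
    time_range[i] -- (t_min, t_max) for node i, stored in a side array
    Both layouts are assumptions made for this example.
    """
    reached = []
    stack = [root]
    while stack:
        i = stack.pop()
        t_min, t_max = time_range[i]
        if not (t_min <= ray_time <= t_max):
            continue  # node is not valid at this time: skip the whole subtree
        if children[i] is None:
            reached.append(i)
        else:
            stack.extend(children[i])
    return sorted(reached)


# Root covers the whole shutter interval; each leaf covers one half of it.
children = [(1, 2), None, None]
time_range = [(0.0, 1.0), (0.0, 0.5), (0.5, 1.0)]
```

With this data, `traverse(children, time_range, 0.25)` only reaches leaf 1 and `traverse(children, time_range, 0.75)` only reaches leaf 2. Storing the limits inside the nodes instead would save one memory fetch per node but makes every node larger, including in scenes without motion blur.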

Week 7 - 08/07

Traversal now supports STBVH, but performance is still a bit below the original implementation:

  • Internal builder, with old traversal code: 15.56 sec
  • Internal builder, with new traversal code: 16.54 sec
  • Embree converter builder, with new traversal code: 17.28 sec
  • Embree converter builder, without STBVH : 18.20 sec

Plan for next week :

  • Improve the BVH tree by building the packed structure directly (currently the code copies Embree’s tree to Blender’s tree and then fills the structure; the idea is to fill the structure directly, which allows some optimizations like primitive re-ordering)

Yes, it’s going to increase performance; you can check the Intel website.


What does your test scene look like? For transformation motion blur, I would not expect too much from going to Embree, but improvements should be significant for deformation blur. Did you test your scene with my 100% Embree implementation, how does that perform?

Week 8 - 15/07

The packed structure is now built directly while traversing Embree’s tree, allowing us to:

  • Do some primitive reordering
  • Build a tree that is closer to Embree’s structure

Performance points:

  • Up to 10% faster on scenes with motion blur on meshes with deformation
    • Internal builder: 2min 50s
    • Embree on GPU: 2min 30s
  • On scenes with only motion blur, performance is mostly the same as the internal builder with spatial splits enabled
  • On other scenes, performance is slightly worse (about 5% slower)

Plan for next week :

  • New primitive support (curve)
  • Finalise the API and plan integration into Embree’s own source

@StefanW I tried to test with your implementation, but as it can only run on the CPU, render times are very different, and running on a CPU with a BVH2 tree is really slow…


Did you extend Cycles’ BVH to interpolate bounding boxes over time? The major speed benefit from Embree for motion blur comes from the fact that it stores one bounding box for the beginning and one for the end of a time step, using linear interpolation for the time in-between. Cycles on the other hand uses one bounding box for the entire time step, which isn’t nearly as efficient.

No, not for the moment; it’s a point that I neglected. I will focus on this next week (instead of support for curves).

It should make a big difference, I think. Especially for fast moving geometry or diagonal movements it will result in much tighter bounds.

Week 9 - 22/07

  • Implemented linear interpolation for bounding boxes
  • Fixed a bug introduced in the previous version
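To make the bounding box interpolation discussed above concrete, here is a minimal sketch comparing one box per time step with two boxes that are linearly interpolated at the ray time. The function names and the `(lo, hi)` tuple layout are assumptions for the example and do not mirror the actual Cycles or Embree data structures.

```python
def lerp_bounds(b0, b1, t):
    """Interpolate between the boxes at the start (t=0) and end (t=1) of a step."""
    (lo0, hi0), (lo1, hi1) = b0, b1
    lo = tuple(a + (b - a) * t for a, b in zip(lo0, lo1))
    hi = tuple(a + (b - a) * t for a, b in zip(hi0, hi1))
    return lo, hi


def union_bounds(b0, b1):
    """A single box covering the whole time step."""
    (lo0, hi0), (lo1, hi1) = b0, b1
    lo = tuple(min(a, b) for a, b in zip(lo0, lo1))
    hi = tuple(max(a, b) for a, b in zip(hi0, hi1))
    return lo, hi


def volume(bounds):
    lo, hi = bounds
    v = 1.0
    for a, b in zip(lo, hi):
        v *= b - a
    return v


# A unit cube moving diagonally during one time step.
start = ((0, 0, 0), (1, 1, 1))
end = ((4, 4, 4), (5, 5, 5))

whole_step = union_bounds(start, end)       # one box for the entire step
at_half = lerp_bounds(start, end, 0.5)      # box at the ray's time
```

For this diagonal movement the single box has a volume of 125, while the interpolated box at any given time keeps the volume of the unit cube, so far fewer rays hit it by accident.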

Performance points:

  • Up to 20% faster on scenes with motion blur on meshes with deformation
    • Internal builder: 2min 50s
    • Embree on GPU: 2min 20s

Plan for next week :

  • New primitive support (curve)
  • Finalise the API and plan integration into Embree’s own source

Week 10 - 29/07

Will not compile unless you use the modified version of Embree

  • Initial implementation of export from inside Embree, code is at tinou98/embree

Plan for next weeks :

  • New primitive support (curve)
  • Support for oriented bounding boxes
  • Improve the API to export curves and oriented bounding boxes from Embree

Week 11 - 05/08

Plan for next weeks :

  • Fix some alignment issues
  • Support for oriented bounding boxes

Nice work! :+1:

Will the improvements only be fully migrated to Blender master once the Embree for GPU branch is entirely finished, or are there already parts of the branch being migrated to the 2.81 master?