A few weeks ago I decided to undertake a fun hobby project just for myself: to see what benefit, if any, using SSE in some of Blender's math-centric code paths would yield. In short, yes, it's clear that Blender is leaving some performance on the table.
However, while the gains are great at the microbenchmark level, they tend not to yield any noticeable benefit in the full scenarios I profiled. This is both surprising and unsurprising. Unsurprising because Blender is "big software": it obviously does much, much more than spin in math-heavy code paths, so micro-level wins get diluted below what a human would notice. Still, it's surprising how much non-math work a 3D DCC is doing in the profiled scenarios.
Additionally, without more invasive Blender work, performance would still remain lower than optimal due to the storage format used in the math-centric code paths (`float[3]` rather than an SSE-friendly, 16-byte-aligned `float[4]`).
So I want to put this here to see what the Blender developers think of this effort and to help answer the following questions:
- Is converting the math routines that I outline in my benchmark to SSE worth doing, given the results I present?
- Generally, what changes are allowed in DNA?
- Would modifying some `float[3]` fields to be `float[4]` be allowed (disregard whether we want to do this; I just want to know if it's possible in a backward-compatible way)?
- Would adding a bmesh layer like `CD_NORMAL_SSE` as an experiment be less invasive than changing `CD_NORMAL` itself? Or would it be a similar amount of work? Would back-compat be better or worse here?
Full benchmark code (GPL 3), the motivating scenarios that were profiled, and the results are here: https://github.com/jessey-git/blender_bench