A rigger working in Blender recently filed this issue about erroneous rotational shifts when giving an object a parent:
Investigation revealed that the root issue is floating point rounding error in the matrix math routines.
There was a previous discussion topic (now archived) about floating point precision in a broader context than just transforms. But one of the last things written was from @brecht:
What I believe Unreal is doing is having all object matrices and associated data in double precision, and vertex coordinates in single precision. And then for OpenGL for example you end up rendering in camera space in single precision, or collision between two objects may be in single precision in the local space of one of the objects or somewhere near it, etc.
That kind of approach seems doable without too much of an impact on memory and performance. Also would not have to template so much, though still requires a ton of work, including carefully choosing the spaces to do operations in.
I want to re-raise this as something we might want to do. As noted by Brecht, it would be a fair bit of work and would require answering some questions. And of course there would still be floating point rounding error (just far smaller), so it wouldn’t solve the issue in the sense of being infinite precision. But practically it would effectively solve a lot of these issues, and make things behave more intuitively for users.
For certain operations we could also consider using algorithms designed to avoid catastrophic cancellation. Cross products, dot products, and matrix multiplication are all candidates for that.
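To make that concrete, here's a minimal sketch (illustrative C++, not Blender code) of a compensated dot product using the Neumaier variant of Kahan summation; the running compensation absorbs most of the cancellation that a naive float accumulation suffers:

```cpp
#include <cmath>
#include <cstddef>

/* Illustrative only (not Blender code): a float dot product that compensates
 * rounding error during accumulation (Neumaier variant of Kahan summation).
 * The individual products are still rounded, but the running sum no longer
 * drops low-order bits when terms nearly cancel. */
float dot_compensated(const float *a, const float *b, size_t n)
{
  float sum = 0.0f;
  float comp = 0.0f; /* accumulated low-order bits lost by `sum` */
  for (size_t i = 0; i < n; i++) {
    const float prod = a[i] * b[i];
    const float t = sum + prod;
    if (std::fabs(sum) >= std::fabs(prod)) {
      comp += (sum - t) + prod; /* low part of `prod` was dropped */
    }
    else {
      comp += (prod - t) + sum; /* low part of `sum` was dropped */
    }
    sum = t;
  }
  return sum + comp;
}
```

That said, this only compensates the summation; the per-term products are still rounded in float, so for most of these routines simply doing the intermediate math in double would probably be both simpler and sufficient.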
Although the issue in that post is indeed also related to floating point rounding error and transforms, it’s not actually relevant to the issue in Blender. The issue in that post is due to actually storing the source-of-truth transforms in a matrix, and the resulting accumulated error in that matrix over time. One of the proposed solutions in the post is to instead store your source-of-truth transforms as separate translation/rotation/scale components. But Blender already does that, and thus isn’t susceptible to that particular issue.
Well, as mentioned in the issue, using double for internal temporary values inside the math functions could improve things significantly with minimal changes to the bulk of Blender and its data structures.
I already implemented a cross product function that internally uses double (while input and output values are in float) to fix issues near the singularity in Damped Track quite a while ago.
A big source of major precision errors is situations where you compute a sum of products (as in a cross product or matrix multiplication), and doing that computation in double mitigates it a lot.
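For reference, the shape of that change is roughly this (a sketch with a plain struct, not the actual code in Blender):

```cpp
struct float3 {
  float x, y, z;
};

/* Sketch (not the actual Blender function): float in, float out, but the
 * differences of products are computed in double, so nearly-parallel inputs
 * no longer cancel catastrophically. */
float3 cross_high_precision(const float3 &a, const float3 &b)
{
  const double ax = a.x, ay = a.y, az = a.z;
  const double bx = b.x, by = b.y, bz = b.z;
  return {
      float(ay * bz - az * by),
      float(az * bx - ax * bz),
      float(ax * by - ay * bx),
  };
}
```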
Since this is mostly a trade-off between accuracy and performance/memory, I think we’d need some more stats for (a) how often do people actually run into these problems and (b) what performance impact using double has.
On the performance front, I’m mostly concerned about the case when dealing with lots of instances. The per-instance transform matrix takes up the most space. Doubling the matrix size also almost doubles the memory needed for instances. Vectorized matrix code also probably becomes quite a bit slower when switching to double. Converting these matrices between float and double on the fly probably also has significant additional performance implications. It might be feasible to let the user decide between single and double precision, but it’s unclear whether that’s worth the effort.
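To put rough numbers on the memory side: a 4×4 float matrix is 64 bytes and a 4×4 double matrix is 128 bytes, so for, say, 10 million instances the per-instance transforms alone go from roughly 640 MB to roughly 1.28 GB, before counting any temporary buffers if we convert between the two on the fly.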
It’s a very common problem when building bigger scenes: users need to manually keep things near the origin to avoid artifacts. But this isn’t always possible, and ideally it’s not something users should have to worry about much in the first place.
There is indeed a performance and memory impact, but I think the trade-off is worth it.
Just recently I reconfigured a couple of background scenes, offsetting the main decor so that the characters would be animated near the world origin, because I had initially placed them ~2 km from the origin and posing was impossible: the jitters were about a centimeter in size. I was surprised at how quickly precision was lost. So there’s that, and I’m sure others have seen and worked around this before, because I don’t build the most complex scenes. I assume that, like vertex positions, keeping bones in single precision in their object space should be plenty.
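For a rough sense of scale (back-of-the-envelope, not measured): a 32-bit float has a 24-bit significand, so at ~2 km from the origin the spacing between representable coordinate values is already around 0.1 mm, and once a chain of transform multiplications and subtractions stacks on top of that, millimetre-to-centimetre jitter on posed bones isn’t surprising.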
It’s really hard for me to imagine a situation where someone deliberately places their scene far enough from the origin to cause accuracy problems. Most often this comes down to very specific situations …
It’s not that specific, just regular production stuff when you model a city, forest or other big environment that is common in movies. Issues happen much closer than 2km too.
Would it be possible to set the precision either to float or double via a scene setting?
I’m just a regular user, but in my experience these kinds of issues don’t usually just happen. In a production it is easy to guess early on whether a set is going to be troublesome accuracy-wise due to its size, even before actually modelling anything. In our studio it is common to have discussions from the concept/design phase about how to tackle a set, and the fact that it is going to be too big to handle properly is well known (and we end up doing things like the one @Hadriscus mentions, offsetting parts of the set to the origin and such).
This is to say: would it be possible to set the precision as a scene property of sorts, so it’s only enabled when needed and there’s no performance penalty by default?
Is it just me, or are you trying to bring up the problem of depth buffer accuracy?
That would be a little off topic… hopefully. Because as far as I know (and I don’t know a lot), the depth buffer has a fixed precision, determined by the hardware.
I assumed the precision being discussed here was directly related, but if not, please do disregard… I’m definitely not trying to take this off topic; my apologies if I did.
This is to say: would it be possible to set the precision as a scene property of sorts, so it’s only enabled when needed and there’s no performance penalty by default?
My best guess is that the performance penalty is unlikely to be large. The math operations we’re talking about are per object, not e.g. per vertex. And the same thing applies to memory overhead: these transforms are per-object, and would be a tiny percentage of the total memory for even a low-res model.
Having said that, I’m making assumptions here, which could turn out to be wrong. And I certainly don’t think we should jump into this blind. If we decide to move forward with this, the first step should be to prototype it out and see what the actual impact is on a variety of production scenes and using a variety of Blender features. And then we can make an informed decision based on that actual measured impact.
On the performance front, I’m mostly concerned about the case when dealing with lots of instances. The per-instance transform matrix takes up the most space. Doubling the matrix size also almost doubles the memory needed for instances.
Yeah, that’s a really good point. Instancing is certainly one of the use cases we’ll need to test for impact.
One concern brought up was the impact on GPU interactions if we moved everything over… but since the actual issue that came up seems to be in the math that is applied once to figure out the final location of something, can we track down what part of the code is actually causing the problem for the parent-without-inverse and apply-parent-inverse operations?
I’m not opposed to moving to double for matrices. I’m not entirely sure yet if that would essentially apply to all matrices in Blender or not. I don’t fully understand the impact of moving matrices but not moving vertex coordinates yet. Are most related problems solved even if vertex coordinates are converted back to float?
Is it possible to dynamically detect whether a specific double matrix can be used as a float matrix without losing too much precision?
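Something along these lines might work as a heuristic (purely a sketch; the name and signature are made up): convert each element to float, convert back, and check that the absolute error stays under whatever tolerance the caller needs:

```cpp
#include <cmath>

/* Hypothetical helper, purely illustrative. Note that the *relative* error of
 * a double->float conversion is always tiny (~6e-8), so the meaningful check
 * is on the *absolute* error, which grows with magnitude, e.g. for
 * translations far from the origin. */
static bool matrix_fits_in_float(const double m[4][4], const double max_abs_error)
{
  for (int i = 0; i < 4; i++) {
    for (int j = 0; j < 4; j++) {
      const double roundtrip = double(float(m[i][j]));
      if (std::fabs(roundtrip - m[i][j]) > max_abs_error) {
        return false;
      }
    }
  }
  return true;
}
```

The awkward part is that the caller has to know what absolute tolerance the downstream operation actually needs.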
I think vertex coordinates and the matrices used to transform them would stay floats. The trick is to never transform vertex coordinates to world space, but to always keep them in a local space.
For example if you do a boolean operation between two objects A and B, do not transform vertex coordinates from both to world space. Instead transform the vertex coordinates from A into the local space of B.
Computing the A * inverse(B) matrix would be done in double precision, and then converted to a float precision matrix. Assuming objects A and B are near each other in world space, the translation component should mostly cancel out and be accurate enough as float.
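A rough sketch of what that could look like (illustrative C++ with made-up types and assumed helpers, not an existing API; I’m assuming the column-vector convention where points transform as M * v, so the combined transform is inverse(B_world) * A_world):

```cpp
/* Illustrative sketch only, not an existing API. Assumes the column-vector
 * convention (p_world = M_world * p_local), so the "A local -> B local"
 * transform is inverse(B_world) * A_world. */
struct double4x4 {
  double m[4][4];
};
struct float4x4 {
  float m[4][4];
};

/* Assumed helper (not shown): a standard double-precision 4x4 inverse. */
double4x4 invert(const double4x4 &m);

static double4x4 multiply(const double4x4 &a, const double4x4 &b)
{
  double4x4 r{};
  for (int i = 0; i < 4; i++) {
    for (int j = 0; j < 4; j++) {
      double sum = 0.0;
      for (int k = 0; k < 4; k++) {
        sum += a.m[i][k] * b.m[k][j];
      }
      r.m[i][j] = sum;
    }
  }
  return r;
}

static float4x4 a_local_to_b_local(const double4x4 &a_world, const double4x4 &b_world)
{
  /* Compose in double: the large, nearly equal translations of A and B cancel
   * here, before anything is rounded to float. */
  const double4x4 a_to_b = multiply(invert(b_world), a_world);

  /* Only the final, small-magnitude matrix is truncated to float and handed
   * to the mesh-level code. */
  float4x4 result;
  for (int i = 0; i < 4; i++) {
    for (int j = 0; j < 4; j++) {
      result.m[i][j] = float(a_to_b.m[i][j]);
    }
  }
  return result;
}
```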