Code Style: Integer Types

jacqueslucke · July 14, 2020, 8:22am

The goal of this thread is to come up with good rules regarding the usage of integer types in Blender. Those will then be added to our code style guide.

Please read the “Arithmetic” section of the C++ Core Guidelines and the “Integer Types” section of the Google C++ Style Guide.

Below I formulate work-in-progress rules. I’ll update them according to the feedback I get.

Integer Types

Only use int and char of the builtin integer types. So, do not use short, long or long long. Prefer fixed size integer types like int16_t instead. You can assume that int has at least 32 bits.

Use int64_t for integers that we know can be “big”.

Use bool with true and false to represent truth values (instead of int with 0 and 1).

If your code is a container with a size, be sure its size-type is large enough for any possible usage. When in doubt, use a larger type like int64_t.

Use unsigned integers in bit manipulations and modular arithmetic. When using modular arithmetic, mention that in a comment.

When using unsigned integers, always use uint8_t, uint16_t, uint32_t or uint64_t.

Don’t use unsigned integers to indicate that a value is non-negative, use assertions instead.

Since bit operations are used on flags, those should be unsigned integers with a fixed size.

If your code is using uint already, try to avoid doing any arithmetic on values of that type. Additions of small positive constants are likely OK but avoid subtraction or arithmetic with any values that might be negative.

As with most new rules, our current code base does not make a good job of following them. Nevertheless, I think it is worth adding these rules to our style guide. This way, new code can follow better practices and we have justification for cleanup commits in existing code.

CC: @brecht @ideasman42 @mont29 @LazyDodo @julianeisel @Howard_Trickey @sergey @JeroenBakker @sybren

sergey · July 14, 2020, 9:17am

Oh yes, please! especially the unsigned integer usage!

However, not sure about the container’s size type paragraph. At least, not in this exact wording. Size type is also used to address elements by index. I would phrase it in a way that size-type in container used for sizes and indexes is to be large enough.

Another typical use of unsigned integers in Blender I’ve noticed is to try to make it possible to represent 2x of elements compared to signed integer type. While this could work in a localized area of code, usually this doesn’t work when considering the API as a whole. I would make it more explicit discourage in the guidelines to use unsigned for such purposes.

jacqueslucke · July 14, 2020, 9:23am

Can you suggest a new wording?

I don’t understand the use case you describe tbh. Can you show an example?

JeroenBakker · July 14, 2020, 9:36am

I think it is good to be more specific like you propose and don’t use value types that are known to be problemsome.
Perhaps also add bool to the topic as legacy code still uses int when they are clearly meant to be bool. We should not do that.

Making new rules is easy. Implementing them is time consuming.

jacqueslucke · July 14, 2020, 9:41am

Right, I added a sentence for that.

Yes, but we have to start somewhere. Better we start with the rule than the implementation of a non-existent rule.

Howard_Trickey · July 14, 2020, 10:19am

I am strongly in favor of the rules here. I’ve seen too many bugs over my years of programming due to doing arithmetic on unsigned types (especially subtraction). Maybe something should be added to say that if you are forced to used unsigned types, or are using them in a legacy section of code, avoid arithmetic on those values.

What is your impression of what to do about Array, Vector, VectorSet, Span, Set, etc.? Which all use uint in lots of places and violate the rules you wrote. I wrote a lot of code using indexes into those containers for various purposes. At first I wrote it using int for the type in my code, following the guidelines you cited. The code was littered with static_cast<int>(something.size()), and couldn’t so easily use the index_range() functions unless I used static_cast<int> on those variables too (sometimes in the loop one needs the index, in code that is using those indices in various ways). I since changed the code to use uint everywhere, and I don’t like it as much, though it does avoid the need for almost all of those static_casts.

The big question is: what is “large enough for any possible usage” of these containers? For me, in particular, this comes down to: what maximum size of vertices, faces, and edges do we want to be writing Blender code for? As a practical matter, the algorithms I am writing now will become unreasonably slow if there are more than 2^31 of any of these, so I am comfortable with assuming the number of each will fit in an int [see Google guidelines that one should assume ints hold at least 32 bits in modern times on modern cpus, which I agree with, and maybe should be added to your rules]. Is 2^31 limit on number of mesh elements sufficiently future-proof?

jacqueslucke · July 14, 2020, 10:32am

If these rules above are accepted, I will update all of them to use int64_t instead of uint. And I hope that you are fine with changing back to int.

Makes sense, can you propose a wording for that rule?

I don’t see Blender ever supporting meshes with more than 2^31 elements. We probably could, if we really wanted to, but it requires lots of changes that are probably not worth it… Not sure what others think.

The containers should probably be able to hold more than 2^31 values, even if we don’t support that many elements in a mesh.

I added that already.

Howard_Trickey · July 14, 2020, 10:45am

Yes, probably best to use int64_t for container sizes and indices. In spite of, as you say, mesh elements unlikely to exceed 2^31 in the foreseeable future (and if it does, all the code that uses element indices would need to change).

For the wording of the “be careful with unsigned arithmetic” rule, how about something like:

If your code is using uint already, try to avoid doing any arithmetic on values of that type. Additions of small positive constants are likely OK but avoid subtraction or arithmetic with any values that might be negative.

Sorry I missed that you had already specified the assumed minimum size of ints in your rules.

jacqueslucke · July 14, 2020, 10:48am

Thanks. I added the “rule”.

LazyDodo · July 14, 2020, 3:31pm

Pro uint :

it’s only 4 bytes?

Con: uint

It’s only 4 bytes, we have textures larger than this, it’s just not right to use this for a size field in 2020
unsigned, which leads to all kinds of icky math bugs.

Pro size_t

You know the thing you are holding is a size, it adds context, tells you what this piece of information is, rather than just going, it’s a number
All the stl containers use it for the size() and subscripts.

Con size_t:

size(size_t) is not fixed
unsigned, which leads to all kinds of icky math bugs.

Pro int64_t:

Signed, no icky math bugs
Will be the right type until we have stuff larger than 9223372 TIB
Can return -1 for error?

Con int64_t:

you lose a bit of context about the data you are holding.
not what most devs will expect in an stl like container
any negative number does not make a whole lot of sense for size()

So i’d lean to int64_t for things like size fields.

ideally (but i’m pushing it here)

Pro typedef int64_t bsize_t; (or blender::size_t)

All the benefits of int64_t
Additional context of the data you are holding

Con typedef int64_t bsize_t;

looks like size_t , but is not size_t which is not intuitive

jacqueslucke · July 14, 2020, 3:44pm

Question: If we use bsize_t (and I’m not sure we should), should this type also be used in subscript operators, or should those use int64_t in any case? An index is not really a size, is it?

Personally, I’m fine with a size being “just a number”. Not sure if the additional context bsize_t provides is useful to me. Maybe I’m just missing something, because I’m not used to it.

LazyDodo · July 14, 2020, 4:29pm

In an ideal world the type system would be very explicit, so the compiler would warn/prevent you from doing

auto some_var = number_of_cats_in_my_house + distance_to_the_moon;

unless there is a logical way defined for those types to interoperate, however i do realize that is not the type system we have, and it’ll require quite a bit more than just a typedef to get there, hence i did mark it as “pushing it”

int64_t would be infinitely better than uint already so if that’s were we end up i’d count it as an improvement.

ideasman42 · July 15, 2020, 2:40am

While the proposal seems fine in most situations - especially DNA types and public API’s, I’d prefer to continue using unsigned in some isolated cases.

Where interfacing external API’s would cause extra/unnecessary conversion between types, especially in cases where the API is glue - forwarding arguments between API’s.
There will probably be some cases where the 2x range useful, where bumping to 64bit int to fit a rules doesn’t make much sense - especially in cases where the memory usage could be increased noticeably.
In a handful of situations where unsigned types are isolated to a single file/area and arithmetic is limited. BLI_array_store is an example where I’d rather keep using size_t as it’s dealing with memory allocation, adding asserts all over hurts readability. Further, any negative values are going to hide negative array access, which IMHO is worse than wrapping to a large number and crashing - since it’s more likely to slip by unnoticed and corrupt data - on release builds. For debug builds ASAN can be useful to point out the unsigned wrapping when it happens - which needs to be done manually with asserts otherwise and is easy to accidentally miss if it’s needed after every operation.

StefanW · July 15, 2020, 7:48am

It’s de facto 4 bytes, but not by definition. It may be something else in past or future architectures or compilers, if I read the standard correctly, the only guarantee we have about int is that it can hold at least the [−32,767, +32,767] range. IMHO, whenever we need something that is 4 bytes and only 4 bytes, (u)int32_t it is.

Correct, and that has bitten many people in the 2000s when they had to port software to x86_64 or ppc64. It compiled alright, but still crashed when you threw large images at it.

That’s the whole point of size_t. In case we some day use 128bit computers, any code that uses size_t for counting things will be ready for when we want to simulate all atoms in multiple universes…

We may also have people who still want to run Blender on 32bit architectures, using size_t would make sure they don’t get unnecessarily bloated or slow builds.

I’m almost on board with that idea. Personally, whenever interfacing with external APIs, I prefer using their types. So bsize_t for Blender containers, size_t for STL containers. That reduces unecessary casts and mixed sign comparisons, therefore reducing compiler warnings and potential bugs.

Regarding mesh elements greater than 32^1 - never say never. Machines today certainly have enough memory for that, and with LIDAR scanners showing up in consumer devices, such meshes could become more common. That doesn’t mean we have to inflate all of our data structures to int64_t right away, but a typedef for those indices could help us keeping things compact for now while making it easier to upgrade in the future.

For what it’s worth, I’m speaking from the experience of having ported Poser from 32bit i386/ppc (with 68k legacy) to 64bit x86_64/ppc64: Never assume a fixed size of bool, char, int or long, they can and will change between platforms. When you need a certain number of bytes, always spell it out via (u)int*_t. When you need to count things, use a typedef xxx mysize_t.

Pointer manipulations are bad style to begin with, but if you can’t avoid id, ptrdiff_t and (u)intptr_t are IMHO the only correct type to use.

Also note that neither float and double have a defined byte size in C99.

jacqueslucke · July 15, 2020, 8:47am

I’m fine with introducing bsize_t in Blender. Obviously, this needs feedback from the core devs. As I understand this, it would work as follows. Please correct me when I’m wrong.

In BLI_sys_types.h we add typedef int64_t bsize_t;. The type should be available in C and C++, so bsize_t is better than blender::size_t imo. It should be a signed type, so that we don’t do the same mistake the standard library did again.

Then, containers use bsize_t as their size type. Their ->size() function returns bsize_t as well as any methods that return an index into the container. The subscript operator also takes a parameter of type bsize_t.

When iterating over indices, we should start using bsize_t for the index variable. So for example:

for (bsize_t i = 0; i < container.size(); i++) { ... }

It should be noted that the suffix _t is reserved by Posix. Searching for "bsize_t" suggests that this name exists or did exist on some platform. Not sure if this is enough reason to discard this specific name.

StefanW · July 15, 2020, 8:59am

If we typedef our index type to int64_t, then I think it should be called bssize_t or similar. I think the type name should indicated that it is a signed type - naming it after the unsigned size_t could lead to confusion.

For what it’s worth, a quick google search hints that HP-UX is using the type bsize_t.

LazyDodo · July 15, 2020, 12:36pm

i’m divided on this, however we could move the b to the suffix size_bt , i’m not loving it, but not hating it as much as i thought i would either.

ideasman42 · July 15, 2020, 11:54pm

Do we really need to typedef this? We’re going to need to assume the underlying type in enough places anyway (string formatting, interacting with external libraries, int conversion warnings & when to cast).

While I see the point of having an identifier that expresses size, using our own signed size type everywhere feels like a heavy solution to a problem that doesn’t seem to be causing us issues very often (from what I can tell).

We could use standard types like this:

Keep int for container types (heap, hash size).
Use int64_t for offset/size in bytes, file reading, etc.

If there is some concern that int is too small, longer term - we could still do the above steps and move from int to int64_t for container types. This just makes the change less global, so it can be done more gradually.

jacqueslucke · July 17, 2020, 8:54am

I suggest we proceed as follows:

We accept the rules as they are in the initial post.
We don’t introduce a bsize_t (or similar) type for now. If people think bsize_t is important enough, it can be discussed further in a separate thread.
I’ll update the C++ containers to use int64_t, but that does not mean that all existing C containers will have to use that as well. They will only need to be updated when a use case arises.
The usage of unsigned integers, short, long and long long in existing code can be updated as part of general cleanup and/or code quality day.

Are these decisions OK for everyone? If yes, please say so in a comment or like this post.
I’ll update the style guide, if at least four of the following devs accept the new rules, and there are no opposing votes: @brecht, @ideasman42, @mont29, @sergey, @JeroenBakker, @sybren.

sybren · July 17, 2020, 9:16am

Although I like the idea of a size_t type that can later be redefined in a hypothetical future 128-bit situation, I’m afraid that that still will only be suitable for a subset of the uses of size_t. People will always make assumptions about the size of size_t, so I’m not so sure whether we should introduce yet another type that has a basically unknown size. After all, if you start to know the size of size_t, it’ll loose its ability to be scaled up on future hardware. It seems out of place in this proposal, given that the rest is basically going from unknown/flexible types like int and long to rigid types like int64_t.

Personally I’m a fan of int64_t, uint32_t, etc. as it takes away any doubt about the size of variables.

Als +1 on gradually moving from short/int to bool when it’s about boolean values.

Download

What's New

Blender Studio

Manual

Developers Blog

Documentation

Benchmark

Blender Conference

Development Fund

One-time Donations

Code Style: Integer Types

Integer Types

ideally (but i’m pushing it here)