Code Style: Integer Types

That’s fine with me.

On integer arithmetic and unsigned types, I think the important thing here is the validation of inputs or at least assertions. If you’re subtracting one number from another and expect a result >= 0, then you should check that.

If you simply store the result in a signed integer without checking then your passing the problem down the line.

As such, I think some things like array indexes should be unsigned.

Seems OK.

Although it would be good to settle on how we go about applying it. Also, where we accept some exceptions to these rules.

For example, code that deals entirely with allocations could continue to use size_t (as noted previously).
For our own versions of libc functions (wrappers in some cases) do we use our own types or match the types libc uses

Suggest to use this convention for new code and existing code, module owners/members can apply these rules.
I’d rather avoid global replacements unless there is some agreement among maintainers beforehand.

@MarkStead
I do understand your thought, because I was thinking the same. Since then I learned, that unsigned integers should not be used this way. That is why there is the Don't use unsigned integers to indicate that a value is non-negative rule. You don’t have to take my word for it. Have a look at this video from min 42:40 to 45:30.

@ideasman42
What you mention seem to be very general rules to me. They are not specific to the usage of integer types. We can add this separately to the guidelines, but I feel like this is covered by the first “two important rules” in the style guide.

Also, global replacements should always be approved among maintainers, independent of what is being replaced.

Not sure which functions you have in mind specifically. I don’t mind either way. However, if we have a wrapper and we can improve the api by using signed integers instead of unsigned ones, then I think we should use signed integers instead of copying “mistakes” from the past.

Not sure which statement you’re referring to here.


Of course we’ll have to take this on a case-by-case basis, some that came to mind are guarded-allocation, BLI_qsort_t wrapper, wrappers for file system access like BLI_file_size.

There may be cases where the size_t needs to be able to fit into a pointer too, so swapping out for int64 then means breaking 32bit builds, which maybe OK but is outside the scope of this proposal.

Maybe I interpret them in a too general way, but I referred to those:

  • When making changes, conform to the style and conventions of the surrounding code.
  • Strive for clarity, even if that means occasionally breaking the guidelines. Use your head and ask for advice if your common sense seems to disagree with the conventions.

I was just looking into BLI_file_size. I’m not familiar with the API, but it seems like stats.st_size is actually a signed integer and we are casting it to an unsigned integer. In the case of BLI_qsort_r it seems to be better to not touch the signature. But yeah, this needs to be checked on a case by case basis.

Well, we should not be using size_t to hold a pointer in the first place. But then again, we just must not blindly replace one integer type with another. Of course errors can happen when this is done without care.

I added rules to our style guide: https://wiki.blender.org/wiki/Style_Guide/C_Cpp#Integer_Types

I also added a note that the top, that hopefully covers the concerns of @ideasman42.

Thanks for the feedback everyone. I think this is an important step in the right direction.

If storing a pointer inside an int can’t be avoided, intptr_t is the type that’s intended to be used. Use cases are for example when attempting to sort items by memory address, since comparison operators on pointers are (with the exception of equality) largely undefined.

Right, I wanted to add that to the rules in the first post, but forgot about it. I added it to the style guide as well, assuming that no one has another opinion.

Only use int and char of the builtin integer types.

The doesn’t mention uchar, this could be read as if uchar should be avoided.

Since Blender compiles with all chars as unsigned this isn’t a meaningful difference, just raising it since it’s not clear what the guideline is suggesting.


Use unsigned integers in bit manipulations and modular arithmetic. When using modular arithmetic, mention that in a comment.

When using unsigned integers, always use uint8_t, uint16_t, uint32_t or uint64_t.

Do we want to apply to bit-fields? Reading this literally it means we should change:
uint some_mode : 2; to
uint32_t some_mode : 2; … while it could be argued this is better just because of consistency - it’s also not solving any problems AFAICS.
We could use uchar in these cases too if we just want to avoid uint.

I thought a bit more about this idea. It did sound like a good idea, until I noticed that this won’t work when we use 64 bit integers on a 64 bit machine. The problem is that array[-1] is the same as array[(uint64_t)-1] due to wrap around. This trick does work when 32 bit integers are used.

I don’t have a strong opinion about uchar. My only argument against using it is that uint8_t is more explicit, but since we assume uchar is always 8 bit, this does not solve a real problem. Should I just add uchar to the list of integer types that can be used?

I don’t have a strong opinion on that either. In both cases the bit width of some_mode is explicit, which is the most important thing.
However, I’d assume that when you use bit fields, you want to make your struct as compact as possible. Therefore, it makes sense to use the smallest integer type that can hold enough bits. So you probably want to be explicit about the size of the integer type, which means you should use uint8_t or similar types.

Personally I’m a strong proponent of C99 (ie [u]int[n]_t) types all across the board, they signal intent and guarantee size, what’s not to like. However given the blender has been historically been using the non standardized glibc types uint,ushort, I didn’t think we’d even consider C99, however if you’re going for it, I’m with you!

I’ll happily toss my +1 nah + :100: in

This would happen for C++ wrapping array access, if the value is cast to an unsigned value, however it’s not true for pointer indexing. Negative indexing is used in the current code-base.

Also, there are enough cases we pass values as offsets. Making these signed will hide negative access too.

This can be especially annoying when the value is small since our own malloc wrapper has a header, causing memory checking tools not to report it as an error.

Agree, the only down side AFAICS is changing this will touch many lines of code, causing merge conflicts etc (and a bit more typing).

We could limit the use of char to text. All numeric use could be uint8_t or int8_t.

My point is, this works with 32 bit integers on 64 bit computers, but not with 64 bit integers. This is just an artifact of how arithmetic modulo 2^64 works.

Also have a look at the generated assembly for different cases: https://godbolt.org/z/vPPbro
When using 64 bit indices, the assembly is the same when you use signed or unsigned indices.
When using 32 bit indices, they are not the same, because signed indices are sign-extended.
When you add -m32 to the compiler flags to compile for 32 bit systems, you can see that now there is no difference between using signed/unsigned 32 bit indices.

I haven’t tried it, but I think we could use array[index & ~(1ll << X)] in the subscript operators. This way it is much more likely that negative indices result in a crash, even for 64 bit indices. X would have to be some number smaller than 64. But it also must not be too large, to avoid compiler optimizations. Might be interesting to try, but I don’t really want to add this overhead to every memory access.

Speaking of text, i really wish we had different types to signify if a certain piece of text is expected to be ascii or utf8 rather than just passing char* and hope you can deduct from either the function or variable name what it ought to contain, also there are code-paths that assume wchar_t is 32 bit (which it is not on windows) which is also “not great”. Text is without a doubt an area where we can do better, but untangling the mess we have now will be on the time consuming side of things.

Sure, but in in practice this makes avoiding unsigned int worse in some cases. For example, many parts of the BMesh code pass custom-data offset as uint, if I switch to int, int64_t or uint64_t - it wont get crashes on underflow, instead it often fails silently.

While adding asserts to every function that takes this argument technically solves the issue, it’s verbose & inconvenient. Since this is a small offset passed around in many places, which doesn’t have arithmetic performed on it. I think it’s reasonable to leave this as is (or switch to uint32_t if we just want to avoid the term uint).

A lot of the wchar_t was recently replaced with char32_t. Besides windows spesific code which requires wchar_t.
Although a quick search shows up some cases which still need to be changed.

As somebody whose last name gets mangled very often, yes please. Distinguishing between “8-bit number” and “character” is IMO the first step to sanity :wink:

Personally I don’t see the use of uchar, I prefer uint8_t instead. I’ve never really liked any of those “usually it means X but it may mean Y on a different compiler/platform/configuration” types.

5 Likes