In today’s cross-platform C++ (or C) world we have:
| Data model | short | int | long | long long | pointers/size_t | Sample operating systems |
|---|---|---|---|---|---|---|
| ... | | | | | | |
| LLP64/IL32P64 | 16 | 32 | 32 | 64 | 64 | Microsoft Windows (x86-64 and IA-64) |
| LP64/I32LP64 | 16 | 32 | 64 | 64 | 64 | Most Unix and Unix-like systems, e.g. Solaris, Linux, BSD, and OS X; z/OS |
| ... | | | | | | |
What this means today is that for any “common” (signed) integer, `int` will suffice and can possibly still be used as the default integer type when writing C++ application code. It will also – for current practical purposes – have a consistent size across platforms.
Iff a use case requires at least 64 bits, we can today use `long long`, though possibly using one of the bitness-specifying types or the `__int64` type might make more sense.
This leaves `long` in the middle, and we’re considering outright banning the use of `long` from our application code. Would this make sense, or is there a case for using `long` in modern C++ (or C) code that has to run cross-platform? (“Platform” here means desktop and mobile devices, but not things like microcontrollers, DSPs, etc.)
Possibly interesting background links:
- What does the C++ standard state the size of int, long type to be?
- Why did the Win64 team choose the LLP64 model?
- 64-Bit Programming Models: Why LP64? (somewhat aged)
- Is `long` guaranteed to be at least 32 bits? (This addresses the comment discussion below. Answer.)
The only reason I would use `long` today is when calling or implementing an external interface that uses it.
As you say in your post, `short` and `int` have reasonably stable characteristics across all major desktop/server/mobile platforms today, and I see no reason for that to change in the foreseeable future. So I see little reason to avoid them in general.
`long`, on the other hand, is a mess. On all 32-bit systems I’m aware of, it had the following characteristics:
- It was exactly 32 bits in size.
- It was the same size as a memory address.
- It was the same size as the largest unit of data that could be held in a normal register and worked on with a single instruction.
Large amounts of code were written based on one or more of these characteristics. However, with the move to 64-bit, it was not possible to preserve all of them. Unix-like platforms went for LP64, which preserved characteristics 2 and 3 at the cost of characteristic 1. Win64 went for LLP64, which preserved characteristic 1 at the cost of characteristics 2 and 3. The result is that you can no longer rely on any of those characteristics, and that IMO leaves little reason to use `long`.
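A quick way to see the divergence on any machine at hand is to print the sizes directly; a minimal sketch:

```cpp
#include <cstdio>

int main() {
    // LP64 (Linux, macOS, BSD on x86-64): prints "long=8 ptr=8"
    // LLP64 (64-bit Windows):             prints "long=4 ptr=8"
    std::printf("long=%zu ptr=%zu\n", sizeof(long), sizeof(void*));
}
```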
If you want a type that is exactly 32 bits in size, you should use `int32_t`. If you want a type that is the same size as a pointer, you should use `intptr_t` (or better, `uintptr_t`).
If you want a type that is the largest item that can be worked on in a single register/instruction, then unfortunately I don’t think the standard provides one. `size_t` should be right on most common platforms, but it wouldn’t be on x32.
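For illustration, here are those replacement types in use; this is just a sketch, and the printed sizes will of course vary by platform:

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    std::int32_t exact = -1;   // exactly 32 bits wherever the type exists
    // uintptr_t can hold any pointer value round-trip
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(&exact);
    std::printf("int32_t=%zu uintptr_t=%zu void*=%zu\n",
                sizeof(exact), sizeof(addr), sizeof(void*));
}
```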
P.S.
I wouldn’t bother with the “fast” or “least” types. The “least” types only matter if you care about portability to really obscure architectures where `CHAR_BIT != 8`. The size of the “fast” types in practice seems to be pretty arbitrary. Linux seems to make them at least the same size as a pointer, which is silly on 64-bit platforms with fast 32-bit support like x86-64 and arm64. IIRC, iOS makes them as small as possible. I’m not sure what other systems do.
P.P.S.
One reason to use `unsigned long` (but not plain `long`) is that it is guaranteed to have modulo behaviour. Unfortunately, due to C’s screwed-up promotion rules, unsigned types smaller than `int` do not have modulo behaviour.
On all major platforms today, `uint32_t` is the same size as or larger than `int` and hence has modulo behaviour. However, there have historically been – and there could theoretically in the future be – platforms where `int` is 64-bit, and hence `uint32_t` does not have modulo behaviour.
Personally, I would say it’s better to get in the habit of forcing modulo behaviour by using “1u *” or “0u +” at the start of your expressions, as this will work for any size of unsigned type.
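A small demonstration of the promotion pitfall and the “0u +” workaround (assuming a platform with 32-bit `int`):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    std::uint16_t a = 0, b = 1;
    // Both operands promote to (signed) int, so the subtraction yields -1,
    // not the modulo-2^16 result 65535.
    std::printf("%d\n", a - b);        // prints -1
    // Prefixing with 0u forces the arithmetic into unsigned int, which is
    // guaranteed to wrap: 0 - 1 == 2^32 - 1 here.
    std::printf("%u\n", 0u + a - b);   // prints 4294967295
}
```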
As you mention in your question, modern software is all about interoperating between platforms and systems on the internet. The C and C++ standards give ranges for integer type sizes, not specific sizes (in contrast with languages like Java and C#).
To ensure that builds of your software on different platforms handle the same data the same way, and that other software can interact with yours using the same sizes, you should be using fixed-size integers.
Enter `<cstdint>`, which provides exactly that: it is a standard header that all compiler and standard library platforms are required to provide. Note: this header was only required as of C++11, but many older library implementations provided it anyway.
Want a 64-bit unsigned integer? Use `uint64_t`. A signed 32-bit integer? Use `int32_t`. While the exact-width types in the header are technically optional, modern platforms should support all of the types defined in that header.
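For instance, a minimal sketch of a wire-format header built from exact-width types (the struct and field names are made up for illustration):

```cpp
#include <cstdint>

// Every conforming platform that provides these types gives the fields the
// same widths, which is what you want for data shared across systems.
// (Real serialization code must still deal with byte order and padding.)
struct PacketHeader {
    std::uint32_t magic;
    std::uint16_t version;
    std::uint16_t flags;
    std::uint64_t payload_size;
};
```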
Sometimes a specific bit width is needed, for example, in a data structure used for communicating with other systems. Other times it is not. For less strict situations, `<cstdint>` provides types that are a minimum width.
There are least variants: `int_leastXX_t` will be an integer type of at least XX bits. It will use the smallest type that provides XX bits, but the type is allowed to be larger than the specified number of bits. In practice, these are typically the same as the exact-width types described above.
There are also fast variants: `int_fastXX_t` is at least XX bits, but should use a type that is fast on a particular platform. The definition of “fast” in this context is unspecified. However, in practice, this typically means that a type smaller than a CPU’s register size may alias to a type of the CPU’s register size. For example, Visual C++ 2015’s header specifies that `int_fast16_t` is a 32-bit integer because 32-bit arithmetic is overall faster on x86 than 16-bit arithmetic.
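You can check what your own toolchain chose by printing the sizes; a minimal sketch:

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    // Typical results: Visual C++ maps int_fast16_t to 32 bits, while glibc
    // on x86-64 makes the fast types pointer-sized (64 bits).
    std::printf("least16=%zu fast16=%zu fast32=%zu\n",
                sizeof(std::int_least16_t),
                sizeof(std::int_fast16_t),
                sizeof(std::int_fast32_t));
}
```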
This is all important because you should be able to use types that can hold the results of the calculations your program performs, regardless of platform. If a program produces correct results on one platform but incorrect results on another due to differences in integer overflow, that is bad. By using the standard integer types, you guarantee that the results on different platforms will be the same with regard to the size of integers used (of course, there could be other differences between platforms besides integer width).
So yes, `long` should be banned from modern C++ code. So should `int`, `short`, and `long long`.
No, banning the built-in integer types would be absurd. However, they should not be abused either.
If you need an integer that is exactly N bits wide, use `std::intN_t` (or `std::uintN_t` if you need an unsigned version). Thinking of `int` as a 32-bit integer and `long long` as a 64-bit integer is just wrong. It might happen to be like this on your current platforms, but this is relying on implementation-defined behavior.
Using fixed-width integer types is also useful for interoperating with other technologies. For example, if some parts of your application are written in Java and others in C++, you’ll probably want to match the integer types so you get consistent results. (Still, be aware that overflow in Java has well-defined semantics while signed overflow in C++ is undefined behavior, so complete consistency is a lofty goal.) They will also be invaluable when exchanging data between different computing hosts.
If you don’t need exactly N bits, but just a type that is wide enough, consider using `std::int_leastN_t` (optimized for space) or `std::int_fastN_t` (optimized for speed). Again, both families have unsigned counterparts, too.
So, when should you use the built-in types? Well, since the standard does not specify their width precisely, use them when you don’t care about the actual bit width but about other characteristics.
A `char` is the smallest integer that is addressable by the hardware. The language actually forces you to use it for aliasing arbitrary memory. It is also the only viable type for representing (narrow) character strings.
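As a small illustration of the aliasing role (using `unsigned char`, which like `char` may alias any object):

```cpp
#include <cstdio>
#include <cstring>

int main() {
    int value = 42;
    unsigned char bytes[sizeof value];
    // The char family may inspect the object representation of any type
    // without violating strict-aliasing rules.
    std::memcpy(bytes, &value, sizeof value);
    for (unsigned char b : bytes)
        std::printf("%02x ", b);   // byte order depends on endianness
    std::printf("\n");
}
```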
An `int` will usually be the fastest type the machine can handle. It will be wide enough that it can be loaded and stored with a single instruction (without having to mask or shift bits) and narrow enough that it can be operated on with (the most) efficient hardware instructions. Therefore, `int` is a perfect choice for passing data and doing arithmetic when overflow is not a concern. For example, the default underlying type of enumerations is `int`. Don’t change it to a 32-bit integer just because you can. Also, if you have a value that can only be –1, 0, or 1, an `int` is a perfect choice, unless you’re going to store huge arrays of them, in which case you might wish to use a more compact data type at the cost of paying a higher price for accessing individual elements. More efficient caching will likely pay off for these. Many operating system functions are also defined in terms of `int`. It would be silly to convert their arguments and results back and forth; all this could possibly do is introduce overflow errors.
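As a small check of the enumeration point (it is scoped enums that default to `int` as their underlying type; written here with C++17 helpers):

```cpp
#include <type_traits>

enum class Direction { North, East, South, West };

// A scoped enum without an explicit base defaults to int.
static_assert(std::is_same_v<std::underlying_type_t<Direction>, int>);
```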
`long` will usually be the widest type that can be handled with single machine instructions. This makes `unsigned long` in particular very attractive for dealing with raw data and all kinds of bit-manipulation work. For example, I would expect to see `unsigned long` in the implementation of a bit-vector. If the code is written carefully, it doesn’t matter how wide the type actually is (because the code will adapt automatically). On platforms where the native machine word is 32 bits, having the backing array of the bit-vector be an array of unsigned 32-bit integers is most desirable, because it would be silly to use a 64-bit type that has to be loaded via expensive instructions, only to shift and mask the unneeded bits away again anyway. On the other hand, if the platform’s native word size is 64 bits, I want an array of that type, because it means that operations like “find first set” may run up to twice as fast. So the “problem” of the `long` data type that you’re describing – that its size varies from platform to platform – actually is a feature that can be put to good use. It only becomes a problem if you think of the built-in types as types of a certain bit width, which they simply aren’t.
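A minimal sketch of such an adaptive bit-vector; the class itself is an assumption, but note that only the `Word` alias mentions the storage type, so the code adapts to whatever width `unsigned long` has:

```cpp
#include <climits>
#include <cstddef>
#include <vector>

class BitVector {
    using Word = unsigned long;                  // native word on LP64 Unix
    static constexpr std::size_t kBits = sizeof(Word) * CHAR_BIT;
    std::vector<Word> words_;
public:
    explicit BitVector(std::size_t n) : words_((n + kBits - 1) / kBits) {}
    void set(std::size_t i)  { words_[i / kBits] |= Word{1} << (i % kBits); }
    bool test(std::size_t i) const {
        return (words_[i / kBits] >> (i % kBits)) & Word{1};
    }
};
```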
`char`, `int`, and `long` are very useful types as described above. `short` and `long long` are not nearly as useful, because their semantics are much less clear.
Another answer already elaborates on the `<cstdint>` types and lesser-known variations therein.
I’d like to add to that:
use domain-specific type names
That is, don’t declare your parameters and variables to be `uint32_t` (and certainly not `long`!), but give them names such as `channel_id_type`, `room_count_type`, etc.
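For example (the alias names echo the ones above; the widths chosen and the `join_channel` signature are assumptions for illustration):

```cpp
#include <cstdint>

// Decide the width once, in one place, and give it a domain name.
using channel_id_type = std::uint32_t;
using room_count_type = std::int_least16_t;

void join_channel(channel_id_type channel, room_count_type max_rooms);
```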
about libraries
Third-party libraries that use `long` or whatnot can be annoying, especially when they take references or pointers to those types.
The best thing is to make wrappers.
My general strategy is to make a set of cast-like functions that will be used everywhere. They are overloaded to accept only those types that exactly match the corresponding types, along with whatever pointer etc. variations you need. They are defined specifically for the OS/compiler/settings in use. This lets you remove warnings and yet ensure that only the “right” conversions are used.
```cpp
channel_id_type cid_out;
...
SomeLibFoo(same_thing_really<int*>(&cid_out));
```
In particular, since several different primitive types can be 32 bits wide, your toolchain’s choice for how `int32_t` is defined might not match the library call (e.g. `int` vs. `long` on Windows).
The cast-like function documents the clash, provides compile-time checking that the result matches the function’s parameter, and removes any warning or error if and only if the actual type matches the real size involved. That is, it is overloaded and defined if I pass in (on Windows) an `int*` or a `long*`, and gives a compile-time error otherwise.
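A minimal sketch of how such a checker could look; `same_thing_really` is the name used above, but this particular implementation (a template with static_asserts rather than discrete overloads) is an assumption:

```cpp
#include <type_traits>

// Compiles only when the target pointee has exactly the same size as the
// source type; otherwise the static_assert yields a compile-time error.
template <typename To, typename From>
To same_thing_really(From* p) {
    static_assert(std::is_pointer_v<To>, "target must be a pointer type");
    static_assert(sizeof(std::remove_pointer_t<To>) == sizeof(From),
                  "not the same thing really: pointee sizes differ");
    return reinterpret_cast<To>(p);
}
```

Assuming `channel_id_type` is defined as `long`, the call above compiles on Windows (where `long` is 32 bits) and fails on an LP64 platform, surfacing the clash at compile time instead of silently truncating.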
So, if the library is updated or someone changes what `channel_id_type` is, this continues to be verified.