C++ Primitive Built-in Types

Yao Yao on May 4, 2015
  • Published in category
  • C++

整理自 C++ Primer, 5th Edition

Some languages, such as Smalltalk and Python, check types at run time. In contrast, C++ is a statically typed language; type checking is done at compile time. As a consequence, the compiler must know the type of every name used in the program.

C++ allows programmers to define types that include operations as well as data. A major design goal of C++ is to let programmers define their own types that are as easy to use as the built-in types.

C++ primitive types that include:

  • the arithmetic types
  • a special type void.

Arithmetic Types

Type Meaning Minimum Size
bool boolean NA
char character 8 bits
wchar_t wide character 16 bits
char16_t Unicode character 16 bits
char32_t Unicode character 32 bits
short short integer 16 bits
int integer 16 bits
long long integer 32 bits
long long long integer 64 bits
float single-precision floating-point 6 significant digits
double double-precision floating-point 10 significant digits
long double extended-precision floating-point 10 significant digits

The size of—that is, the number of bits in—the arithmetic types varies across machines. The standard guarantees minimum sizes as listed in Table 2.1. However, compilers are allowed to use larger sizes for these types. Because the number of bits varies, the largest (or smallest) value that a type can represent also varies. 比如 int 在 32-bit 系统的 Visual Studio 2008 下就是 32 bits 长的。规范只规定了诸如 “int: Not smaller than short. At least 16 bits.” 这样的大规范,不同的机器、不同的编译器可以有不同的 type size。(源出:Variables and types

也有些类型是定长的,比如 __int32 一定是 32 bits,具体见 Data Type RangesWhat does the C++ standard state the size of int, long type to be?

A char is guaranteed to be big enough to hold numeric values corresponding to the characters in the machine’s basic character set. That is, a char is the same size as a single machine byte.

The smallest chunk of addressable memory is referred to as a “byte.” The basic unit of storage, usually a small number of bytes, is referred to as a “word.” On most machines a byte contains 8 bits and a word is either 32 or 64 bits, that is, 4 or 8 bytes.

A signed type represents negative or positive numbers (including zero); an unsigned type represents only values >= zero.

Unlike the other integer types, there are three distinct basic character types:

  • char
  • signed char
  • unsigned char

不要天真地以为一定是 (plain) char == signed char,因为有的系统是 (plain) char == signed char,有的系统是 (plain) char == unsigned char,这也是由编译器说了算的。

The standard does not define how signed types are represented, but does specify that the range should be evenly divided between positive and negative values. Hence, an 8-bit signed char is guaranteed to be able to hold values from –127 through 127; most modern machines use representations that allow values from –128 through 127.

The remaining integral types represent integer values of (potentially) different sizes. The language guarantees that an int will be at least as large as short, a long at least as large as an int, and long long at least as large as long. The type long long was introduced by the new standard.

Typically, floats are represented in one word (32 bits), doubles in two words (64 bits), and long doubles in either three or four words (96 or 128 bits). The type long double is often used as a way to accommodate special-purpose floating-point hardware; its precision is more likely to vary from one implementation to another.

Advice: Deciding which Type to Use

C++, like C, is designed to let programs get close to the hardware when necessary.

The arithmetic types are defined to cater to the peculiarities of various kinds of hardware. Accordingly, the number of arithmetic types in C++ can be bewildering. Most programmers can (and should) ignore these complexities by restricting the types they use. A few rules of thumb can be useful in deciding which type to use:

  • Use an unsigned type when you know that the values cannot be negative.
  • Use int for integer arithmetic. short is usually too small and, in practice, long often has the same size as int. If your data values are larger than the minimum guaranteed size of an int, then use long long.
  • Do not use plain char or bool in arithmetic expressions. Use them only to hold characters or truth values. Computations using char are especially problematic because char is signed on some machines and unsigned on others. If you need a tiny integer, explicitly specify either signed char or unsigned char.
  • Use double for floating-point computations; float usually does not have enough precision, and the cost of double-precision calculations versus single-precision is negligible. In fact, on some machines, double-precision operations are faster than single. The precision offered by long double usually is unnecessary and often entails considerable run-time cost.

Type Conversions

bool b = 42; 			// b is true
int i = b; 				// i has value 1
i = 3.14; 				// i has value 3
double pi = i; 			// pi has value 3.0
unsigned char c = -1; 	// assuming 8-bit chars, c has value 255
signed char c2 = 256; 	// assuming 8-bit chars, the value of c2 is undefined

int i = 42;
if (i) 					// condition will evaluate as true
	i = 0;
  • When we assign one of the non-bool arithmetic types to a bool object, the result is false if the value is 0 and true otherwise.
  • When we assign a bool to one of the other arithmetic types, the resulting value is 1 if the bool is true and 0 if the bool is false.
  • When we assign a floating-point value to an object of integral type, the value is truncated. The value that is stored is the part before the decimal point.
  • When we assign an integral value to an object of floating-point type, the fractional part is zero. Precision may be lost if the integer has more bits than the floating-point object can accommodate.
  • If we assign an out-of-range value to an object of unsigned type, the result is the remainder of the value modulo the number of values the target type can hold. For example, an 8-bit unsigned char can hold values from 0 through 255, inclusive. If we assign a value outside this range, the compiler assigns the remainder of that value modulo 256. Therefore, assigning –1 to an 8-bit unsigned char gives that object the value 255.
  • If we assign an out-of-range value to an object of signed type, the result is undefined. The program might appear to work, it might crash, or it might produce garbage values.

Advice: Avoid Undefined and Implementation-Defined Behavior

Undefined behavior results from errors that the compiler is not required (and sometimes is not able) to detect. Even if the code compiles, a program that executes an undefined expression is in error.

Unfortunately, programs that contain undefined behavior can appear to execute correctly in some circumstances and/or on some compilers. There is no guarantee that the same program, compiled under a different compiler or even a subsequent release of the same compiler, will continue to run correctly. Nor is there any guarantee that what works with one set of inputs will work with another.

Similarly, programs usually should avoid implementation-defined behavior, such as assuming that the size of an int is a fixed and known value. Such programs are said to be nonportable. When the program is moved to another machine, code that relied on implementation-defined behavior may fail. Tracking down these sorts of problems in previously working programs is, mildly put, unpleasant.

Caution: Don’t Mix Signed and Unsigned Types

If we use both unsigned and int values in an arithmetic expression, the int value ordinarily is converted to unsigned.

unsigned u = 10;
int i = -42;
std::cout << i + i << std::endl; // prints -84
std::cout << u + i << std::endl; // if 32-bit ints, prints 4294967264

unsigned u1 = 42, u2 = 10;
std::cout << u1 - u2 << std::endl; // ok: result is 32
std::cout << u2 - u1 << std::endl; // ok: but the result will wrap

// WRONG: u can never be less than 0; the condition will always succeed
for (unsigned u = 10; u >= 0; --u)
	std::cout << u << std::endl;

Expressions that mix signed and unsigned values can yield surprising results when the signed value is negative. It is essential to remember that signed values are automatically converted to unsigned. For example, in an expression like a * b, if a is -1 and b is 1, then if both a and b are ints, the value is, as expected -1. However, if a is int and b is an unsigned, then the value of this expression depends on how many bits an int has on the particular machine. On our machine, this expression yields 4294967295.

blog comments powered by Disqus