Different Number Base Systems

I am doing a summer assignment for AP Computer Science. For this I am learning about different number base systems and how to convert between them. These topics led me to wonder why programmers use different number base systems at all. So my question is:

Why do programmers use different number base systems? Why not use the familiar decimal system?


Computers use Binary Code.

Binary is used to encode both instructions and data. This is because transistors store and manipulate binary very nicely. It turns out that with enough boolean operations, you can perform the mathematical operations of addition, subtraction, multiplication, division, etc…
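To make the "enough boolean operations" point concrete, here is a minimal sketch of addition built only from XOR, AND, and OR. The names `full_adder` and `add_bits` and the 8-bit default width are my own choices for illustration; real hardware does the same thing with gates rather than a Python loop.

```python
def full_adder(a, b, carry_in):
    """Add three single bits using only XOR, AND, and OR."""
    partial = a ^ b
    total = partial ^ carry_in
    carry_out = (a & b) | (partial & carry_in)
    return total, carry_out


def add_bits(x, y, width=8):
    """Add two non-negative integers one bit at a time, wrapping at `width` bits."""
    result = 0
    carry = 0
    for i in range(width):
        a = (x >> i) & 1  # bit i of x
        b = (y >> i) & 1  # bit i of y
        bit, carry = full_adder(a, b, carry)
        result |= bit << i
    return result


print(add_bits(5, 3))      # 8
print(add_bits(200, 100))  # 44, because 300 does not fit in 8 bits and wraps around
```

Subtraction, multiplication, and division can be built up in a similar way from the same handful of operations.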

It is possible to use a base other than binary, but such designs traditionally haven’t been very successful. Still, some SSD flash technologies store more than one bit per cell by distinguishing several charge levels (though the number of levels is usually still a power of 2). Vacuum tubes did rather nicely supporting states of more than 2 values… 😉

Programmers often need to know which bits are set and which are clear in a multi-bit pattern (such as “bytes” or “words”). If you need to know that, base-2, base-8, and base-16 are all more directly useful than base-10. Binary makes the individual bits easiest to see, but becomes very long rather quickly. With 32- and 64-bit computers these days, hex (base-16) is preferred: each hex digit represents 4 bits, so a 64-bit value is 16 hex digits instead of 64 binary digits.
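As a quick illustration of the point about hex digits and bits, the sketch below prints one 32-bit value in decimal, hex, and binary. The helper name `nibbles` and the sample value are just picked for the example.

```python
def nibbles(value, bits=32):
    """Return the binary form of `value`, grouped four bits at a time."""
    binary = format(value, f"0{bits}b")
    return " ".join(binary[i:i + 4] for i in range(0, bits, 4))


value = 0xDEADBEEF

print(format(value, "d"))    # 3735928559
print(format(value, "08X"))  # DEADBEEF
print(nibbles(value))        # 1101 1110 1010 1101 1011 1110 1110 1111
```

Each group of four bits lines up with exactly one hex digit (D is 1101, E is 1110, and so on), while the decimal form tells you nothing about which bits are set.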

Octal (base-8) used to be common on some computers. The Digital Equipment Corporation PDP-8, for example, was a 12-bit computer. It lent itself nicely to base-8, so a 12-bit “word” was 4 octal digits. Octal still works but kind of fell by the wayside with standardization on 8-bit bytes. (3 hex digits would have worked as well, but the major opcode field of the instruction set was 3 bits, and back in those days programmers spent a lot of time looking at machine code.)
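Here is a small sketch of why octal suited a 12-bit word with a 3-bit opcode field. It assumes the opcode occupies the top 3 bits of the word; the value itself is arbitrary and only there to show the digit boundaries.

```python
word = 0o7402  # an arbitrary 12-bit value, written in octal

opcode = word >> 9   # top 3 bits (assumed opcode position)
rest = word & 0o777  # remaining 9 bits

print(format(word, "012b"))  # 111100000010
print(format(word, "04o"))   # 7402
print(format(word, "03X"))   # F02
print(opcode)                # 7
```

In octal the first digit is the 3-bit field by itself. In hex the first digit, F, mixes those 3 bits with one bit of the next field, so you cannot read the opcode off directly.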

Note that just as programmers still use a variety of number bases today, they also use a wide variety of different:

  • character encodings
  • file formats
  • floating point representations ([floating point](https://en.wikipedia.org/wiki/Floating_point), extended precision)
  • instruction encodings ([instruction set](https://en.wikipedia.org/wiki/Instruction_set))
  • fonts

These are all ways of encoding (and later interpreting) binary information, not to mention how to share that information with humans.
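To show what "encoding and later interpreting" means in practice, the sketch below takes the same four bytes and reads them three different ways using Python's standard `struct` module. The byte values and the choice of big-endian order are arbitrary, picked only for the example.

```python
import struct

raw = bytes([0x41, 0x42, 0x43, 0x44])  # four bytes, chosen arbitrarily

as_text = raw.decode("ascii")            # interpreted as ASCII characters
as_int = struct.unpack(">I", raw)[0]     # interpreted as a big-endian 32-bit unsigned integer
as_float = struct.unpack(">f", raw)[0]   # interpreted as a big-endian 32-bit float

print(raw.hex())  # 41424344
print(as_text)    # ABCD
print(as_int)     # 1094861636
print(as_float)   # roughly 12.14
```

The bits never change; only the rule used to interpret them does.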

Serialization is another relevant term; it is a way of transforming representations for various purposes, such as human readability, or network transmission.

It always boils down to address space. In a modern computer, any address space is a power of 2. E.g. you can fit 256 different values in a byte (2^8) and 65536 in a 2-byte word (2^16). Directly related to that, memory sizes generally come in powers of 2. Expressing these values in decimal does not “fit” the address space, meaning you do not run out of digits at the point where the space is filled. Ten is not native to a computer; it is native to humans who count on their fingers.

If we look at a byte and count until it is full in base 16, we go from 00 to FF. We never need more than 2 digits, and it is obvious that 80 is halfway (128 out of 256). It is also easy to convert to binary, because each base-16 digit maps to exactly four base-2 digits (bits). So if you have to deal with bitmasks, base 16 is much easier than base 10.
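A short sketch of the bitmask point, with made-up mask names and an arbitrary byte value:

```python
flags = 0xA5  # 1010 0101 in binary

HIGH_NIBBLE = 0xF0  # top four bits
LOW_NIBBLE = 0x0F   # bottom four bits
TOP_BIT = 0x80      # the halfway point of a byte

print(hex(flags & HIGH_NIBBLE))  # 0xa0
print(hex(flags & LOW_NIBBLE))   # 0x5
print(bool(flags & TOP_BIT))     # True
print(0x80, 0xFF)                # 128 255
```

Written in decimal, the same masks would be 240, 15, and 128, which hides which bits they select.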

