MH & nmh: Email for Users & Programmers

May, 2006

ASCII, bits, etc.

ASCII, the American Standard Code for Information Interchange, assigns a unique code number between 0 and 127 to each of 128 characters. Computers store these 128 characters as the 7-bit binary numbers 0000000 (0 decimal), 0000001, 0000010, and so on, up to 1111111 (127 decimal). For example, the character A, which is assigned the decimal number 65, is stored as the binary number 1000001, and the character a (97 decimal) is represented by 1100001.
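The conversion described above can be checked with a short Python sketch. Python's built-in ord() returns a character's code number, and format(..., "07b") writes it as a zero-padded 7-digit binary string. The helper name ascii_bits is hypothetical, chosen only for this illustration.

```python
def ascii_bits(ch: str) -> str:
    """Return the 7-bit binary form of an ASCII character's code number.

    ascii_bits is a hypothetical helper for illustration only.
    """
    code = ord(ch)                # code number, e.g. 65 for "A"
    if code > 127:                # outside the 128-character ASCII range
        raise ValueError(f"{ch!r} is not an ASCII character")
    return format(code, "07b")    # zero-padded 7-digit binary string

for ch in "Aa":
    print(ch, ord(ch), ascii_bits(ch))
# A 65 1000001
# a 97 1100001
```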

Those 128 characters are enough for North American English, but they don't include characters that other languages need. To represent more than 128 characters, computers need at least one more bit: 8-bit binary numbers, from 00000000 (0 decimal) through 11111111 (255 decimal), allow 256 values. But even 256 values don't provide enough "room" for all the characters used in the world. There are plenty of schemes for dividing those 256 values among the character sets that different languages need, but there's no universal 8-bit character set. Many of the commonly used 8-bit character sets are registered for use with MIME.
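To make the "no universal 8-bit character set" point concrete, the Python sketch below first counts how many values 7 and 8 bits can hold, then decodes the same single byte (E9 hexadecimal, 233 decimal) under three 8-bit character sets. ISO-8859-1 (Western European), ISO-8859-5 (Cyrillic) and ISO-8859-7 (Greek) are MIME-registered charset names, chosen here only as examples; Python accepts them as codec names.

```python
# How many distinct values an n-bit binary number can hold.
counts = {n: 2 ** n for n in (7, 8)}
for n, values in counts.items():
    print(f"{n} bits: {values} values")
# 7 bits: 128 values
# 8 bits: 256 values

# One byte, read under three different 8-bit character sets.
byte = bytes([0xE9])
decoded = {cs: byte.decode(cs) for cs in ("iso-8859-1", "iso-8859-5", "iso-8859-7")}
for cs, ch in decoded.items():
    print(cs, f"U+{ord(ch):04X}")
# iso-8859-1 U+00E9  (Latin small e with acute)
# iso-8859-5 U+0449  (Cyrillic small shcha)
# iso-8859-7 U+03B9  (Greek small iota)
```

The byte itself never changes; only the character set label (in MIME, the charset parameter) tells a program which character it stands for.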

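There are many ways to handle text in Asian languages, some of which have thousands of characters. A character set called Unicode, kept in step with the international standard ISO/IEC 10646, is gaining acceptance for representing all the characters in the world. Unicode was first designed as a 16-bit code (65,536 values), but it now assigns code points from 0 through 10FFFF hexadecimal (1,114,111 decimal), so a single character can need up to 21 bits.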
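A minimal Python sketch of Unicode code points follows. It prints each sample character's code point in the usual U+XXXX notation, notes whether it fits in 16 bits, and shows its bytes in UTF-8. UTF-8 is one common way to store Unicode text as a sequence of 8-bit bytes; it is an assumption of this example, not something the text above specifies. The samples are arbitrary: A, é, the ideograph U+65E5 (日), and U+20000, the first character of the CJK Unified Ideographs Extension B block.

```python
import sys

# A, e-acute, a common CJK ideograph, and a CJK ideograph beyond 16 bits.
samples = ["A", "\u00e9", "\u65e5", "\U00020000"]

for ch in samples:
    cp = ord(ch)                                  # the Unicode code point
    width = "fits in 16 bits" if cp <= 0xFFFF else "needs more than 16 bits"
    utf8 = ch.encode("utf-8").hex(" ")            # bytes.hex(sep) needs Python 3.8+
    print(f"U+{cp:04X}  UTF-8: {utf8}  ({width})")

# On Python 3.3 and later this is the top of the Unicode code space.
print(f"Highest code point: U+{sys.maxunicode:X}")
# U+0041  UTF-8: 41  (fits in 16 bits)
# U+00E9  UTF-8: c3 a9  (fits in 16 bits)
# U+65E5  UTF-8: e6 97 a5  (fits in 16 bits)
# U+20000  UTF-8: f0 a0 80 80  (needs more than 16 bits)
# Highest code point: U+10FFFF
```

Plain ASCII characters keep their single-byte codes in UTF-8, which is one reason that encoding coexists well with older 7-bit mail.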