Code Page Overview

One of the challenges in supporting multiple languages in computer software is the amazing array of phrases and mechanisms used to describe something that seems so simple: When I press a key on the keyboard, how does the computer know which character to draw on the screen?

To draw text on the screen, the computer uses a character set. A character set has two parts: a font, which is the set of symbols you see on the screen or printed page, and a character encoding, which assigns a numerical value to each letter or punctuation mark in the language.

For a user exchanging data between two computers, this can have no effect at all or dire consequences. If both computers agree on the character set, no difficulties arise. If one computer is configured for French (Français) and the other for US English, problems occur: the standard US ASCII character set defines only 95 printable characters and does not include the ç needed to display the word Français!
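
The following Python sketch checks both claims. It counts the printable US ASCII characters (codes 0x20, the space, through 0x7E, the tilde) and reports the first character of a word that the built-in `ascii` codec cannot encode. The helper name `first_non_ascii` is illustrative only.

```python
# Printable US ASCII: codes 0x20 (space) through 0x7E (~), inclusive.
PRINTABLE_ASCII = [chr(c) for c in range(0x20, 0x7F)]
print(len(PRINTABLE_ASCII))  # 95


def first_non_ascii(word: str):
    """Return (index, character) for the first character ASCII cannot encode, or None."""
    try:
        word.encode("ascii")
    except UnicodeEncodeError as err:
        # err.start is the index of the first character that failed to encode.
        return err.start, word[err.start]
    return None


print(first_non_ascii("Francais"))  # None
print(first_non_ascii("Français"))  # (4, 'ç')
```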

The MS-DOS (and PC-DOS) operating systems provide several different character sets for customers with different language needs. These character sets, which are supported by PC hardware in the keyboard and video display, are known as code pages. Each code page defines 256 characters, one for each possible 8-bit value. Many similarities exist between the various code pages, but no two are identical. In some cases, the PC code page commonly used in a country does not agree with that country's national standard character set.
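
To see how two code pages overlap and differ, the sketch below decodes every byte value with Python's built-in `cp437` and `cp850` codecs. It assumes these codecs faithfully reflect the DOS code pages 437 (US) and 850 (Multilingual Latin 1). One known difference: Python maps bytes 0x00 to 0x1F to control characters rather than to the graphic symbols PC video hardware displayed for them.

```python
# Decode every possible byte value (0 through 255) with two DOS code pages.
ALL_BYTES = bytes(range(256))
cp437 = ALL_BYTES.decode("cp437")
cp850 = ALL_BYTES.decode("cp850")

# Each code page maps all 256 byte values to exactly one character.
print(len(cp437), len(cp850))  # 256 256

# Byte values where the two code pages disagree.
diffs = [b for b in range(256) if cp437[b] != cp850[b]]
print(len(diffs), "byte values differ")

# Example: byte 0x9B is a cent sign in 437 but o-with-stroke in 850.
print(repr(cp437[0x9B]), repr(cp850[0x9B]))  # '¢' 'ø'

# Text written with one code page and read with the other is silently garbled.
print("ø".encode("cp850").decode("cp437"))  # ¢
```

The lower 128 positions (the ASCII range) match in both tables, so every difference falls in the upper half. That is why plain English text usually survives a code page mismatch while accented letters and symbols do not.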

In many cases, the different national and industry standards for character encoding overload the meaning of characters: the same numeric value represents a different character in different encodings. A simple example is character code 35 (hexadecimal 23), which is the # character in US ASCII. In the United Kingdom's national variant, the same code represents £, a character that US ASCII does not include.
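
A minimal Python sketch of this overloading follows. It decodes 7-bit data with a per-variant override table. The UK table is a hypothetical, simplified model that remaps only code 0x23, as described above. The real UK standard may differ from US ASCII at other positions as well. The names `decode_7bit`, `US`, and `UK` are illustrative only.

```python
def decode_7bit(data: bytes, overrides: dict[int, str]) -> str:
    """Decode 7-bit data, replacing selected codes with national characters."""
    chars = []
    for b in data:
        if b > 0x7F:
            raise ValueError(f"byte {b:#04x} is outside the 7-bit range")
        # Use the national character if this code is overridden, else plain ASCII.
        chars.append(overrides.get(b, chr(b)))
    return "".join(chars)


US: dict[int, str] = {}         # US ASCII: no overrides
UK: dict[int, str] = {0x23: "£"}  # hypothetical, simplified UK variant

data = b"#5"
print(decode_7bit(data, US))  # #5
print(decode_7bit(data, UK))  # £5
```

The same two bytes produce different text depending on which table the receiver assumes, which is exactly the mismatch the paragraph above describes.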

For more information on character sets, also see:

Character Sets Reference