eclectic content

Code Pages and "Special Characters" (Microsoft Windows)


[microsoft software, shortcuts / key combos]

Topics:
  • Inserting Special Characters (Microsoft Windows)
  • What the heck is a "code page", and how will I know what code to use?

Code Pages

The easiest way to explain a "code page" is with an example. Windows (by default*) uses "code page 437".

*[from the built-in help on Win XP:] "For example, if your system language is English (US), the code page is 437 (MS-DOS Latin US)..."
[Reference: "Help and Support Center", article titled "To input characters that are not on your keyboard"]

  • click here to view 'Code Page 437 MS-DOS Latin US' as a GIF image.
    [source]

  • Note: The Rows and Columns in the table are used to indicate the hexadecimal equivalent. For example, the cell at the position Column 9 (90 or 9-), Row D (0D or -D) is the Yen Symbol. The number for that position is 157. The equivalent number in hexadecimal is '9D' (a common notation is '0x9D').
    • In other words, hex code number "9D" = decimal code number "157"
    • Here's another example:
      A3 (hexadecimal) = 163 (decimal)
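
The row/column arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name cell_to_code is made up for this note, and the chart layout (column = first hex digit, row = second hex digit) is assumed from the description above.

```python
def cell_to_code(column_digit: str, row_digit: str) -> int:
    """Combine a chart column (first hex digit) and row (second hex digit)
    into the decimal code number for that cell.

    Hypothetical helper, written only to illustrate the arithmetic.
    """
    return int(column_digit + row_digit, 16)


yen = cell_to_code("9", "D")
print(yen, hex(yen))           # 157 0x9d
print(cell_to_code("A", "3"))  # 163
```

The decimal result is the number you would type on the numeric keypad.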

A code page is also called an "OEM CODE PAGE" (OEM is short for 'original equipment manufacturer'; it's supposed to be a reference to the system 'as it was at the factory', and the fact that a default font set was part of the pre-operating system software/firmware in early IBM PCs).

Here is the usage context (using the 'Alt' key with the numeric keypad to enter 'special characters'):

    When you use the [ALT] key to enter a 'special character', if the first digit you type is any number from 1 through 9, the value is recognized as a code point in the system's 'OEM code page'.

The default for Windows is United States English, defined in 'code page 437' (MS-DOS Latin US). So holding ALT and typing 163 on the numeric keypad produces ú (Latin lowercase letter U with acute, equivalent to Unicode character "U+00FA").

Note: The result will differ depending on the Windows system language (specified in [Control Panel] > [Regional and Language Options]). For example, if your system language is Greek, the system uses "OEM code page 737 : MS-DOS Greek", and the sequence used above would produce, instead of 'lowercase U with acute', the Greek lowercase letter 'MU' (equivalent to Unicode character U+03BC).
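
To see how the same number lands on different characters, here is a rough Python sketch. It assumes Python's built-in "cp437" and "cp737" codecs are acceptable stand-ins for the Windows OEM code pages; the function name alt_code_char is made up for this note, and the sketch models only the table lookup, not how Windows actually handles the keyboard. Codes below 32 are left out because the Python codecs return control characters there.

```python
def alt_code_char(code: int, oem_code_page: str) -> str:
    """Look up the character at a decimal position in an OEM code page.

    Hypothetical helper: it models only the code page lookup, not the
    Windows Alt+numpad input mechanism itself.
    """
    if not 32 <= code <= 255:
        raise ValueError("this sketch only covers codes 32 through 255")
    return bytes([code]).decode(oem_code_page)


for page in ("cp437", "cp737"):
    char = alt_code_char(163, page)
    print(page, char, f"U+{ord(char):04X}")
# cp437 ú U+00FA
# cp737 μ U+03BC
```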

[text above adapted from ms-its:C:\WINDOWS\Help\lang.chm::/lang_char_code_input.htm ]


"OEM 437" (from Microsoft) and Equivalent Unicode Codes:

The table reproduced below is from here ; last modified April 22, 2005 ...

Note: the hexadecimal numbers given in this table are a red herring! The hex numbers do NOT indicate the code page positions of the table cells.

The hex numbers presented in the following table are merely supposed to indicate the equivalent Unicode character. Unicode characters are indicated with hex numbers in the format "U+00##".

  • For example, position 9D (0x9D) in code page 437 is the Yen Sign, and the Yen Sign is represented in Unicode as U+00A5 ... BUT the two numbers are not interchangeable (stating 0x9D = 0xA5 is incorrect; only the characters correspond).

This table is meant to convey the following:
the character at position XY in Code Page "OEM 437" is equivalent to the Unicode character represented by the code U+00##

  • the code page position and the Unicode value match below 128 (hex 80); from 128 onward they diverge.
  • Do NOT use the decimal equivalent of the Unicode hex value.
    • example: Yen sign in code page 437 is "157" (hex:9D). Do NOT use the decimal equivalent to the Unicode number "A5" (which is decimal "165")
  • In other words:
    • Alt+1,5,7 is what works in MS Word: ¥
    • but for HTML NCRs (numeric character references), you have to use &#165; to get ¥
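
Going from the Alt code to the HTML NCR means looking up the character first and only then taking its Unicode number. A minimal Python sketch, assuming Python's built-in "cp437" codec matches the OEM 437 table (the function name alt_code_to_ncr is made up for this note):

```python
def alt_code_to_ncr(code: int, oem_code_page: str = "cp437") -> str:
    """Turn an OEM code page position into a decimal HTML NCR.

    Hypothetical helper: decode the position to a character first,
    then use that character's Unicode code point in the NCR.
    """
    char = bytes([code]).decode(oem_code_page)
    return f"&#{ord(char)};"


print(alt_code_to_ncr(157))  # &#165;
print(alt_code_to_ncr(163))  # &#250;
```

The position number itself (157) never appears in the NCR; it is only the key into the code page.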


<437 position (hex)> = position, decimal (is equivalent to <Unicode character U+____> )

0x00 = 0 = NULL (equivalent to U+0000)
0x01 = 1 = START OF HEADING (equivalent to U+0001)
0x02 = 2 = START OF TEXT (equivalent to U+0002)
...
0x20 = 32 = SPACE (equivalent to U+0020)
0x21 = 33 = EXCLAMATION MARK (equivalent to U+0021)
0x22 = 34 = QUOTATION MARK (equivalent to U+0022)
...
0x80 = 128 = LATIN CAPITAL LETTER C WITH CEDILLA (equivalent to U+00C7)
0x81 = 129 = LATIN SMALL LETTER U WITH DIAERESIS (equivalent to U+00FC)
0x82 = 130 = LATIN SMALL LETTER E WITH ACUTE (equivalent to U+00E9)
0x83 = 131 = LATIN SMALL LETTER A WITH CIRCUMFLEX (equivalent to U+00E2)
etc.
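
Rows in this format can be regenerated rather than copied by hand. The sketch below is one possible way, assuming Python's built-in "cp437" codec and the standard unicodedata module; the function name table_row is made up for this note. Control characters have no name in unicodedata, so they show up as "<control>" instead of the names used above (NULL, START OF HEADING, ...).

```python
import unicodedata


def table_row(position: int, code_page: str = "cp437") -> str:
    """Format one row: hex position = decimal position = name (Unicode value).

    Hypothetical helper for illustration only.
    """
    char = bytes([position]).decode(code_page)
    # unicodedata.name() has no name for control characters, so pass a default.
    name = unicodedata.name(char, "<control>")
    return (f"0x{position:02X} = {position} = {name} "
            f"(equivalent to U+{ord(char):04X})")


for position in (0x20, 0x80, 0x9D):
    print(table_row(position))
```

Passing a different codec name (for example "cp737") gives the rows for another OEM code page.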

 


 

Misc.

Tip: Insert Date & Time in MS Word using [Alt], [I], [T]

Entering the sequence of keys [Alt], then [I], then [T] executes a command from Microsoft Word's command bar. To make a certain Time/Date format the default (so that you only have to hit the [Enter] key after the sequence above), select the format you want from the list, then click the [Default…] button.

 

Test (Japanese in UTF-8) :

2008年10月27日

test 2: Ux2248 ͢     &#8776; →  

  

  
Resources/ Links / Sources:

. http://content.answers.com/main/content/img/CDE/_HEXCHRT.GIF
. http://www.microsoft.com/globaldev/reference/oem/437.mspx
. More code pages from Microsoft: http://msdn.microsoft.com/en-us/library/cc195051.aspx

related:
. the Unicode pages have instructions for inserting each character, but they often have the code wrong (unfortunately):
http://www.fileformat.info/info/unicode/char/2248/index.htm


last updated: 2008-Oct-31