
NCRs - numeric character references



The abbreviation "NCRs" on sites such as ashida's merely refers to the phrase "numeric character references". This phrase was used by the author(s) of the w3.org HTML 4.0 (and 4.01) definitions to mean [&#00000;], where 00000 is a string of decimal digits, and [&#xCCCC;], where CCCC is a string of hexadecimal digits.

For example, the Unicode character [U+32E1 CIRCLED KATAKANA TU] can, in HTML, be represented as [&#x32E1;] (hexadecimal) or [&#13025;] (decimal), because decimal 13,025 = 32E1 in hex.
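The decimal/hexadecimal equivalence above can be checked with a short Python sketch. The function name to_ncrs is made up for this illustration and is not part of any specification or library; only the built-in ord() and string formatting are used.

```python
def to_ncrs(ch):
    """Return the (decimal, hexadecimal) numeric character references for one character.

    Hypothetical helper for illustration only.
    """
    code_point = ord(ch)  # Unicode code point as an integer
    decimal_ncr = f"&#{code_point};"
    # Lowercase "x" followed by uppercase hex digits, per the usual convention.
    hex_ncr = f"&#x{code_point:X};"
    return decimal_ncr, hex_ncr


decimal_ncr, hex_ncr = to_ncrs("\u32E1")  # CIRCLED KATAKANA TU
print(decimal_ncr)  # &#13025;
print(hex_ncr)      # &#x32E1;
```

Both strings refer to the same character; which form to use is a matter of taste, although the hexadecimal form lines up with the usual U+XXXX notation.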

The following quote is from the HTML 4.01 Specification:

"

5.3.1 - Numeric character references

Numeric character references specify the code position of a character in the document character set. Numeric character references may take two forms:

  • The syntax "&#D;", where D is a decimal number, refers to the ISO 10646 decimal character number D.
  • The syntax "&#xH;" or "&#XH;", where H is a hexadecimal number, refers to the ISO 10646 hexadecimal character number H. Hexadecimal numbers in numeric character references are case-insensitive.

Here are some examples of numeric character references:

  • &#229; (in decimal) represents the letter "a" with a small circle above it [å] (used, for example, in Norwegian).
    • &#xE5; (in hexadecimal) represents the same character. [Editor: i.e. hex E5 = 229 dec]
    • &#Xe5; (in hexadecimal) represents the same character as well. [Editor: i.e. it's case insensitive, either xe5 or xE5 will work. But there's no reason not to follow the convention, which is lowercase x, followed by uppercase letters in the hex string.]
  • &#1048; (in decimal) represents the Cyrillic capital letter "I". [ И ]
  • &#x6C34; (in hexadecimal) represents the Chinese character for water. [ 水 ]

"
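The two syntaxes quoted above ("&#D;" and "&#xH;" / "&#XH;") are simple enough to decode by hand. The following Python sketch is one possible way to do it; NCR_PATTERN and decode_ncrs are names invented for this example, and the regular expression is an assumption about how to match the two forms, not something taken from the specification.

```python
import re

# Matches "&#D;" (decimal digits) or "&#xH;" / "&#XH;" (hex digits, any case).
# Hypothetical pattern written for this illustration.
NCR_PATTERN = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def decode_ncrs(text):
    """Replace every numeric character reference in text with its character."""
    def replace(match):
        hex_digits, dec_digits = match.group(1), match.group(2)
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # chr() raises ValueError for values above 0x10FFFF; not handled here.
        return chr(code_point)

    return NCR_PATTERN.sub(replace, text)


print(decode_ncrs("&#229; &#xE5; &#Xe5;"))  # å å å
print(decode_ncrs("&#1048; &#x6C34;"))      # И 水
```

In practice, Python's standard library function html.unescape() already handles numeric character references (and named entities as well), so a hand-written decoder like this is mainly useful for seeing how the syntax works.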

From the section "On SGML ..."

By default, SGML requires that all attribute values be delimited using either double quotation marks (ASCII decimal 34) or single quotation marks (ASCII decimal 39). Single quote marks can be included within the attribute value when the value is delimited by double quote marks, and vice versa. Authors may also use numeric character references to represent double quotes (&#34;) and single quotes (&#39;). For double quotes authors can also use the character entity reference &quot;.
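As a rough illustration of the quoting rule above, the sketch below writes an attribute value delimited by double quotes and replaces any embedded double quote with &#34;. The function name quote_attribute is hypothetical. Replacing "&" with &#38; first is an extra assumption made here (it is not part of the quoted passage), so that text which already looks like a reference is not misread.

```python
import html


def quote_attribute(value):
    """Return value wrapped in double quotes, safe for use as an HTML attribute value.

    Hypothetical helper for illustration only.
    """
    # Assumption: escape "&" first so existing "&#...;" text is kept literal.
    escaped = value.replace("&", "&#38;")
    # Double quote is ASCII decimal 34, hence &#34;.
    escaped = escaped.replace('"', "&#34;")
    return f'"{escaped}"'


attribute = quote_attribute('He said "hi"')
print(attribute)  # "He said &#34;hi&#34;"

# Round trip: strip the delimiters and decode the references again.
print(html.unescape(attribute[1:-1]))  # He said "hi"
```

Single quotes need no escaping here because the value is delimited by double quotes; with single-quote delimiters, &#39; would play the same role.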

Note that according to the HTML 4.01 specification "The character set defined in [ISO10646] is character-by-character equivalent to Unicode ([UNICODE])." (They are referring here to Unicode Version 3.)

Reference: http://www.w3.org/TR/html401/charset.html#h-5.3.1
Common misspellings: numberic character references

  



Meta: Author: Liberty Miller; Published Feb. 16, 2007