Character Codes

The character set defined for this page is UTF-8: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

See "LINKS/ OTHER RESOURCES" below for complete character lists.

Selected character reference:
  
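ISO 8859-1

Symbol / Rendering | Description | In HTML, type this: | ...or this:
(blank) | no-break space / non-breaking space | &nbsp; | &#160;
¢ | Cent sign | &cent; | &#162;
£ | Pound sterling | &pound; | &#163;
¤ | General currency sign | &curren; | &#164;
¥ | Yen sign | &yen; | &#165;
¦ | Broken vertical bar | &brvbar; | &#166;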
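
As a quick check of the two spellings in a table like this one, here is a minimal Python sketch. It assumes Python's standard `html` module as the entity table; the helper name `reference_forms` is made up for illustration.

```python
import html
from html.entities import name2codepoint

# Entity names for the characters listed in the ISO 8859-1 table.
NAMES = ["nbsp", "cent", "pound", "curren", "yen", "brvbar"]

def reference_forms(name):
    """Return (character, named reference, decimal reference) for an entity name."""
    code = name2codepoint[name]
    return chr(code), f"&{name};", f"&#{code};"

for name in NAMES:
    char, named, decimal = reference_forms(name)
    # Both spellings decode to the same character.
    assert html.unescape(named) == html.unescape(decimal) == char
    print(repr(char), named, decimal)
```

Either form can be typed into the HTML source; the named form is easier to read, the decimal form works for any character that has a code point.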

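Note that all printable ISO-8859-1 characters are included in ANSI / Windows-1252. The two differ only in the byte range 0x80-0x9F, which ISO-8859-1 reserves for control codes.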

Use the following header tag in HTML documents to specify that the document was saved/encoded
using the Microsoft Windows character set:
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">

The 27 extra characters provided by Windows-1252 (ANSI)
that are not available in ISO-8859-1 include:

  • &euro; [€] (Euro sign)
  • &sbquo; [‚] (single low-9 quotation mark, "sb quote")
  • &fnof; [ƒ] ('function of ...' [f(x)] or 'finite part integral')
  • &bdquo; [„] (double low-9 quotation mark)
  • &bull; [•] (bullet)
  • &hellip; […] (horizontal ellipsis; plural "ellipses"), an "unbreakable ellipsis" … [not to be confused with an "ellipse", the flattened circle shape.]
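
The full set of extras can be listed by machine. This is a sketch that assumes Python's built-in "cp1252" codec is a faithful copy of Windows-1252; it walks the 0x80-0x9F range (where ISO-8859-1 has only control codes) and keeps every byte the codec can decode. The function name `cp1252_extras` is illustrative.

```python
def cp1252_extras():
    """Map each byte in 0x80-0x9F to its Windows-1252 character, skipping unassigned bytes."""
    extras = {}
    for byte in range(0x80, 0xA0):
        try:
            extras[byte] = bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            # Five positions in this range are unassigned in Windows-1252.
            continue
    return extras

extras = cp1252_extras()
print(len(extras))  # 27
for byte, char in extras.items():
    print(f"0x{byte:02X}  U+{ord(char):04X}  {char}")
```
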
UNICODE

Symbol / Rendering | Description | In HTML, type this: | ...or this: (HTML Entity, decimal) | ...or this: (HTML Entity, hex) | Official
• | bullet | &bull; | &#8226; | |
≠ | (value) ... does not equal ... (value) | &ne; | &#8800; | &#x2260; | NOT EQUAL TO (U+2260)
≈ | ... equals approximately ... / ... is almost equal to ... | &asymp; (from "asymptote") | &#8776; | |

"approximately equal to": alternatives to the 'wiggly equals' include:
{ ~ } or { = (approx) }

Q: Why 'asymptote'?
A: Because, presumably, the ~ symbol was often used to indicate that one function is asymptotic to another, and the ~ and ≈ symbols were often used interchangeably.
One might, for example, write f(x) ~ g(x) if the ratio of f(x) to g(x) approaches 1 as x -> infinity.
In other words, two values which ALMOST meet or are ALMOST the same...
∴ | (something-something) ... therefore ... (something) | &there4;
       
⁄ | solidus ("Fraction Slash")

Compare to "forward slash" (&#47;): /

HTML Entity (named): &frasl;
HTML Entity (decimal): &#8260;
HTML Entity (hex): &#x2044;
Official: 'FRACTION SLASH' (U+2044)
How to type in Microsoft Windows: Alt +2044

See also: "division slash"; "solidus"

⧸ | &#x29F8; | BIG SOLIDUS (U+29F8)


(see also this 1998 proposal and the then-current reference tables)

"therefore" symbol &there4; none ∴ &there4;

"varies with / similar to" &sim; none ∼ &sim;

"almost equal to" &asymp; none ≈ &asymp;

"not equal to" &ne; none ≠ &ne;

"equivalent to" &equiv; none ≡ &equiv;

"less-than or equal to" &le; none ≤ &le;

"greater-than/equal to" &ge; none ≥ &ge;

Character Encoding Options
Encoding | Common Name | Equivalent to | To use for HTML page authoring
ISO-8859-1 | Western (Latin 1) | (Western Alphabet, latin1, us-ascii, windows-1252, x-ascii) | <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
UTF-8 | Unicode | | <meta http-equiv='Content-Type' content='text/html; charset=utf-8'>
ISO 10646 | Unicode | |
Japanese (Shift JIS) | | | <meta http-equiv="Content-Type" content="text/html; charset=shift_jis">
Windows-1252 | Western European (Windows) | ANSI* (windows-1252) |
 | | | <meta http-equiv="Content-Type" content="text/html; charset=gb2312">

Unicode compliant applications (UTF8) :
<meta http-equiv='Content-Type' content='text/html; charset=utf-8'>

Unix Applications (EUC), Japanese :
<meta http-equiv='Content-Type' content='text/html; charset=euc-jp'>

Mac & perhaps many PC applications (Shift_JIS), Japanese :
<meta http-equiv='Content-Type' content='text/html; charset=shift_jis'>

Other environments especially those created in Japan (JIS), Japanese :
<meta http-equiv='Content-Type' content='text/html;charset=iso-2022-jp'>
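
To see why the label has to match the bytes, here is a small Python sketch that encodes one string under several of the charset labels above. The sample text and the helper `encode_with` are assumptions for illustration; the labels themselves are accepted by Python's codec registry.

```python
# Charset labels taken from the meta tags above; the sample text is an assumption.
LABELS = ["iso-8859-1", "windows-1252", "utf-8", "shift_jis", "euc-jp", "iso-2022-jp"]
SAMPLE = "日本語"

def encode_with(label, text):
    """Encode text under the given charset label, or return None if it cannot be represented."""
    try:
        return text.encode(label)
    except UnicodeEncodeError:
        return None

for label in LABELS:
    data = encode_with(label, SAMPLE)
    if data is None:
        print(f"{label:13} cannot represent {SAMPLE!r}")
    else:
        # Decoding with the same label recovers the text.
        assert data.decode(label) == SAMPLE
        print(f"{label:13} {len(data):2d} bytes  {data.hex()}")
```

The same three characters come out as different byte sequences under each Japanese encoding, which is why a wrong charset declaration produces garbage on screen.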


Note that you must actually save the file (.txt, .html) using the proper encoding; the HTML "Content-Type" tag does not change the encoding of the file itself.
For example, it will not turn a .html file saved as ANSI* into a document with Unicode encoding.
But you can use decimal character references (e.g. &#9733;) in a plain ASCII document and specify the UTF-8 encoding (see STAR example, below).
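
Both halves of that note can be checked with a few lines of Python. This is only a sketch: the sample string is made up, and "cp1252" is Python's name for the Windows-1252 ("ANSI") encoding.

```python
import html

text = "★ café"

# Saved as UTF-8 but read as Windows-1252: the declaration and the bytes disagree.
utf8_bytes = text.encode("utf-8")
misread = utf8_bytes.decode("cp1252")
print(misread)  # â˜… cafÃ©

# Decimal character references keep the file pure ASCII, so the saved encoding no longer matters.
ascii_safe = text.encode("ascii", "xmlcharrefreplace")
print(ascii_safe)  # b'&#9733; caf&#233;'

# A browser (here: html.unescape) turns the references back into the characters.
restored = html.unescape(ascii_safe.decode("ascii"))
```
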

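*ANSI character set / ANSI encoding: The so-called ANSI character set defines 256 characters. The first 128 are ASCII, and the second 128 contain math and foreign language symbols, which are different from those in the PC (DOS) character set. Despite the name, "ANSI" here is Microsoft's term for Windows code pages such as Windows-1252; it is not an actual ANSI standard.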

Dumb tricks with Unicode:
Putting a star in your web page title:
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>&#9733; I'm a star baby!&#9733;</title>

Result (if page uses UTF-8/Unicode)
★I'm a star baby!★


Codes in URLs: (URIs)
%E5%A4%A7%E9%98%AA%E5%B8%82 %3F %3D %3C %3F

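symbol | for HTML | for CGI | for URLs | notes
& | &amp; | %2526 | |
' | &#39; | %2527 | %27 | decimal 39
= | &#61; | %253D | | decimal 61, hex (byte) 3D; with the hex indicator for CGI: %3D

example:
(Opera url) http://www.example.com/Elune's_Grace
is equivalent to
(IE url) http://www.example.com/Elune%27s_Grace

Note: Hex byte 25 → Dec 37, &#37; = %. So %25 is the percent sign itself, and %2526 is %26 with its percent sign encoded a second time.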
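
The codes in this section can be reproduced with Python's standard `urllib.parse` module. A minimal sketch, using only the strings that appear above:

```python
from urllib.parse import quote, unquote

# Percent-encoded UTF-8 bytes decode back to the original text.
city = unquote("%E5%A4%A7%E9%98%AA%E5%B8%82")
print(city)  # 大阪市

# The apostrophe example: both spellings name the same path.
encoded_path = quote("Elune's_Grace")
print(encoded_path)  # Elune%27s_Grace

# Single and double encoding of '=' and '&'.
once = quote("=", safe="")
twice = quote(once, safe="")
print(once, twice)  # %3D %253D
print(quote(quote("&", safe=""), safe=""))  # %2526
```
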


Codes to be passed to CGI scripts in URIs
sample:

 to send '&' or ' = ' as data to the CGI you must encode them as %2526 and %253D, respectively. 
 Note that when = is used between a field name and its value 
 it does not need to be encoded, and when & is used as the first character of a character entity 
 (such as &lt; ) it does not need to be encoded. 
 Also note that field/value pairs are delimited by &amp; (the character entity for &).
 For example:
 <form method="post" action="ROFM.acgi?_action=Add&amp;...&amp;Subject=OneAmpersandInQuotes&quot;%2526&quot;"> 
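
The two layers described above (double percent-encoding for the data, then &amp; for the HTML attribute) can be sketched in Python. The double encoding is what this particular CGI expects according to the sample; many servers decode only once, so treat it as an assumption. The helper names `double_encode` and `build_action` are hypothetical.

```python
import html
from urllib.parse import quote

def double_encode(value):
    """Percent-encode a value twice, so '&' becomes '%2526' and '=' becomes '%253D'."""
    return quote(quote(value, safe=""), safe="")

def build_action(script, fields):
    """Join name=value pairs with '&', then escape the result for use in an HTML attribute."""
    pairs = [f"{name}={double_encode(value)}" for name, value in fields]
    return html.escape(script + "?" + "&".join(pairs), quote=True)

action = build_action("ROFM.acgi", [("_action", "Add"), ("Subject", "&")])
print(f'<form method="post" action="{action}">')
```

The '=' between a field name and its value is left alone, and the '&' that separates pairs is written as &amp; only because it sits inside HTML.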

http://ascii.cl/htmlcodes.htm

HTML Character Set

selected examples, based on [ISO-8859-1]

                                                                                                    *ISO/IEC 8859-10 code (in hex as 0xXX)
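REFERENCE | SYMBOL ("GLYPH") | DESCRIPTION | windows-1252 | ENTITY | ASCII Dec | ASCII Hex | * | UNICODE (in hex as 0xXXXX)
&#13; | | (carriage return) | | | | | |
&#32; | | (space) | | | | | |
&#33; | ! | Exclamation Mark | ! | | 33 | 21 | |
&#34; | " | Quotation Mark | " | &quot; | 34 | 22 | [0x22] | 0x0022
&#35; | # | Number Sign, Octothorp, "pound" | # | | 35 | 23 | |
&#36; | $ | Dollar Sign, USD | $ | | 36 | 24 | |
&#37; | % | | % | | | | |
&#38; | & | Ampersand | & | &amp; | | | |
&#39; | ' | apostrophe | ' | | | | |
&#40; | ( | | ( | | | | |
&#41; | ) | | ) | | | | |
&#42; | * | | * | | | | |
&#43; | + | | + | | | | |
&#44; | , | | , | | | | |
&#45; | - | | | | | | |
&#46; | . | | | | | | |
&#92; | \ | back-slash or backslash | \ | | | | |
&#47; | / | forward-slash or slash (see note) | | | | | |
&#161; | ¡ | Inverted Exclamation Mark | | &iexcl; | | | |
&#172; | ¬ | Not sign | ¬ | | | | |
&#231; | ç | Small c, cedilla | ç | | | | |
&#233; | é | Small e, acute accent | é | | | | |

Note: the ordinary "slash" is not the same character as the fraction slash [&frasl;] or the division slash [&#8725;],
i.e.
[ / ≠ ⁄ ] & [ / ≠ ∕ ]
(Unicode's official name for the ordinary slash, U+002F, is SOLIDUS; the fraction slash is U+2044 and the division slash is U+2215.)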
REFERENCE DESCRIPTION
      &#173; ­, Soft hyphen   &#218; Ú, Capital U, acute accent
&#46; .   &#174; ®, Registered trademark   &#219; Û, Capital U, circumflex accent
&#47; /   &#175; ¯, Macron accent   &#220; Ü, Capital U, dieresis or umlaut mark
&#58; :   &#176; °, Degree sign   &#221; Ý, Capital Y, acute accent
&#59; ;   &#177; ±, Plus or minus   &#222; Þ, Capital THORN, Icelandic
&#60; <   &#178; ², Superscript two   &#223; ß, Small sharp s, German (sz ligature)
&#61; =   &#179; ³, Superscript three   &#224; à, Small a, grave accent
&#62; >   &#180; ´, Acute accent   &#225; á, Small a, acute accent
&#63; ?   &#181; µ, Micro sign   &#226; â, Small a, circumflex accent
&#64; @   &#182; ¶, Paragraph sign   &#227; ã, Small a, tilde
&#91; [   &#183; ·, Middle dot   &#228; ä, Small a, dieresis or umlaut mark
&#92; \   &#184; ¸, Cedilla   &#229; å, Small a, ring
&#93; ]   &#185; ¹, Superscript one   &#230; æ, Small ae diphthong (ligature)
&#94; ^   &#186; º, Masculine ordinal   &#231; ç, Small c, cedilla
&#95; _   &#187; », Right angle quote, guillemotright   &#232; è, Small e, grave accent
&#96; `   &#188; ¼, Fraction one-fourth   &#233; é, Small e, acute accent
&#123; {   &#189; ½, Fraction one-half   &#234; ê, Small e, circumflex accent
&#124; |   &#190; ¾, Fraction three-fourths   &#235; ë, Small e, dieresis or umlaut mark
&#125; }   &#191; ¿, Inverted question mark   &#236; ì, Small i, grave accent
&#126; ~   &#192; À, Capital A, grave accent   &#237; í, Small i, acute accent
&#160; (non-breaking space)   &#193; Á, Capital A, acute accent   &#238; î, Small i, circumflex accent
&#161; ¡, Inverted exclamation   &#194; Â, Capital A, circumflex accent   &#239; ï, Small i, dieresis or umlaut mark
&#162; ¢, Cent sign   &#195; Ã, Capital A, tilde   &#240; ð, Small eth, Icelandic
&#163; £, Pound sterling   &#196; Ä, Capital A, dieresis or umlaut mark   &#241; ñ, Small n, tilde
&#164; ¤, General currency sign   &#197; Å, Capital A, ring   &#242; ò, Small o, grave accent
&#165; ¥, Yen sign   &#198; Æ, Capital AE diphthong (ligature)   &#243; ó, Small o, acute accent
&#166; ¦, Broken vertical bar   &#199; Ç, Capital C, cedilla   &#244; ô, Small o, circumflex accent
&#167; §, Section sign   &#200; È, Capital E, grave accent   &#245; õ, Small o, tilde
&#168; ¨, Umlaut (dieresis)   &#201; É, Capital E, acute accent   &#246; ö, Small o, dieresis or umlaut mark
&#169; ©, Copyright   &#202; Ê, Capital E, circumflex accent   &#247; ÷, Division sign
      &#203; Ë, Capital E, dieresis or umlaut mark   &#248; ø, Small o, slash
      &#204; Ì, Capital I, grave accent   &#249; ù, Small u, grave accent
      &#205; Í, Capital I, acute accent   &#250; ú, Small u, acute accent
      &#206; Î, Capital I, circumflex accent   &#251; û, Small u, circumflex accent
      &#207; Ï, Capital I, dieresis or umlaut mark   &#252; ü, Small u, dieresis or umlaut mark
      &#208; Ð, Capital Eth, Icelandic   &#253; ý, Small y, acute accent
      &#209; Ñ, Capital N, tilde   &#254; þ, Small thorn, Icelandic
      &#210; Ò, Capital O, grave accent   &#255; ÿ, Small y, dieresis or umlaut mark
      &#211; Ó, Capital O, acute accent   &#256; Ā
      &#212; Ô, Capital O, circumflex accent   &#257; ā
      &#213; Õ, Capital O, tilde   &#258; Ă
      &#214; Ö, Capital O, dieresis or umlaut mark   &#259; ă
      &#215; ×, Multiply sign   &#260; Ą
            &#261; ą


* hexadecimal numbers: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F {conversion}
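
Converting between the decimal and hex forms of a reference is plain base conversion. A minimal Python sketch; the two helper names are made up for illustration, and the inputs are assumed to be well-formed references like the ones on this page.

```python
def to_hex_reference(decimal_reference):
    """Convert a decimal reference such as '&#225;' into its hex form '&#xE1;'."""
    code = int(decimal_reference[2:-1])       # strip '&#' and ';'
    return f"&#x{code:X};"

def to_decimal_reference(hex_reference):
    """Convert a hex reference such as '&#x2260;' into its decimal form '&#8800;'."""
    code = int(hex_reference[3:-1], 16)       # strip '&#x' and ';'
    return f"&#{code};"

print(to_hex_reference("&#8260;"))        # &#x2044;
print(to_decimal_reference("&#x2260;"))   # &#8800;
```
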

Other Unicode (Experimental\Extended):

REFERENCE | SYMBOL | DESCRIPTION | |
&#8195; / &emsp; | e m | em space (at left between e and m) | HTML4 | missing in 8859-1
&#8196; | e m | three-per-em space | HTML4 | missing in 8859-1
&ensp; | e n | en space | HTML3 | missing in 8859-1
&#1780; | ۴ | Extended Arabic-Indic Digit Four | UTF-8 | (1780 in hexadecimal: 06F4)
&#x028D; (&invw;) | ʍ | | |
&#91; | [ | left square bracket | 91, 0x5B (U+005B) | Basic Latin
           

 


Browser / Encoding Render Check:
The following table is used to check whether the character code
in the first column renders correctly in the second column.
For example, you should see an em dash in the center column of the first
two rows.
(Note: The character set specified for this HTML page is http-equiv="Content-Type" content="text/html; charset=UTF-8".)

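&#151;   —   em dash (a Windows-1252 code position; browsers map it to U+2014)
&#8212;   —   em dash
&#8211;   –   en dash
&ndash;   –   en dash
&mdash;   —   em dash
&#150;   –   en dash (a Windows-1252 code position; browsers map it to U+2013)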
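
The same check can be run outside a browser. This sketch assumes Python's `html.unescape`, which follows the HTML5 rules and therefore remaps the Windows-1252 positions 150 and 151 the way browsers do.

```python
import html

# Each reference from the table above, paired with the dash it should render as.
CHECKS = {
    "&#151;": "\u2014",   # em dash via the Windows-1252 position
    "&#8212;": "\u2014",  # em dash
    "&mdash;": "\u2014",  # em dash
    "&#150;": "\u2013",   # en dash via the Windows-1252 position
    "&#8211;": "\u2013",  # en dash
    "&ndash;": "\u2013",  # en dash
}

results = {ref: html.unescape(ref) for ref in CHECKS}
for ref, rendered in results.items():
    status = "ok" if rendered == CHECKS[ref] else "MISMATCH"
    print(f"{ref:8} U+{ord(rendered):04X} {status}")
```
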





LINKS/ OTHER RESOURCES:

ASCII - ISO 8859-1 (Latin-1) Table with HTML Entity Names
Martin Ramsch's "ISO 8859-1 Table" [http://www.ramsch.org/martin/uni/fmi-hp/iso8859-1.html]
http://www.unicode.org/unicode.css
Character Encoding Options

ASCII-EBCDIC Character Set
The HTML Document Character Set

About the use of the em dash and the en dash (and similar characters)

click here for full Data on languages

[new!] HTML Syntax (with browser support list)


Typing Characters in Windows.

On Windows systems, you can (usually - some application programs may override this) produce any character in the Windows character set (naturally, in its Windows encoding) as follows: Press down the (left) Alt key and keep it down. Then type, using the separate numeric keypad (not the numbers above the letter keys!), the four-digit code of the character in decimal. Finally release the Alt key. Notice that the first digit is always 0, since the code values are in the range 32 - 255 (decimal). For instance, to produce the letter "Ä" (which has code 196 in decimal), you would press Alt down, type 0196 and then release Alt. Upon releasing Alt, the character should appear on the screen. In MS Word, the method works only if Num Lock is set. This method is often referred to as Alt-0nnn. (If you omit the leading zero, i.e. use Alt-nnn, the effect is different, since that way you insert the character in code position nnn in the DOS character code! For example, Alt-196 would probably insert a graphic character which looks somewhat like a hyphen. There are variations in the behavior of various Windows programs in this area, and using those DOS codes is best avoided.) [s]

Example, Windows XP.
Using the Code to type the letter.

Open MS Word or Notepad.
Click to position your cursor in the text
area of the application (e.g. white box area of Notepad).

Press and hold the ALT KEY,
continue to hold the ALT key as you type "0","2","2","5",
then release the ALT key.

The following character should appear on the screen: á

As shown in the table {http://www.obkb.com/dcljr/chars.html},
'small a, acute ' uses the code
'&#225;' in HTML. So the HTML decimal code
is the same as the decimal code used for entering special
characters in Windows; the only difference being
that the Windows code is delimited by the ALT key
(instead of "&","#", and ";"), and the Windows code
uses the leading zero (i.e. "0225" instead of just "225").
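
The relationship between the Alt code and the HTML code can be sketched in Python. Two assumptions: "cp1252" stands in for the Windows character set, and "cp437" stands in for the DOS code page (the text above only says "the DOS character code", so the exact page may differ on your system). The helper names are illustrative.

```python
def alt_code(char):
    """Return the Alt-0nnn keypad sequence for a character in the Windows character set."""
    byte = char.encode("cp1252")[0]
    return f"Alt+{byte:04d}"

def html_decimal(char):
    """Return the HTML decimal reference for the same character."""
    return f"&#{ord(char)};"

def dos_char(code):
    """Return what Alt-nnn (no leading zero) would insert, assuming DOS code page 437."""
    return bytes([code]).decode("cp437")

print(alt_code("á"), html_decimal("á"))  # Alt+0225 &#225;
print(alt_code("Ä"))                     # Alt+0196
print(dos_char(196))                     # a box-drawing line that looks like a hyphen
```
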

Reference:
http://www.obkb.com/dcljr/chars.html



http://www.unicode.org/unicode/uni2book/ch06.pdf

Charsets:

The official registry of "charset" (i.e., character encoding) names, with references to documents defining their meanings, is kept by IANA at
http://www.iana.org/assignments/character-sets
(According to the documentation of the registration procedure, RFC 2978, it should be elsewhere, but it has been moved.) I have composed a tabular presentation of the registry, ordered alphabetically by "charset" name and accompanied with some hypertext references.

keywords: Exclimation Mark
Document published: Jan. 09, 2000
Last modified or updated: Nov. 14 2005
By lyberty
