Talk:Character encodings in HTML

This is an old revision of this page, as edited by Mjas (talk | contribs) at 18:17, 1 February 2007 (→External Link Suggestion). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

External Link Suggestion

Would it be a good idea to add an HTML character typer/generator (such as http://multiz.com/characters.php) to the external links section? This could be helpful to users unfamiliar with HTML. I thought I'd run it by everyone before posting it. Mjas 18:16, 1 February 2007 (UTC)

Previous Discussion

Is the character encoded by an HTML number the same character that is encoded by Unicode by the same number? For example, is character number 2343 in HTML the same as 2343 in Unicode? --Abdull 14:28, 19 August 2005 (UTC)

This was also posted at Talk:Unicode and HTML, where it has been answered. Please post any further replies there. Plugwash 17:49, 19 August 2005 (UTC)

This page states that a numeric entity reference *always* refers to a Unicode code point. The W3C (http://www.w3.org/TR/html401/charset.html) states that a numeric entity reference is a code point in the document's character set. This appears to be a contradiction, and this page appears to be wrong. If this page is in fact correct, then it may want to explain why the document's character set is Unicode. — Preceding unsigned comment added by 71.141.135.56 (talk • contribs) 23:08 UTC, 8 January 2007 (UTC)

In HTML, the document character set is always the Universal Character Set: "HTML uses ... the Universal Character Set (UCS), defined in [ISO10646]. ... The character set defined in [ISO10646] is character-by-character equivalent to Unicode ([UNICODE])." However, many different character encodings can be used to serialize a document: UTF-8, UTF-16, ISO-8859-1, US-ASCII, SHIFT_JIS, and so on. Numeric character references always refer to the document character set, i.e., the UCS, regardless of which encoding the document is stored in. The distinction between character set and character encoding is a bit tricky, so you're right, it could be explained better in the article. Indefatigable 21:39, 9 January 2007 (UTC)
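
To make the distinction above concrete, here is a minimal sketch in Python. It assumes that the standard library's `html.unescape` is an acceptable stand-in for how a browser expands numeric character references. The helper name `resolve_ncr` and the sample markup are made up for illustration. The same markup is stored in four different encodings, and in each case `&#2343;` and `&#233;` expand to the same Unicode code points.

```python
import html
import unicodedata


def resolve_ncr(markup_bytes: bytes, encoding: str) -> str:
    """Decode the stored bytes, then expand character references.

    Hypothetical helper: the encoding only governs how bytes become
    characters; the number inside &#N; is read as a UCS code point.
    """
    return html.unescape(markup_bytes.decode(encoding))


# Sample markup (illustrative): two numeric character references.
source = "&#2343; &#233;"

# Store the same markup in several encodings and resolve each copy.
encodings = ("utf-8", "utf-16", "iso-8859-1", "shift_jis")
results = {enc: resolve_ncr(source.encode(enc), enc) for enc in encodings}

for enc, text in results.items():
    code_points = [f"U+{ord(ch):04X}" for ch in text if ch != " "]
    print(f"{enc:11s} -> {code_points}")

# Character number 2343 in HTML is character number 2343 in Unicode.
print(unicodedata.name(html.unescape("&#2343;")))
```

Every encoding yields the same pair of characters, which also answers the earlier question in this thread: `&#2343;` is code point 2343 (hexadecimal 0927) in Unicode, whatever encoding the page is served in.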