Talk:Character encodings in HTML: Difference between revisions

Article snapshot taken from Wikipedia with creative commons attribution-sharealike license.
Previous revision: 09:12, 24 March 2010, by Incnis Mrsi. Edit summary: “W3C vs HTTP” referenced info was stealthily removed
Next revision: 12:02, 12 October 2010, by SmackBot (minor edit). Edit summary: Previous Discussion: Subst: {{unsigned}} (& regularise templates)

Revision as of 12:02, 12 October 2010

External Link Suggestion

Would it be a good idea to add an HTML character typer/generator (such as http://multiz.com/characters.php) to the external links section? This could be helpful to users unfamiliar with HTML. I thought I'd run it by everyone before posting it. Mjas 18:16, 1 February 2007 (UTC)

I began using Multiz.com's special character generator about a year and a half ago. It is the best and easiest Greek character typer I have found. As one who conducts a lot of research, and often needs special characters, I believe I can recommend this. Finding this type of Greek character typer through searches like Google is nearly impossible; after a lot of fruitless searches, I can confirm this. Anyway, this is a good link for Wikipedia to have. HistoryThD 15:18, 6 February 2007 (UTC)

I believe there is one already:

It's been there for over a year when I posted it.

Previous Discussion

Is the character encoded by an HTML number the same character that is encoded by Unicode by the same number? For example, is character number 2343 in HTML the same as 2343 in Unicode? --Abdull 14:28, 19 August 2005 (UTC)

This was also posted at Talk:Unicode and HTML, where it has been replied to; please post any further replies there. Plugwash 17:49, 19 August 2005 (UTC)


This page states that a numeric entity reference *always* refers to a Unicode character code point. The W3C (http://www.w3.org/TR/html401/charset.html) states that a numeric entity reference is a code point in the document's character set. This appears to be a contradiction, and this page appears to be wrong. If this page is in fact correct, then it may want to explain why the document's character set is Unicode. —Preceding unsigned comment added by 71.141.135.56 (talkcontribs) 23:08 UTC, 8 January 2007

In HTML, the document character set is always the Universal Character Set: "HTML uses ... the Universal Character Set (UCS), defined in [ISO10646]. ... The character set defined in [ISO10646] is character-by-character equivalent to Unicode ([UNICODE])." However, many different encodings of the UCS can be used: UTF-8, UTF-16, ISO-8859-1, US-ASCII, SHIFT_JIS, and so on. Numeric character references always refer to the document character set, i.e., the UCS, whatever encoding the document is stored in. The distinction between character set and character encoding is a bit tricky, so you're right, it could be explained better in the article. Indefatigable 21:39, 9 January 2007 (UTC)
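
The point made in the reply above can be illustrated with a short sketch in Python. It is only an illustration under some assumptions: it uses the standard library's html.unescape as a stand-in for a browser's handling of numeric character references, it reuses the number 2343 from the earlier question, and the helper name decode_document is made up for this example.

```python
import html

# Numeric character reference taken from the question in this thread.
REFERENCE = "&#2343;"
CODE_POINT = 2343


def decode_document(raw: bytes, encoding: str) -> str:
    """Hypothetical helper: decode the bytes with the document's encoding,
    then resolve character references, roughly as a user agent would."""
    return html.unescape(raw.decode(encoding))


# The reference consists of ASCII characters only, so it can be written
# in every one of the encodings listed in the reply above.
encodings = ("utf-8", "utf-16", "iso-8859-1", "us-ascii", "shift_jis")
results = {
    enc: decode_document(REFERENCE.encode(enc), enc) for enc in encodings
}

# In every case the reference resolves to the same UCS/Unicode code point.
same_character = all(text == chr(CODE_POINT) for text in results.values())

# The *bytes* used to store that character directly do depend on the encoding.
encoded_forms = {
    "utf-8": chr(CODE_POINT).encode("utf-8"),
    "utf-16-be": chr(CODE_POINT).encode("utf-16-be"),
}

# ISO-8859-1 cannot store the character directly at all, which is exactly
# when a numeric character reference is needed.
try:
    chr(CODE_POINT).encode("iso-8859-1")
    representable_in_latin1 = True
except UnicodeEncodeError:
    representable_in_latin1 = False

print(same_character, encoded_forms, representable_in_latin1)
```

In this sketch the encoding only affects how the document's bytes are turned into characters. The number inside the reference is never looked up in the encoding's own table, which is why an ISO-8859-1 document can still refer to a character that ISO-8859-1 itself cannot represent.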

“W3C vs HTTP” referenced info was stealthily removed

Let us discuss an edit by user Ms2ger. Because he forged the m (minor edit) label, for which I gave him a formal warning, this controversial edit attracted no attention. But a crucially important reference to the W3C, which proves its disappointment in HTTP/1.1 charset detection, was removed without any replacement. Should we restore that piece of text, or shall we write the whole article from scratch for the third time? Incnis Mrsi (talk) 11:37, 22 March 2010 (UTC)

There was no response in a reasonable time, so I partially reverted the edit and restored all the voluntarily removed information. Please do not remove anything unless it has been discussed here (for each paragraph in question), or use the {{fact}} tag for statements which appear poorly referenced. Incnis Mrsi (talk) 09:12, 24 March 2010 (UTC)