Revision as of 12:45, 12 October 2016 editKendall-K1 (talk | contribs)Extended confirmed users23,785 edits see article talk page← Previous edit |
Latest revision as of 15:59, 25 November 2018 edit undo174.254.130.36 (talk) Now redirects to the specific section for this RFC.Tag: Redirect target changed |
(20 intermediate revisions by 12 users not shown) |
Line 1: |
Line 1: |
|
|
#REDIRECT ] |
|
{{original research|date=February 2013}} |
|
|
{{Refimprove|date=July 2010}} |
|
|
'''UTF-9''' and '''UTF-18''' (9- and 18-] ], respectively) were two ] joke specifications for encoding Unicode on systems where the ] (nine-bit group) is a better fit for the native word size than the ], such as the 36-bit ] and the ]. Both encodings were specified in RFC 4042, written by ] (inventor of ]) and released on April 1, 2005. The encodings suffer from a number of flaws, and their author has confirmed that they were intended as a joke.<ref>{{cite web|url=http://panda.com/mrc/|title=Mark Crispin's Web Page|accessdate=2006-09-17}} Points out ] for two of his RFCs.</ref>
|
|
|
|
|
|
|
{{Rcat shell| |
|
However, unlike some of the "specifications" given in other April 1 ], they are technically possible to implement, and have in fact been implemented in ] assembly language. They are, however, not endorsed by the ].
|
|
|
{{R to related topic}} |
|
|
|
|
|
{{Rwh}} |
|
==Technical details== |
|
|
|
}} |
|
Like the 8-bit code commonly called ], UTF-9 places an octet in the low 8 ]s of each nonet and uses the high bit to indicate continuation. As a result, ] and ] characters take one nonet each, the remaining ] characters take two nonets each, and non-BMP code points take three. Code points that require multiple nonets are stored starting with the most significant non-zero nonet.
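The nonet layout described above can be illustrated with a short Python sketch. It is not taken from RFC 4042 or from any existing implementation. The function names <code>utf9_encode</code> and <code>utf9_decode</code> are hypothetical, nonets are modelled as plain integers in the range 0–511, and the sketch assumes that the continuation bit is set on every nonet of a code point except the last.

```python
def utf9_encode(code_point: int) -> list[int]:
    """Encode one code point as a list of nonets (integers 0..0x1FF)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    # Split the code point into octets, least significant first.
    octets = []
    while True:
        octets.append(code_point & 0xFF)
        code_point >>= 8
        if code_point == 0:
            break
    # Store the most significant non-zero octet first.
    octets.reverse()
    # Assumption: bit 8 (the high bit of the nonet) marks "more nonets follow",
    # so it is set on every nonet except the last one.
    return [octet | 0x100 for octet in octets[:-1]] + [octets[-1]]


def utf9_decode(nonets: list[int]) -> list[int]:
    """Decode a sequence of nonets back into a list of code points."""
    code_points = []
    value = 0
    pending = False  # True while a multi-nonet code point is incomplete
    for nonet in nonets:
        if not 0 <= nonet <= 0x1FF:
            raise ValueError("not a nonet")
        value = (value << 8) | (nonet & 0xFF)
        if nonet & 0x100:
            pending = True
        else:
            code_points.append(value)
            value = 0
            pending = False
    if pending:
        raise ValueError("truncated UTF-9 sequence")
    return code_points
```

Under these assumptions, <code>utf9_encode(0x41)</code> yields a single nonet, <code>utf9_encode(0x0391)</code> yields two, and <code>utf9_encode(0x10330)</code> yields three, matching the one-, two- and three-nonet cases described above.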
|
|
|
|
|
UTF-18 is a fixed-length encoding that uses one 18-bit integer per code point. This allows four planes to be represented, and these are mapped to the four planes currently used by ] (planes 0–2 and 14). Consequently, the two private use planes (15 and 16) and the currently unused planes (3–13) are not supported. The UTF-18 specification does not say why surrogates were not allowed for these code points, although in its earlier discussion of UTF-16 the RFC says "This transformation format requires complex surrogates to represent code points outside the BMP". Having complained about that complexity, the author would have looked somewhat hypocritical had surrogates been used in the new standard. It is unlikely that planes 3–13 will be assigned by ] in the foreseeable future. Thus UTF-18, like ] and ], guarantees a fixed width for all code points (although not for all glyphs).
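The plane mapping can likewise be sketched in Python. The names <code>utf18_encode</code> and <code>utf18_decode</code> are hypothetical, and the sketch assumes that planes 0–2 keep their values unchanged while plane 14 occupies the fourth 18-bit slot (values 0x30000–0x3FFFF); the text above states only that four planes are mapped, not which slot each one takes.

```python
PLANE_14_START = 0xE0000
PLANE_14_END = 0xEFFFF
# Assumed offset that moves plane 14 into the fourth 18-bit slot (0x30000..0x3FFFF).
PLANE_14_OFFSET = PLANE_14_START - 0x30000


def utf18_encode(code_point: int) -> int:
    """Map a code point in planes 0-2 or 14 to a single 18-bit value."""
    if 0 <= code_point <= 0x2FFFF:
        return code_point  # planes 0-2 are stored unchanged
    if PLANE_14_START <= code_point <= PLANE_14_END:
        return code_point - PLANE_14_OFFSET
    # Planes 3-13, 15 and 16 have no representation in this scheme.
    raise ValueError("code point not representable in UTF-18")


def utf18_decode(value: int) -> int:
    """Map an 18-bit value back to the code point it stands for."""
    if not 0 <= value <= 0x3FFFF:
        raise ValueError("not an 18-bit value")
    if value < 0x30000:
        return value
    return value + PLANE_14_OFFSET
```

Because every supported code point maps to exactly one 18-bit value, no continuation bits or surrogates are needed; the cost is that a private use code point such as U+F0000 raises an error in this sketch.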
|
|
|
|
|
== See also == |
|
|
* ] |
|
|
* ] |
|
|
* ] |
|
|
|
|
|
==Notes== |
|
|
{{Reflist}} |
|
|
|
|
|
==External links== |
|
|
* RFC 4042: UTF-9 and UTF-18 Efficient Transformation Formats of Unicode |
|
|
|
|
|
{{character encoding}} |
|
|
{{IETF RFC 1st april}} |
|
|
|
|
|
|
{{DEFAULTSORT:Utf-09 And Utf-18}} |
|
{{DEFAULTSORT:Utf-09 And Utf-18}} |
|
] |
|
] |
|
] |
|
|
] |
|
|
] |
|
] |