The Cork (also known as T1 or EC) encoding is a character encoding used for encoding glyphs in fonts. It is named after the city of Cork in Ireland, where during a TeX Users Group (TUG) conference in 1990 a new encoding was introduced for LaTeX. It contains 256 characters supporting most west- and east-European languages with the Latin alphabet.
Details
In 8-bit TeX engines the font encoding has to match the encoding of the hyphenation patterns, and this is where the Cork encoding is most commonly used. In LaTeX one can switch to this encoding with \usepackage[T1]{fontenc}, while in ConTeXt MkII it is already the default encoding. Modern engines such as XeTeX and LuaTeX fully support Unicode, which makes the 8-bit font encodings obsolete.
Character set
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0x | ` 0060 | ´ 00B4 | ˆ 02C6 | ˜ 02DC | ¨ 00A8 | ˝ 02DD | ˚ 02DA | ˇ 02C7 | ˘ 02D8 | ¯ 00AF | ˙ 02D9 | ¸ 00B8 | ˛ 02DB | ‚ 201A | ‹ 2039 | › 203A |
| 1x | “ 201C | ” 201D | „ 201E | « 00AB | » 00BB | – 2013 | — 2014 | ZWSP 200B | ₀ 2080 | ı 0131 | ȷ 0237 | ff FB00 | fi FB01 | fl FB02 | ffi FB03 | ffl FB04 |
| 2x | SP | ! | " | # | $ | % | & | ’ 2019 | ( | ) | * | + | , | - | . | / |
| 3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
| 4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
| 5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
| 6x | ‘ 2018 | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
| 7x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | SHY |
| 8x | Ă 0102 | Ą 0104 | Ć 0106 | Č 010C | Ď 010E | Ě 011A | Ę 0118 | Ğ 011E | Ĺ 0139 | Ľ 013D | Ł 0141 | Ń 0143 | Ň 0147 | Ŋ 014A | Ő 0150 | Ŕ 0154 |
| 9x | Ř 0158 | Ś 015A | Š 0160 | Ş 015E | Ť 0164 | Ţ 0162 | Ű 0170 | Ů 016E | Ÿ 0178 | Ź 0179 | Ž 017D | Ż 017B | IJ 0132 | İ 0130 | đ 0111 | § 00A7 |
| Ax | ă 0103 | ą 0105 | ć 0107 | č 010D | ď 010F | ě 011B | ę 0119 | ğ 011F | ĺ 013A | ľ 013E | ł 0142 | ń 0144 | ň 0148 | ŋ 014B | ő 0151 | ŕ 0155 |
| Bx | ř 0159 | ś 015B | š 0161 | ş 015F | ť 0165 | ţ 0163 | ű 0171 | ů 016F | ÿ 00FF | ź 017A | ž 017E | ż 017C | ij 0133 | ¡ 00A1 | ¿ 00BF | £ 00A3 |
| Cx | À | Á | Â | Ã | Ä | Å | Æ | Ç | È | É | Ê | Ë | Ì | Í | Î | Ï |
| Dx | Ð | Ñ | Ò | Ó | Ô | Õ | Ö | Œ 0152 | Ø | Ù | Ú | Û | Ü | Ý | Þ | SS 1E9E |
| Ex | à | á | â | ã | ä | å | æ | ç | è | é | ê | ë | ì | í | î | ï |
| Fx | ð | ñ | ò | ó | ô | õ | ö | œ 0153 | ø | ù | ú | û | ü | ý | þ | ß 00DF |
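The table can be turned into a lookup from Cork slots to Unicode code points. The following is a minimal sketch, not a reference implementation: the names LOW, HIGH, CORK_TO_UNICODE and decode_cork are hypothetical, cells without a hexadecimal value are assumed to sit at their ASCII or Latin-1 positions, and slot 0x7F is assumed to map to U+00AD because the table labels it SHY.

```python
# Code points for slots 0x00-0x1F, copied from the table.
LOW = [
    0x0060, 0x00B4, 0x02C6, 0x02DC, 0x00A8, 0x02DD, 0x02DA, 0x02C7,
    0x02D8, 0x00AF, 0x02D9, 0x00B8, 0x02DB, 0x201A, 0x2039, 0x203A,
    0x201C, 0x201D, 0x201E, 0x00AB, 0x00BB, 0x2013, 0x2014, 0x200B,
    0x2080, 0x0131, 0x0237, 0xFB00, 0xFB01, 0xFB02, 0xFB03, 0xFB04,
]

# Code points for slots 0x80-0xBF, copied from the table.
HIGH = [
    0x0102, 0x0104, 0x0106, 0x010C, 0x010E, 0x011A, 0x0118, 0x011E,
    0x0139, 0x013D, 0x0141, 0x0143, 0x0147, 0x014A, 0x0150, 0x0154,
    0x0158, 0x015A, 0x0160, 0x015E, 0x0164, 0x0162, 0x0170, 0x016E,
    0x0178, 0x0179, 0x017D, 0x017B, 0x0132, 0x0130, 0x0111, 0x00A7,
    0x0103, 0x0105, 0x0107, 0x010D, 0x010F, 0x011B, 0x0119, 0x011F,
    0x013A, 0x013E, 0x0142, 0x0144, 0x0148, 0x014B, 0x0151, 0x0155,
    0x0159, 0x015B, 0x0161, 0x015F, 0x0165, 0x0163, 0x0171, 0x016F,
    0x00FF, 0x017A, 0x017E, 0x017C, 0x0133, 0x00A1, 0x00BF, 0x00A3,
]


def build_table() -> dict[int, int]:
    """Map every Cork slot (0-255) to a Unicode code point."""
    table = dict(enumerate(LOW))
    # Assumption: unlabelled cells in rows 2x-7x sit at their ASCII positions.
    for slot in range(0x20, 0x7F):
        table[slot] = slot
    table[0x27] = 0x2019  # right single quotation mark
    table[0x60] = 0x2018  # left single quotation mark
    table[0x7F] = 0x00AD  # assumption: the cell is labelled SHY
    table.update(enumerate(HIGH, start=0x80))
    # Assumption: unlabelled cells in rows Cx-Fx sit at their Latin-1 positions.
    for slot in range(0xC0, 0x100):
        table[slot] = slot
    table[0xD7] = 0x0152  # Œ
    table[0xDF] = 0x1E9E  # SS
    table[0xF7] = 0x0153  # œ
    table[0xFF] = 0x00DF  # ß
    return table


CORK_TO_UNICODE = build_table()


def decode_cork(data: bytes) -> str:
    """Decode Cork-encoded bytes one slot at a time."""
    return "".join(chr(CORK_TO_UNICODE[b]) for b in data)
```

For example, decode_cork(b"\xaa\xf3d\xb9") looks up slots 0xAA, 0xF3, 0x64 and 0xB9 and yields the Polish place name Łódź in lowercase ("łódź").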
Notes
- Hexadecimal values under the characters in the table are the Unicode character codes.
- The first 12 characters are often used as combining characters.
- 0x17 is dubbed a “compound word mark” (CWM) in the Cork encoding, and is an innovation of this standard. It is an invisible character that separates compounds in a complex word, for instance in German, in order to disallow esthetic ligatures at compound boundaries. It is mapped to the Unicode “zero-width space” (ZWSP, U+200B), defined at about the same time, whose purpose is similar, if not identical.
- 0x18 is a “small o”, used to compose ‰ or ‱ (or arbitrary smaller quantities) out of percent sign (%).
- 0x19 and 0x1A are dotless i and dotless j, which may be used to compose accented variants like i with macron (ī).
- 0x7F is the hyphenation character, not really a soft hyphen (SHY) as defined by Unicode.
- 0xD0 is used both as Eth (Ð, U+00D0) and as D with stroke (Đ, U+0110), which can cause problems in some situations, such as copying text from a PDF or hyphenation.
- 0xDF contains SS (two letters S). It allows TeX to automatically convert the German lowercase ß into the uppercase form.
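Some of these notes can be illustrated at the byte level. The sketch below is only an illustration under stated assumptions: the helper names join_compound, per_mille and round_trip are hypothetical, and the two small dictionaries cover only slot 0xD0 rather than the whole encoding.

```python
CWM = 0x17         # compound word mark
SMALL_ZERO = 0x18  # the "small o" placed after a percent sign


def join_compound(*parts: bytes) -> bytes:
    """Join the parts of a compound word with a CWM between them."""
    return bytes([CWM]).join(parts)


def per_mille(order: int = 1) -> bytes:
    """A percent sign followed by `order` small zeros (1 = per mille)."""
    return b"%" + bytes([SMALL_ZERO]) * order


# Hypothetical encoder fragment: Eth and D with stroke share slot 0xD0.
UNICODE_TO_CORK = {"\u00D0": 0xD0, "\u0110": 0xD0}
# A decoder has to pick one of them; the table lists Eth (U+00D0).
CORK_TO_UNICODE = {0xD0: "\u00D0"}


def round_trip(ch: str) -> str:
    """Encode a character to Cork and decode it again."""
    return CORK_TO_UNICODE[UNICODE_TO_CORK[ch]]
```

Here join_compound(b"Auf", b"lage") places the mark between the two parts of the German compound, so the "fl" ligature is not formed across the boundary. round_trip("\u0110") returns Eth rather than D with stroke, which is the kind of loss that shows up when text is copied out of a PDF.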
Supported languages
The encoding supports most European languages written in the Latin alphabet. Notable exceptions are:
- Esperanto and Maltese (using IL3)
- Latvian and Lithuanian (using L7X)
- Welsh
Languages with slightly suboptimal support include:
- Galician, Portuguese and Spanish – due to the lack of the characters ª and º, which are not simply superscript versions of lowercase "a" and "o" (superscripts are thinner) and are often underlined
- Croatian, Bosnian and Serbian – due to the shared use of the slot for Đ
- Turkish – due to dotless i having different uppercase and lowercase combinations than in other languages
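The Turkish case can be made concrete with a small sketch. It uses Python's built-in str.upper as a stand-in for language-independent case mapping, which is an assumption for illustration and not how TeX handles case. The function upper_turkish is hypothetical.

```python
def upper_turkish(text: str) -> str:
    """Uppercase with the Turkish pairs i -> İ and ı -> I."""
    # Handle the two Turkish-specific letters before the generic mapping.
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()


word = "kitap"
generic = word.upper()         # language-independent mapping: i -> I
turkish = upper_turkish(word)  # Turkish mapping: i -> İ (U+0130)
```

With the generic mapping, dotted i and dotless ı both end up as I, so the distinction between them is lost. The Turkish mapping keeps them apart by pairing i with İ and ı with I.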
References
- Petrlik, Lukas (1996-06-19). "The Czech and Slovak Character Encoding Mess Explained". cs-encodings-faq. 1.10. Archived from the original on 2016-06-21. Retrieved 2016-06-21.
- Ferguson, Michael (1990), "Report on Multilingual Activities" (PDF), TUGboat, 11 (4): 514–516
- TeX hyphenation patterns