Misplaced Pages

Unicode collation algorithm

Article snapshot taken from Wikipedia under the Creative Commons Attribution-ShareAlike license.

The Unicode collation algorithm (UCA) is defined in Unicode Technical Standard #10 (UTS #10). It is a customizable method for producing binary keys from strings of text in any writing system and language that Unicode can represent. These keys can then be compared efficiently, byte by byte, to collate (sort) the strings according to the rules of a given language, with options for ignoring differences in case, accents, and so on.
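The following minimal Python sketch shows the idea of multi-level binary keys. It assumes a tiny, hypothetical collation element table (`TOY_TABLE`) whose weights are invented for illustration and are not DUCET values. Real UCA implementations also handle normalization, contractions, expansions, and variable weighting, all of which are omitted here. Each character maps to a primary (base letter), secondary (accent), and tertiary (case) weight. The key concatenates all primary weights, then all secondary weights, then all tertiary weights, with a zero separator between levels. The `strength` parameter is a name chosen for this sketch: it truncates the key so that accents or case are ignored.

```python
# Hypothetical toy collation element table: character -> (primary, secondary, tertiary).
# Weights are invented for illustration and are NOT real DUCET values.
# Primary ~ base letter, secondary ~ accent, tertiary ~ case.
TOY_TABLE = {
    "a": (0x20, 0x05, 0x05),
    "A": (0x20, 0x05, 0x10),
    "\u00e1": (0x20, 0x06, 0x05),  # á
    "\u00c1": (0x20, 0x06, 0x10),  # Á
    "b": (0x21, 0x05, 0x05),
    "B": (0x21, 0x05, 0x10),
}

# Placed between levels. It is lower than every nonzero weight, so a string
# that runs out of weights at some level sorts before one that continues.
LEVEL_SEPARATOR = b"\x00\x00"


def sort_key(text: str, strength: int = 3) -> bytes:
    """Build a binary sort key from the first `strength` levels.

    strength=1 ignores accents and case, strength=2 ignores only case,
    strength=3 distinguishes base letter, accent, and case.
    """
    elements = [TOY_TABLE[ch] for ch in text]  # KeyError for unmapped characters
    levels = []
    for level in range(strength):
        # All of this level's weights for the whole string, 2 bytes each,
        # skipping zero (ignorable) weights.
        levels.append(b"".join(
            ce[level].to_bytes(2, "big") for ce in elements if ce[level] != 0
        ))
    return LEVEL_SEPARATOR.join(levels)


if __name__ == "__main__":
    words = ["b", "A", "\u00e1a", "a", "ab"]
    # Python compares bytes objects lexicographically, i.e. byte by byte.
    print(sorted(words, key=sort_key))  # ['a', 'A', 'áa', 'ab', 'b']
    print(sorted(words))                # code-point order: ['A', 'a', 'ab', 'b', 'áa']
```

Because all primary weights come first in the key, "áa" sorts before "ab" even though á has a larger code point than a. A plain code-point sort orders the two the other way.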

The same specification also defines the Default Unicode Collation Element Table (DUCET), a data file that specifies a default collation order. The DUCET can be customized ("tailored") for different languages, and many such tailorings are published in the Unicode Common Locale Data Repository (CLDR).
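A tailoring can be pictured as a set of overrides applied on top of the default table. The sketch below assumes a two-level toy table (`DEFAULT_TABLE`) standing in for the DUCET, with invented weights, and a hypothetical tailoring (`SWEDISH_LIKE_OVERRIDES`) loosely modeled on Swedish, where ä is a separate letter sorted after z rather than an accented a. Real tailorings such as those in CLDR are written as ordering rules relative to the default order, not as raw weight overrides, so this only illustrates the effect.

```python
# Toy stand-in for a default table: character -> (primary, secondary).
# The weights are invented for illustration and are NOT real DUCET values.
DEFAULT_TABLE = {
    "a": (0x20, 0x05),
    "\u00e4": (0x20, 0x07),  # "ä": same primary as "a", differs only by accent
    "o": (0x30, 0x05),
    "z": (0x40, 0x05),
}

# Hypothetical tailoring, loosely modeled on Swedish: "ä" becomes a
# separate letter with a primary weight just after "z".
SWEDISH_LIKE_OVERRIDES = {
    "\u00e4": (0x41, 0x05),
}


def tailor(base: dict, overrides: dict) -> dict:
    """Return a new table: the base entries, with overrides replacing them."""
    table = dict(base)
    table.update(overrides)
    return table


def sort_key(text: str, table: dict) -> bytes:
    """Two-level key: all primary weights, a 0x0000 separator, then all secondary weights."""
    elements = [table[ch] for ch in text]  # KeyError for unmapped characters
    primary = b"".join(p.to_bytes(2, "big") for p, _ in elements)
    secondary = b"".join(s.to_bytes(2, "big") for _, s in elements)
    return primary + b"\x00\x00" + secondary


if __name__ == "__main__":
    words = ["z", "\u00e4", "o", "a"]
    swedish_like = tailor(DEFAULT_TABLE, SWEDISH_LIKE_OVERRIDES)
    print(sorted(words, key=lambda w: sort_key(w, DEFAULT_TABLE)))  # ['a', 'ä', 'o', 'z']
    print(sorted(words, key=lambda w: sort_key(w, swedish_like)))   # ['a', 'o', 'z', 'ä']
```

The default table is left untouched, so several tailored tables can share it; this mirrors how a single default order can serve as the base for many language-specific orders.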

An open-source implementation of the UCA is included in International Components for Unicode (ICU). ICU supports tailoring and ships with the collation tailorings from CLDR.
