The Unicode Consortium
Industry: Computer; Software
Number of terms: 11048
Company Profile:
The Unicode Consortium or Unicode Inc. is a not-for-profit organization that coordinates the development of the Unicode standard. Its stated goal is to eventually enable computers to operate in all languages from around the world. The consortium develops and publishes a list of freely available ...
A diacritic that is a nonspacing mark.
A combining character with the General Category of Nonspacing Mark (Mn) or Enclosing Mark (Me).
* The position of a nonspacing mark in presentation depends on its base character. It generally does not consume space along the visual baseline in and of itself.
* Such characters may be large enough to affect the placement of their base character relative to preceding and succeeding base characters. For example, a circumflex applied to an “i” may affect spacing (“î”), as might the character U+20DD combining enclosing circle.
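As an illustration, the General Category test in this definition can be checked with Python's standard `unicodedata` module. This is a minimal sketch: `is_nonspacing_mark` is a hypothetical helper name, and the results depend on the Unicode version bundled with the running Python.

```python
import unicodedata


def is_nonspacing_mark(ch: str) -> bool:
    """Hypothetical helper: True if ch has General Category Mn or Me."""
    # Mn = Nonspacing Mark, Me = Enclosing Mark; Mc (Spacing Mark) is excluded.
    return unicodedata.category(ch) in ("Mn", "Me")


for cp in (0x0069, 0x0302, 0x20DD):
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)}: {unicodedata.category(ch)} -> {is_nonspacing_mark(ch)}")
# U+0069 LATIN SMALL LETTER I: Ll -> False
# U+0302 COMBINING CIRCUMFLEX ACCENT: Mn -> True
# U+20DD COMBINING ENCLOSING CIRCLE: Me -> True
```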
An expanding canonical decomposition which is not a starter decomposition.
* Example: U+0344 combining greek dialytika tonos has an expanding canonical decomposition to the sequence <U+0308 combining diaeresis, U+0301 combining acute accent>. U+0344 is a non-starter, and the first character in its decomposition is a non-starter. Therefore, on two counts, U+0344 has a non-starter decomposition.
* Example: U+0F73 tibetan vowel sign ii has an expanding canonical decomposition to the sequence <U+0F71 tibetan vowel sign aa, U+0F72 tibetan vowel sign i>. The first character in that sequence is a non-starter. Therefore U+0F73 has a non-starter decomposition, even though U+0F73 is a Starter.
* As of the current version of the standard, there are no instances of the third possible situation: a non-starter character with an expanding canonical decomposition to a sequence whose first character is a Starter.
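The test can be sketched in Python, assuming the Unicode definitions of a Starter (canonical combining class 0) and of a starter decomposition (an expanding canonical decomposition in which both the decomposed character and the first character of the decomposition are Starters). The helper names are hypothetical, and Hangul syllables, whose decompositions are algorithmic rather than listed in the character data, are not handled.

```python
import unicodedata


def full_canonical_decomposition(ch: str) -> str:
    """Recursively apply canonical decomposition mappings from the character data.

    Mappings that begin with a <tag> (e.g. <compat>) are compatibility-only and
    are ignored. Hangul syllables (algorithmic decomposition) are not handled.
    """
    mapping = unicodedata.decomposition(ch)
    if not mapping or mapping.startswith("<"):
        return ch
    return "".join(
        full_canonical_decomposition(chr(int(cp, 16))) for cp in mapping.split()
    )


def is_starter(ch: str) -> bool:
    # A Starter has canonical combining class 0.
    return unicodedata.combining(ch) == 0


def has_non_starter_decomposition(ch: str) -> bool:
    decomp = full_canonical_decomposition(ch)
    if len(decomp) < 2:
        return False  # not an expanding canonical decomposition
    # A starter decomposition needs both ch and decomp[0] to be Starters;
    # any other expanding canonical decomposition is a non-starter decomposition.
    return not (is_starter(ch) and is_starter(decomp[0]))


for ch in ("\u0344", "\u0F73", "\u00E9"):
    print(f"U+{ord(ch):04X}", has_non_starter_decomposition(ch))
# U+0344 True
# U+0F73 True
# U+00E9 False
```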
A process of removing alternate representations of equivalent sequences from textual data, to convert the data into a form that can be binary-compared for equivalence. In the Unicode Standard, normalization refers specifically to processing to ensure that canonical-equivalent (and/or compatibility-equivalent) strings have unique representations.
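In Python, `unicodedata.normalize` performs this processing, after which an ordinary string comparison tests for equivalence. This is a minimal sketch; `equivalent` is a hypothetical helper, and defaulting to NFC is an arbitrary choice.

```python
import unicodedata


def equivalent(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two strings after normalizing both to `form`."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


precomposed = "\u00E4"  # ä as a single code point
decomposed = "a\u0308"  # a followed by U+0308 COMBINING DIAERESIS

print(precomposed == decomposed)            # False: raw code points differ
print(equivalent(precomposed, decomposed))  # True: canonical equivalents
print(equivalent("\uFB01", "fi"))           # False: only compatibility-equivalent
print(equivalent("\uFB01", "fi", "NFKC"))   # True once compatibility differences are erased
```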
A text normalization procedure that replaces equivalent sequences of characters so that any two texts that are equivalent will be reduced to the same sequence of code points, called the normal form of the original text. There are four Unicode normalization forms: NFC, NFD, NFKC, and NFKD.
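To see how the four forms differ, the sketch below normalizes one sample string with Python's `unicodedata.normalize` and prints the resulting code points. The sample is an arbitrary choice that combines the ä and dž examples used in the entries for the individual forms.

```python
import unicodedata

sample = "\u00E4\u01C6"  # ä (U+00E4) followed by dž (U+01C6)

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    result = unicodedata.normalize(form, sample)
    print(form.ljust(4), " ".join(f"U+{ord(c):04X}" for c in result))
# NFC  U+00E4 U+01C6
# NFD  U+0061 U+0308 U+01C6
# NFKC U+00E4 U+0064 U+017E
# NFKD U+0061 U+0308 U+0064 U+007A U+030C
```

The canonical forms (NFC, NFD) leave dž alone because its only decomposition mapping is a compatibility one; the K forms split it.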
A normalization form that erases any canonical differences, and generally produces a composed result. For example, a + umlaut is converted to ä in this form. This form most closely matches legacy usage.
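A sketch of the composing behaviour using Python's `unicodedata` (`is_normalized` requires Python 3.8 or later). Using Latin-1 to stand for "legacy usage" is an illustrative assumption: Latin-1 encodes ä as a single byte and has no combining diaeresis, so only the composed form fits it.

```python
import unicodedata

decomposed = "a\u0308"  # a + U+0308 COMBINING DIAERESIS
nfc = unicodedata.normalize("NFC", decomposed)

print(nfc == "\u00E4", len(decomposed), len(nfc))  # True 2 1
print(unicodedata.is_normalized("NFC", nfc), unicodedata.is_normalized("NFC", decomposed))  # True False

# Latin-1 has a single byte for ä (0xE4) but no combining diaeresis,
# so only the NFC result can be encoded in that legacy character set.
print(nfc.encode("latin-1"))  # b'\xe4'
try:
    decomposed.encode("latin-1")
except UnicodeEncodeError as err:
    print("cannot encode:", err.reason)
```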
A normalization form that erases any canonical differences, and produces a decomposed result. For example, ä is converted to a + umlaut in this form. This form is most often used in internal processing, such as in collation.
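To show the decomposed result, and one toy way internal processing can exploit it, the sketch below uses NFD to build an accent-insensitive sort key. `base_letters` is a hypothetical helper and is deliberately simplistic; real collation follows the Unicode Collation Algorithm, which this does not implement.

```python
import unicodedata


def base_letters(s: str) -> str:
    """Hypothetical helper: NFD, then drop code points with a nonzero
    canonical combining class. A toy key, not the Unicode Collation Algorithm."""
    nfd = unicodedata.normalize("NFD", s)
    return "".join(c for c in nfd if unicodedata.combining(c) == 0)


print(unicodedata.normalize("NFD", "\u00E4") == "a\u0308")  # True: ä -> a + diaeresis

words = ["\u00E4b", "ac", "aa"]  # äb, ac, aa
print(sorted(words))                    # code point order puts äb last
print(sorted(words, key=base_letters))  # äb now sorts among the other a- words
```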
A normalization form that erases both canonical and compatibility differences, and generally produces a composed result: for example, the single dž character is converted to d + ž in this form. This form is commonly used in matching.
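A minimal matching sketch in Python, assuming that normalizing both the search term and the text to NFKC before a plain substring test is acceptable for the application. `nfkc_match` is a hypothetical helper.

```python
import unicodedata


def nfkc_match(needle: str, haystack: str) -> bool:
    """Hypothetical helper: plain substring test after NFKC on both strings."""
    return unicodedata.normalize("NFKC", needle) in unicodedata.normalize("NFKC", haystack)


print(unicodedata.normalize("NFKC", "\u01C6") == "d\u017E")  # True: dž becomes d + ž
print("file" in "\uFB01le")                     # False: U+FB01 is the ligature ﬁ
print(nfkc_match("file", "\uFB01le"))           # True
print(nfkc_match("ABC", "\uFF21\uFF22\uFF23"))  # True: fullwidth Ａ Ｂ Ｃ
```

One consequence of this design: because NFKC can split one character into several, a match may begin or end inside what was a single character in the original text (for example, searching for "d" finds it inside dž).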
The Compatibility Decomposition of a coded character sequence.
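The difference between a single compatibility decomposition mapping and the full Compatibility Decomposition can be seen in Python: `unicodedata.decomposition` returns the one-level mapping recorded in the character data, while NFKD applies mappings recursively and canonically orders any combining marks. This sketch reuses the dž example from the NFKC entry.

```python
import unicodedata

ch = "\u01C6"  # LATIN SMALL LETTER DZ WITH CARON
print(unicodedata.decomposition(ch))  # <compat> 0064 017E  (one-level mapping)

nfkd = unicodedata.normalize("NFKD", ch)
print(" ".join(f"U+{ord(c):04X}" for c in nfkd))  # U+0064 U+007A U+030C

# NFD leaves dž unchanged, because its only mapping is a compatibility one.
print(unicodedata.normalize("NFD", ch) == ch)  # True
# Normalization is idempotent: applying NFKD again changes nothing.
print(unicodedata.normalize("NFKD", nfkd) == nfkd)  # True
```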
Required for conformance with the Unicode Standard.