In computer science, canonicalization (sometimes standardization or normalization) is a process for converting data that has more than one possible representation into a "standard", "normal", or canonical form.
Canonicalization makes it possible to compare different representations for equivalence: two inputs are equivalent when their canonical forms are identical.
URL
A canonical URL is the URL designated as the single source of truth when the same content is reachable at more than one URL.
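To illustrate how several duplicate URLs can collapse into one canonical form, here is a minimal Python sketch. The rules it applies (lowercase scheme and host, drop the default port, sort query parameters, drop the fragment) are assumptions chosen for illustration, not a standard, and the helper name canonicalize_url is hypothetical. It also ignores user info and IPv6 hosts.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Default ports that can be omitted without changing the target (assumed rule).
DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize_url(url: str) -> str:
    """Return one canonical spelling for URLs that differ only cosmetically."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port

    # Keep the port only when it is not the default for the scheme.
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"

    # An empty path and "/" point at the same resource.
    path = parts.path or "/"

    # Sort query parameters so their order does not matter.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))

    # The fragment is dropped (empty last component).
    return urlunsplit((scheme, netloc, path, query, ""))


duplicates = [
    "HTTP://Example.com:80/a?b=2&a=1#frag",
    "http://example.com/a?a=1&b=2",
]
# Both spellings map to the same canonical string.
print({canonicalize_url(u) for u in duplicates})
```

Which rules are safe depends on the site: some servers treat parameter order or trailing slashes as significant, so a real deployment would pick its rules deliberately.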
XML
XML canonical form, briefly defined, removes whitespace within tags, uses particular character encodings, sorts namespace references and eliminates redundant ones, removes XML and DOCTYPE declarations, and transforms relative URIs into absolute URIs.
source: https://en.wikipedia.org/wiki/Canonicalization
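The XML canonical form described above can be tried out with the canonicalize function in Python's standard library (xml.etree.ElementTree, available since Python 3.8, which implements C14N 2.0). The two sample documents below are made up for illustration; they differ in declaration, attribute order, quoting, and whitespace inside tags, yet canonicalize to the same string.

```python
from xml.etree.ElementTree import canonicalize

# Two spellings of the same document (hypothetical sample data).
doc_a = '<?xml version="1.0"?><doc b="2" a="1"  ><e/></doc>'
doc_b = "<doc a='1' b='2'><e></e></doc>"

canon_a = canonicalize(doc_a)
canon_b = canonicalize(doc_b)

# The canonical forms are byte-for-byte comparable.
print(canon_a == canon_b)
print(canon_a)
```

Comparing the canonical strings is what makes operations such as XML digital signatures stable across harmless formatting differences.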
Phone Number
A canonical phone address is a text string with the following structure:
+ CountryCode Space [(AreaCode) Space] SubscriberNumber
For example, +1 (425) 882-8080
source: https://tapiex.com/TPNet_Help/Canonical%20Addresses.htm
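The structure above can be sketched in Python as a formatter and a matching parser. The function names and the regular expression are assumptions made for this example; in particular, the source only gives the overall layout, so the set of characters allowed in the subscriber number is a guess.

```python
import re
from typing import Optional, Tuple

# + CountryCode Space [(AreaCode) Space] SubscriberNumber
# Assumption: country and area codes are digits; the subscriber number is
# anything without parentheses.
CANONICAL_RE = re.compile(
    r"^\+(?P<country>\d+) (?:\((?P<area>\d+)\) )?(?P<subscriber>[^()]+)$"
)


def format_canonical(country_code: str, subscriber: str,
                     area_code: Optional[str] = None) -> str:
    """Build a canonical address string from its parts."""
    area = f"({area_code}) " if area_code else ""
    return f"+{country_code} {area}{subscriber}"


def parse_canonical(address: str) -> Tuple[str, Optional[str], str]:
    """Split a canonical address into (country, area, subscriber)."""
    match = CANONICAL_RE.match(address)
    if match is None:
        raise ValueError(f"not in canonical form: {address!r}")
    return match.group("country"), match.group("area"), match.group("subscriber")


print(format_canonical("1", "882-8080", area_code="425"))
print(parse_canonical("+1 (425) 882-8080"))
```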
Unicode Normalization Forms
Canonical and Compatibility Equivalence
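Canonical equivalence is a fundamental equivalency between characters or sequences of characters which represent the same abstract character, and which when correctly displayed should always have the same visual appearance and behavior.
Compatibility equivalence is a weaker type of equivalence between characters or sequences of characters which represent the same abstract character, but which may have distinct visual appearances or behaviors.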
Normalization Forms
The Unicode Normalization Algorithm puts all combining marks in a specified order, and uses rules for decomposition and composition to transform each string into one of the Unicode Normalization Forms.
The four Unicode Normalization Forms are:
- Normalization Form D (NFD) = Canonical Decomposition
- Normalization Form C (NFC) = Canonical Decomposition, followed by Canonical Composition
- Normalization Form KD (NFKD) = Compatibility Decomposition
- Normalization Form KC (NFKC) = Compatibility Decomposition, followed by Canonical Composition
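The four forms listed above can be observed with unicodedata.normalize from Python's standard library. The sample strings are chosen for illustration: "é" has both a precomposed and a decomposed spelling (canonically equivalent), while the "ﬁ" ligature is only compatibility-equivalent to the two letters "fi".

```python
import unicodedata

composed = "\u00e9"      # é as a single code point
decomposed = "e\u0301"   # e followed by a combining acute accent
ligature = "\ufb01"      # ﬁ ligature, compatibility-equivalent to "fi"

# The two spellings differ as raw strings.
print(composed == decomposed)

# NFC composes, NFD decomposes; either makes the spellings comparable.
print(unicodedata.normalize("NFC", decomposed) == composed)
print(unicodedata.normalize("NFD", composed) == decomposed)

# Canonical forms leave the ligature alone; compatibility forms replace it.
print(unicodedata.normalize("NFC", ligature) == ligature)
print(unicodedata.normalize("NFKC", ligature) == "fi")
print(unicodedata.normalize("NFKD", ligature) == "fi")
```

A common pattern is to normalize both sides to the same form before comparing or hashing text. NFKC and NFKD discard formatting distinctions, so they suit matching and search better than storage of the original text.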