Techy Word of the Week: Canonicalization, Normalization

In computer science, canonicalization (sometimes standardization or normalization) is a process for converting data that has more than one possible representation into a "standard", "normal", or canonical form.

This is done, for example, to compare different representations for equivalence: two inputs are treated as equal when their canonical forms are identical.
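
As a toy illustration of "compare by canonical form", here is a minimal sketch assuming a hypothetical comma-separated tag list in which case, surrounding spaces, order and duplicates carry no meaning. The helpers canonical_tags and equivalent are made up for this example; the point is only that equivalence is decided by comparing canonical forms, not raw strings.

```python
from typing import Tuple

def canonical_tags(raw: str) -> Tuple[str, ...]:
    """Hypothetical canonical form for a comma-separated tag list:
    trimmed, case-folded, de-duplicated, and sorted."""
    tags = {t.strip().casefold() for t in raw.split(",") if t.strip()}
    return tuple(sorted(tags))

def equivalent(a: str, b: str) -> bool:
    # Two representations are equivalent iff their canonical forms match.
    return canonical_tags(a) == canonical_tags(b)

print(canonical_tags(" XML, url ,Unicode, xml"))  # ('unicode', 'url', 'xml')
print(equivalent("URL, XML", "xml,url,Url"))      # True
```

Sorting is what makes the form canonical here: a set alone has no fixed order, so the sorted tuple gives every equivalent input exactly one representation.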

URL

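A canonical URL is the one URL chosen as the single source of truth when the same (duplicate) content is reachable at several URLs. A page can declare it with a link element such as <link rel="canonical" href="https://example.com/page"> so that search engines treat that URL as the preferred version.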
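
Picking the canonical URL is an editorial choice, but a related mechanical step is syntax-based URL normalization (RFC 3986, section 6), which rewrites URLs that differ only in spelling to a single form. The sketch below is a minimal, hypothetical helper (normalize_url is not a library function) covering only a few rules for http/https URLs: lowercase scheme and host, drop the default port, and treat an empty path as "/". Dropping the fragment and ignoring user info are assumptions of this example, not part of the RFC rules, and percent-encoding normalization is not handled.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports that can be omitted without changing the URL's meaning.
DEFAULT_PORTS = {"http": 80, "https": 443}

def normalize_url(url: str) -> str:
    """Hypothetical helper: a few syntax-based normalizations for http(s) URLs."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()          # scheme is case-insensitive
    host = parts.hostname or ""            # urllib returns the host lowercased
    if ":" in host:                        # re-bracket IPv6 literals
        host = f"[{host}]"
    netloc = host
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = f"{host}:{parts.port}"    # keep non-default ports only
    path = parts.path or "/"               # empty path is equivalent to "/" for http(s)
    # Assumption: the fragment is dropped and user info (user:pass@) is discarded.
    return urlunsplit((scheme, netloc, path, parts.query, ""))

print(normalize_url("HTTP://Example.COM:80/docs?page=2#intro"))
# http://example.com/docs?page=2
```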

XML

Briefly, the canonical form of an XML document (Canonical XML) normalizes whitespace within tags, uses a fixed character encoding (UTF-8), sorts namespace declarations and attributes, eliminates redundant namespace declarations, removes the XML and DOCTYPE declarations, and transforms relative URIs into absolute URIs.

source: https://en.wikipedia.org/wiki/Canonicalization
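
To see a few of these rules in action, here is a small sketch using Python's standard-library xml.etree.ElementTree.canonicalize (Python 3.8+). Note the assumption: that function implements Canonical XML 2.0, a later version of the spec, so its details differ from the summary above. The two input documents are made up for the example.

```python
from xml.etree.ElementTree import canonicalize  # Python 3.8+, implements C14N 2.0

# Two serializations of the same document: different attribute order,
# quoting, spacing, an XML declaration, and a self-closing empty element.
doc_a = '<?xml version="1.0"?><doc b="2"  a="1"><empty/></doc>'
doc_b = "<doc a='1' b='2'><empty></empty></doc>"

c14n_a = canonicalize(doc_a)
c14n_b = canonicalize(doc_b)

print(c14n_a)            # <doc a="1" b="2"><empty></empty></doc>
print(c14n_a == c14n_b)  # True
```

After canonicalization the XML declaration is gone, attributes are sorted and double-quoted, and the empty element is written as a start/end pair, so a plain string comparison is enough to test equivalence.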

Phone Number

In the Windows Telephony API (TAPI), a canonical phone address is a text string with the following structure:

+ CountryCode Space [(AreaCode) Space] SubscriberNumber

For example, +1 (425) 882-8080

source: https://tapiex.com/TPNet_Help/Canonical%20Addresses.htm
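
The structure above can be sketched as a formatter and a parser. This is a minimal illustration, not TAPI's own API: format_canonical, parse_canonical and same_number are hypothetical names, the regular expression covers only the basic "+CountryCode [(AreaCode)] SubscriberNumber" shape, and any optional trailing fields the full TAPI format allows are simply kept as part of the subscriber text.

```python
import re
from typing import Optional, Tuple

# "+" CountryCode " " [ "(" AreaCode ") " ] SubscriberNumber
CANONICAL_RE = re.compile(r"^\+(\d+) (?:\((\d+)\) )?(\S.*)$")

def format_canonical(country: str, subscriber: str, area: Optional[str] = None) -> str:
    """Hypothetical helper: build a canonical address from its parts."""
    if area:
        return f"+{country} ({area}) {subscriber}"
    return f"+{country} {subscriber}"

def parse_canonical(address: str) -> Tuple[str, Optional[str], str]:
    """Hypothetical helper: split a canonical address into (country, area, subscriber)."""
    m = CANONICAL_RE.match(address)
    if m is None:
        raise ValueError(f"not a canonical address: {address!r}")
    return m.group(1), m.group(2), m.group(3)

def same_number(a: str, b: str) -> bool:
    # Compare the country code plus all remaining digits, ignoring punctuation.
    def key(address: str) -> Tuple[str, str]:
        country, area, subscriber = parse_canonical(address)
        return country, (area or "") + re.sub(r"\D", "", subscriber)
    return key(a) == key(b)

print(format_canonical("1", "882-8080", "425"))  # +1 (425) 882-8080
print(parse_canonical("+1 (425) 882-8080"))      # ('1', '425', '882-8080')
```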

Unicode Normalization Forms

Canonical and Compatibility Equivalence

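Canonical equivalence is a fundamental equivalency between characters or sequences of characters which represent the same abstract character, and which when correctly displayed should always have the same visual appearance and behavior. Compatibility equivalence is a weaker type of equivalence between characters or sequences of characters which represent the same abstract character (or sequence of abstract characters), but which may have distinct visual appearances or behaviors.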
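
A quick way to test either kind of equivalence is to normalize both strings and compare: canonically equivalent strings have identical NFD forms, and compatibility-equivalent strings have identical NFKD forms. The sketch below uses Python's standard unicodedata module; the helper names are hypothetical.

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    # Canonically equivalent strings have identical NFD forms.
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

def compatibility_equivalent(a: str, b: str) -> bool:
    # Compatibility equivalence is checked on NFKD forms; it also holds
    # for every canonically equivalent pair.
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)

precomposed = "\u00e9"   # é as a single code point (U+00E9)
combining = "e\u0301"    # e followed by COMBINING ACUTE ACCENT (U+0301)
ligature = "\ufb01"      # LATIN SMALL LIGATURE FI (U+FB01)

print(precomposed == combining)                        # False
print(canonically_equivalent(precomposed, combining))  # True
print(canonically_equivalent(ligature, "fi"))          # False
print(compatibility_equivalent(ligature, "fi"))        # True
```

The ligature shows the difference: "ﬁ" and "fi" look different and behave differently in layout, so they are only compatibility equivalent.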

Normalization Forms

The Unicode Normalization Algorithm puts all combining marks in a specified order, and uses rules for decomposition and composition to transform each string into one of the Unicode Normalization Forms.
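
The "specified order" for combining marks is based on each mark's canonical combining class (ccc): normalization sorts adjacent marks by that number. The sketch below, using Python's unicodedata, types the same two marks in both orders after a base letter; the choice of "q" (which has no precomposed forms with these marks) is just for illustration.

```python
import unicodedata

# Two combining marks with different canonical combining classes (ccc).
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, ccc 220
DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE, ccc 230

# The same marks typed in two different orders after the base letter "q".
s1 = "q" + DOT_ABOVE + DOT_BELOW
s2 = "q" + DOT_BELOW + DOT_ABOVE

print(s1 == s2)  # False: the code point sequences differ
print(unicodedata.combining(DOT_BELOW), unicodedata.combining(DOT_ABOVE))  # 220 230

# Normalization sorts adjacent marks by ccc, so both strings end up identical.
nfd1 = unicodedata.normalize("NFD", s1)
nfd2 = unicodedata.normalize("NFD", s2)
print(nfd1 == nfd2 == s2)  # True
```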

The four Unicode Normalization Forms are:

Normalization Form D (NFD) = Canonical Decomposition
Normalization Form C (NFC) = Canonical Decomposition, followed by Canonical Composition
Normalization Form KD (NFKD) = Compatibility Decomposition
Normalization Form KC (NFKC) = Compatibility Decomposition, followed by Canonical Composition

source: https://unicode.org/reports/tr15/
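
To compare the four forms side by side, this sketch runs Python's unicodedata.normalize over a made-up sample containing a compatibility character (the "ﬁ" ligature, U+FB01) and a precomposed "é" (U+00E9), printing the resulting code points.

```python
import unicodedata

sample = "\ufb01\u00e9"  # "ﬁ" ligature (U+FB01) followed by precomposed "é" (U+00E9)

for form in ("NFD", "NFC", "NFKD", "NFKC"):
    result = unicodedata.normalize(form, sample)
    codepoints = " ".join(f"U+{ord(ch):04X}" for ch in result)
    print(f"{form:<4} {codepoints}")

# NFD  U+FB01 U+0065 U+0301
# NFC  U+FB01 U+00E9
# NFKD U+0066 U+0069 U+0065 U+0301
# NFKC U+0066 U+0069 U+00E9
```

The canonical forms (NFD, NFC) leave the ligature alone and only decompose or recompose "é"; the compatibility forms (NFKD, NFKC) also replace the ligature with plain "f" and "i".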

Saturday, July 5, 2025
