IDNA CJK
From Wikidna.org
Development of IDNA2008 is now in its final stage. IDNA2008 will cause incompatibilities with the current IDNA (IDNA2003) for Chinese, Japanese and Korean (CJK) scripts and languages. To avoid these incompatibilities, this document recommends defining a specific local mapping for CJK, that is, a pre-processing step applied to an IDN candidate string before IDNA processing.
1. Introduction
1.1. Positioning of this document
The IDNA protocol is being revised by IDNA2008 ([I-D.ietf-idnabis-rationale] [I-D.ietf-idnabis-defs] [I-D.ietf-idnabis-protocol] [I-D.ietf-idnabis-tables] [I-D.ietf-idnabis-bidi]), which is in its final stage and, in some cases, is incompatible with IDNA2003 ([RFC3490] [RFC3491] [RFC3492]). Because of these incompatibilities, name resolution of some existing registered IDNs may fail. To avoid such failures, IDNA2008 recommends performing a local mapping before IDNA processing, both at registration and at domain name lookup, but it does not specify any particular method. This document defines a local mapping for CJK under IDNA2008 that avoids the incompatibilities between IDNA2008 and IDNA2003.
1.2. Why CJK?
The CJK languages share some scripts, such as Han, as well as punctuation. Therefore, it is useful to have a common local mapping definition across the areas and / or languages that share these scripts. Furthermore, the ccTLDs in the CJK area have taken the initiative on IDN: they have worked actively on IDN development and deployment since IDNA2003 was introduced, jointly published the JET Guideline, and accumulated and shared their experience and knowledge of IDN registration and operation. A definition developed by these ccTLDs on the basis of that experience is useful to the community.
2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].
CJK
The term "CJK" stands for "Chinese, Japanese and Korean".
CJK IDN
The term "CJK IDN" stands for "Chinese IDN" or "Japanese IDN" or "Korean IDN".
CJK scripts
+---------------------------------------+---------------+-+-+-+
| Script Name                           | Code(Range)   |C|J|K|
+---------------------------------------+---------------+-+-+-+
|CJK Symbols and Punctuation            | U+3000-U+3007 |Y|Y| |
|Hiragana                               | U+3040-U+309F | |Y| |
|Katakana                               | U+30A0-U+30FF | |Y| |
|CJK Unified Ideographs Extension A     | U+3400-U+4DFF |Y| | |
|CJK Unified Ideographs                 | U+4E00-U+9FFF |Y|Y| |
|Hangul Syllables                       | U+AC00-U+D7A3 | | |Y|
|CJK Compatibility Ideographs           | U+F900-U+FAFF |Y| | |
|Halfwidth and Fullwidth Forms          | U+FF00-U+FFEF |Y|Y| |
|CJK Unified Ideographs Extension B     |U+20000-U+2A6D6|Y| | |
|CJK Compatibility Ideographs Supplement|U+2F800-U+2FA1F|Y| | |
+---------------------------------------+---------------+-+-+-+
Chinese IDN
The term "Chinese IDN" means an IDN consisting of characters from the CJK scripts marked "Y" in the "C" column above, plus LDH characters. The characters permitted in a Chinese IDN are listed in [IANA-IDN-Language-zh-CN] and [IANA-IDN-Language-zh-TW].
Japanese IDN
The term "Japanese IDN" means an IDN consisting of characters from the CJK scripts marked "Y" in the "J" column above, plus LDH characters. The characters permitted in a Japanese IDN are listed in [IANA-IDN-Language-ja-JP].
Korean IDN
The term "Korean IDN" means an IDN consisting of characters from the CJK scripts marked "Y" in the "K" column above, plus LDH characters. The characters permitted in a Korean IDN are listed in [IANA-IDN-Language-ko-KR].
Other terms used in this document are defined in [I-D.ietf-idnabis-defs].
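The definitions above can be sketched as a label classifier. This is a minimal Python sketch, assuming that the ranges and C/J/K columns of the "CJK scripts" table are the only criterion; it does not consult the IANA language tables cited above, and it additionally assumes that a label must contain at least one character from the table to count as a CJK IDN (an all-LDH label is plain ASCII). The names `CJK_SCRIPT_RANGES`, `languages_for` and `is_cjk_idn_label` are hypothetical, not part of any specification.

```python
# Ranges and C/J/K marks transcribed from the "CJK scripts" table.
CJK_SCRIPT_RANGES = [
    # (first code point, last code point, columns marked "Y")
    (0x3000, 0x3007, "CJ"),    # CJK Symbols and Punctuation
    (0x3040, 0x309F, "J"),     # Hiragana
    (0x30A0, 0x30FF, "J"),     # Katakana
    (0x3400, 0x4DFF, "C"),     # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF, "CJ"),    # CJK Unified Ideographs
    (0xAC00, 0xD7A3, "K"),     # Hangul Syllables
    (0xF900, 0xFAFF, "C"),     # CJK Compatibility Ideographs
    (0xFF00, 0xFFEF, "CJ"),    # Halfwidth and Fullwidth Forms
    (0x20000, 0x2A6D6, "C"),   # CJK Unified Ideographs Extension B
    (0x2F800, 0x2FA1F, "C"),   # CJK Compatibility Ideographs Supplement
]


def is_ldh(ch: str) -> bool:
    """True for an ASCII letter, digit or hyphen."""
    return ch == "-" or (ch.isascii() and ch.isalnum())


def languages_for(ch: str) -> str:
    """Return the columns ("C", "J", "K") marked "Y" for ch, or ""."""
    cp = ord(ch)
    for first, last, langs in CJK_SCRIPT_RANGES:
        if first <= cp <= last:
            return langs
    return ""


def is_cjk_idn_label(label: str, lang: str) -> bool:
    """True if label is LDH plus at least one table character marked for lang."""
    has_cjk = False
    for ch in label:
        if is_ldh(ch):
            continue
        if lang not in languages_for(ch):
            return False  # character not marked "Y" for this language
        has_cjk = True
    return has_cjk


print(is_cjk_idn_label("例え", "J"))  # True: Han + Hiragana
print(is_cjk_idn_label("例え", "C"))  # False: Hiragana is not marked for C
print(is_cjk_idn_label("한국", "K"))  # True: Hangul Syllables
```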
3. List of incompatibilities of CJK between IDNA2008 and IDNA2003
3.1. Label separators
The following characters are defined as label separators in IDNA2003, but not in IDNA2008.
+---------------------------------------+---------------+-+-+-+
| Character Name                        | Code          |C|J|K|
+---------------------------------------+---------------+-+-+-+
|IDEOGRAPHIC FULL STOP                  | U+3002        |Y|Y| |
|HALFWIDTH IDEOGRAPHIC FULL STOP        | U+FF61        |Y|Y| |
|FULLWIDTH FULL STOP                    | U+FF0E        |Y|Y| |
+---------------------------------------+---------------+-+-+-+
A CJK IDN that includes any of these characters is valid in IDNA2003 but invalid in IDNA2008.
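The IDNA2003 side of this incompatibility can be observed with Python's built-in `idna` codec, which implements IDNA2003 (RFC 3490), not IDNA2008. In this minimal sketch, each CJK full stop splits labels exactly as U+002E does, so all spellings of the name encode to the same ASCII form. The helper name `encode_idna2003` is hypothetical; an IDNA2008 implementation would instead reject these separators, which the sketch does not show.

```python
# Python's standard "idna" codec implements IDNA2003 (RFC 3490), not IDNA2008.
CJK_FULL_STOPS = {
    "\u3002": "IDEOGRAPHIC FULL STOP",
    "\uff61": "HALFWIDTH IDEOGRAPHIC FULL STOP",
    "\uff0e": "FULLWIDTH FULL STOP",
}


def encode_idna2003(name: str) -> bytes:
    """Apply IDNA2003 ToASCII to every label of name."""
    return name.encode("idna")


reference = encode_idna2003("例え.jp")
for sep, char_name in CJK_FULL_STOPS.items():
    # IDNA2003 treats each CJK full stop as a label separator, like U+002E.
    same = encode_idna2003("例え" + sep + "jp") == reference
    print(f"{char_name}: {same}")
```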
3.2. Compatibility characters
Compatibility characters, which IDNA2003 maps to their canonical (valid) equivalents using Unicode Normalization Form KC (NFKC) [Unicode] [UAX15], are invalid in IDNA2008.
+---------------------------------------+---------------+-+-+-+
| Character Name                        | Code(Range)   |C|J|K|
+---------------------------------------+---------------+-+-+-+
|FULLWIDTH DIGITS                       | U+FF10-U+FF19 |Y|Y| |
|FULLWIDTH LATIN CAPITAL LETTERS        | U+FF21-U+FF3A |Y|Y| |
|FULLWIDTH LATIN SMALL LETTERS          | U+FF41-U+FF5A |Y|Y| |
|HALFWIDTH KATAKANA LETTERS             | U+FF65-U+FF9F | |Y| |
+---------------------------------------+---------------+-+-+-+
A CJK IDN that includes any of these characters is valid in IDNA2003 but invalid in IDNA2008.
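A short Python check makes both halves of this statement concrete, using only the standard library: every code point in the ranges above changes under NFKC (so it is not stable in the sense IDNA2008 requires), while Python's built-in IDNA2003 codec folds such characters through nameprep to the same label as their canonical forms. The function name `changed_by_nfkc` is hypothetical.

```python
import unicodedata

# Ranges from the Section 3.2 table.
COMPAT_RANGES = [
    (0xFF10, 0xFF19),  # FULLWIDTH DIGITS
    (0xFF21, 0xFF3A),  # FULLWIDTH LATIN CAPITAL LETTERS
    (0xFF41, 0xFF5A),  # FULLWIDTH LATIN SMALL LETTERS
    (0xFF65, 0xFF9F),  # HALFWIDTH KATAKANA LETTERS
]


def changed_by_nfkc() -> list:
    """Code points in COMPAT_RANGES whose NFKC form differs from themselves."""
    return [
        cp
        for first, last in COMPAT_RANGES
        for cp in range(first, last + 1)
        if unicodedata.normalize("NFKC", chr(cp)) != chr(cp)
    ]


total = sum(last - first + 1 for first, last in COMPAT_RANGES)
print(len(changed_by_nfkc()) == total)  # True: every listed code point changes

# IDNA2003 (Python's "idna" codec) maps them via nameprep before encoding.
print("ＥＸＡＭＰＬＥ１".encode("idna"))  # b'example1'
print("ｶﾀｶﾅ".encode("idna") == "カタカナ".encode("idna"))  # True
```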
3.3. Exceptions
Some quasi-Han and quasi-Kana mark characters that are valid in IDNA2003 are exceptions (CONTEXTO) in IDNA2008, which restricts their position and / or the properties of adjacent characters.
+---------------------------------------+---------------+-+-+-+
| Character Name                        | Code          |C|J|K|
+---------------------------------------+---------------+-+-+-+
|IDEOGRAPHIC ITERATION MARK             | U+3005        | |Y| |
|KATAKANA MIDDLE DOT                    | U+30FB        | |Y| |
+---------------------------------------+---------------+-+-+-+
A CJK IDN that includes these characters is valid in IDNA2003 but may be invalid in IDNA2008, depending on where the characters appear and which characters surround them.
4. Solutions for incompatibilities
4.1. Label separators
If any character listed in Section 3.1 is included in a CJK IDN candidate string, it is mapped to FULL STOP (U+002E) during local mapping.
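The separator mapping is a plain character substitution. A minimal Python sketch, assuming the mapping is applied to the whole candidate string before any IDNA2008 processing; `map_label_separators` is a hypothetical helper name.

```python
# Section 3.1 separators, each mapped to FULL STOP (U+002E).
LABEL_SEPARATOR_MAP = str.maketrans({
    "\u3002": ".",  # IDEOGRAPHIC FULL STOP
    "\uff61": ".",  # HALFWIDTH IDEOGRAPHIC FULL STOP
    "\uff0e": ".",  # FULLWIDTH FULL STOP
})


def map_label_separators(candidate: str) -> str:
    """Replace every Section 3.1 separator in candidate with U+002E."""
    return candidate.translate(LABEL_SEPARATOR_MAP)


print(map_label_separators("例え。テスト"))  # 例え.テスト
```

Using `str.translate` keeps the mapping a single table lookup per character, so any other characters pass through unchanged for later validation.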
4.2. Compatibility characters
If any character listed in Section 3.2 is included in a CJK IDN candidate string, it is mapped to its canonical character by NFKC during local mapping.
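One way to realize this mapping is to apply NFKC only to characters in the Section 3.2 ranges and leave every other character untouched for IDNA2008 to validate. The Python sketch below does that. The final NFC pass is an assumption added here: under NFKC a halfwidth voiced sound mark becomes a combining mark, and IDNA2008 expects input in NFC. The names `in_compat_ranges` and `map_compat_chars` are hypothetical.

```python
import unicodedata

# Ranges from the Section 3.2 table.
COMPAT_RANGES = (
    (0xFF10, 0xFF19),  # FULLWIDTH DIGITS
    (0xFF21, 0xFF3A),  # FULLWIDTH LATIN CAPITAL LETTERS
    (0xFF41, 0xFF5A),  # FULLWIDTH LATIN SMALL LETTERS
    (0xFF65, 0xFF9F),  # HALFWIDTH KATAKANA LETTERS
)


def in_compat_ranges(ch: str) -> bool:
    """True if ch falls in one of the Section 3.2 ranges."""
    return any(first <= ord(ch) <= last for first, last in COMPAT_RANGES)


def map_compat_chars(candidate: str) -> str:
    """Apply NFKC to Section 3.2 characters only, then recompose with NFC."""
    mapped = "".join(
        unicodedata.normalize("NFKC", ch) if in_compat_ranges(ch) else ch
        for ch in candidate
    )
    # HALFWIDTH KATAKANA VOICED SOUND MARK becomes a combining mark under
    # NFKC; NFC joins it with the preceding kana (e.g. KA + mark -> GA).
    return unicodedata.normalize("NFC", mapped)


print(map_compat_chars("ｶﾞｲﾄﾞ"))  # ガイド
print(map_compat_chars("ＪＰ１"))  # JP1
```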
4.3. Exceptions
If any character listed in Section 3.3 is included in a CJK IDN candidate string, it is treated as PROTOCOL VALID instead of CONTEXTO. [NOTE: this is not a local mapping; it requires changing the property of those characters.]
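Because this solution changes a derived property rather than mapping characters, it can be pictured as an override layered over whatever IDNA2008 property lookup an implementation already has. In this hypothetical Python sketch, `base_property` stands in for such a lookup (no real library API is assumed), and the two Section 3.3 characters are identified by their Unicode character names.

```python
import unicodedata
from typing import Callable

# Section 3.3 characters, identified by Unicode character name.
CJK_PVALID_OVERRIDES = {
    ord(unicodedata.lookup("IDEOGRAPHIC ITERATION MARK")),
    ord(unicodedata.lookup("KATAKANA MIDDLE DOT")),
}


def cjk_derived_property(cp: int, base_property: Callable[[int], str]) -> str:
    """Return PVALID for Section 3.3 characters, else defer to base_property."""
    if cp in CJK_PVALID_OVERRIDES:
        return "PVALID"
    return base_property(cp)


# Hypothetical stand-in for an IDNA2008 tables lookup.
def stub_property(cp: int) -> str:
    return "CONTEXTO" if cp in CJK_PVALID_OVERRIDES else "PVALID"


print(cjk_derived_property(0x30FB, stub_property))  # PVALID
```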
5. Guideline to keep compatibility in registration protocol
Registries that handle CJK IDNs must implement the solutions described in Section 4.
6. Guideline to keep compatibility in domain name lookup protocol
Application software that handles CJK IDNs must implement the solutions described in Section 4. The local mappings, or equivalent pre-processing, must be performed during user interface (input/output) processing, such as:
- User typing or pasting into an input area
- Extracting strings from free-form text in a content area
- Displaying in dialogs and / or the address bar
Note that those strings are used for domain name lookup.
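This guideline can be sketched as a single local-mapping step applied at the user interface boundary, before the string is handed to an IDNA2008 implementation. This minimal Python sketch combines the separator and compatibility-character mappings of Section 4 (the property change for exceptions is not a mapping and is not shown); `cjk_local_map` and `on_address_bar_input` are hypothetical names, and the final NFC step is an assumption. The last line compares against Python's built-in IDNA2003 codec to show that, for this input, the mapped string encodes exactly as the original input did under IDNA2003.

```python
import unicodedata

# Section 3.1 label separators -> FULL STOP (U+002E).
SEPARATORS = str.maketrans({"\u3002": ".", "\uff61": ".", "\uff0e": "."})

# Section 3.2 compatibility ranges, mapped by NFKC.
COMPAT_RANGES = ((0xFF10, 0xFF19), (0xFF21, 0xFF3A),
                 (0xFF41, 0xFF5A), (0xFF65, 0xFF9F))


def cjk_local_map(candidate: str) -> str:
    """Apply the Section 4.1 and 4.2 mappings to a CJK IDN candidate."""
    s = candidate.translate(SEPARATORS)
    s = "".join(
        unicodedata.normalize("NFKC", ch)
        if any(first <= ord(ch) <= last for first, last in COMPAT_RANGES)
        else ch
        for ch in s
    )
    return unicodedata.normalize("NFC", s)


def on_address_bar_input(text: str) -> str:
    """Hypothetical UI hook: map typed or pasted text before lookup."""
    return cjk_local_map(text.strip())


typed = "例え。ｊｐ"
mapped = on_address_bar_input(typed)
print(mapped)  # 例え.jp
# Same ASCII form that IDNA2003 produced for the unmapped input.
print(mapped.encode("idna") == typed.encode("idna"))  # True
```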
9. Acknowledgements
Many suggestions and much advice were given by JET members, especially Yao Jiankang, ...
