IDNA tables
From Wikidna.org
This document specifies rules for deciding whether a code point, considered in isolation or in context, is a candidate for inclusion in an Internationalized Domain Name.
It is part of the specification of IDNA2008.
1. Introduction
RFC 4690 [RFC4690] suggests an inclusion-based approach for selecting the code points from The Unicode Standard [Unicode51] that should be included in the list of code points that may be used in Internationalized Domain Names.
Specifically, RFC 4690 [RFC4690] says the following:
The IAB has concluded that there is a consensus within the broader community that lists of code points should be specified by the use of an inclusion-based mechanism (i.e., identifying the characters that are permitted), rather than by excluding a small number of characters from the total Unicode set as Stringprep [RFC3454] and Nameprep [RFC3491] do today. That conclusion should be reviewed by the IETF community and action taken as appropriate.
This document reviews and classifies the collections of code points in the Unicode character set by examining various properties of the code points. It then defines an algorithm for determining a derived property value. It specifies a procedure, and not a table, of code points so that the algorithm can be used to determine code point sets independent of the version of Unicode that is in use.
This document is not intended to specify precisely how these property values are to be applied in IDN labels. That information appears in [IDNA2008-protocol], but it is important to understand that the assignment of a value of this property to a particular character is not sufficient to determine whether it can be used in a given label.
In particular, some combinations of allowed code points are not advisable for use in IDNs due to rules specific to a script or class of characters. The requirement for such rules is linked to the operations in [IDNA2008-protocol] and especially to the characters designated as requiring contextual rules.
The value of the property is to be interpreted approximately as follows.
- PROTOCOL VALID: Those that are allowed to be used in IDNs. Code points with this property value are permitted for general use in IDNs. However, the fact that a label consists only of code points that have this property value does not imply that the label can be used in DNS. See [IDNA2008-protocol] for algorithms to make decisions about labels in domain names. The abbreviated term PVALID is used to refer to this value in the rest of this document.
- CONTEXTUAL RULE REQUIRED: Some characteristics of the character, such as its being invisible in certain contexts or problematic in others, require that it not be used in labels unless specific other characters or properties are present. The abbreviated term CONTEXT is used to refer to this value in the rest of this document. There are two subdivisions of CONTEXTUAL RULE REQUIRED, one for Join_controls (called CONTEXTJ) and one for other characters (called CONTEXTO). These are discussed in more detail below and in [IDNA2008-protocol].
- DISALLOWED: Those that should clearly not be included in IDNs.
Code points with this property value are not permitted in IDNs.
- UNASSIGNED: Those code points that are not designated (i.e. are unassigned) in the Unicode Standard.
The mechanisms described here allow determination of the value of the property for future versions of Unicode (including characters added after Unicode 5.1). Changes in Unicode properties that do not affect the outcome of this process do not affect IDN. For example, a character can have its Unicode General_Category value change from So to Sm, or from Lo to Ll, without affecting the algorithm results.
Moreover, even if such changes were to result, the BackwardCompatible list (Section 2.7) can be adjusted to ensure the stability of the results.
Some code points need to be allowed in exceptional circumstances, but should be excluded in all other cases; these rules are also described in other documents. The most notable of these are the Join Control characters, U+200D ZERO WIDTH JOINER and U+200C ZERO WIDTH NON-JOINER. Both of them have the derived property value CONTEXTJ.
A character with the derived property value CONTEXTJ or CONTEXTO (CONTEXTUAL RULE REQUIRED) is not to be used unless an appropriate rule has been established and the context of the character is consistent with that rule. It is invalid either to register a string containing these characters or even to look one up unless such a contextual rule is found and satisfied. Please see Appendix A, The Contextual Rules Registry, for more information.
This document is part of a series that, together, constitute a proposal for updating the IDNA standards to resolve issues uncovered in recent years, cover a broader range of scripts, and provide for migration to newer versions of Unicode. See [IDNA2008-rationale] for a broader discussion.
2. Category definitions Used to Calculate Derived Property Value
The derived property obtains its value based on a two-step procedure.
First, characters are placed in one or more character categories based on either core properties defined by the Unicode Standard or by treating the codepoint as an exception and addressing the codepoint by its codepoint value. These categories are not mutually exclusive.
In the second step, set operations are used with these categories to
determine the values for an IDN-specific property. Those operations
are specified in Section 3.
Unicode property names and property value names may have short abbreviations, such as gc for the General_Category property, and Ll for the Lowercase_Letter property value of the gc property.
In the following specification of categories, the operation which returns the value of a particular Unicode character property for a code point is designated by using the formal name of that property (from PropertyAliases.txt) followed by '(cp)'. For example, the value of the General_Category property for a code point is indicated by General_Category(cp).
2.1. LetterDigits (A)
A: General_Category(cp) is in {Ll, Lu, Lo, Nd, Lm, Mn, Mc}
This rule identifies characters commonly used in mnemonics and often informally described as "language characters". In general, only code points assigned to this category are suitable for use in IDN.
For more information, see section 4.5 of The Unicode Standard [Unicode5].
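Rule A can be sketched in a few lines of Python. This is only an illustration: the function name `is_letter_digit` is hypothetical, and Python's `unicodedata` module reports properties for whatever Unicode version the interpreter ships with, which is unlikely to be Unicode 5.1.

```python
import unicodedata

# General_Category values named in rule A.
LETTER_DIGIT_CATEGORIES = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}


def is_letter_digit(cp: int) -> bool:
    """Return True if General_Category(cp) is one of the rule A values.

    Hypothetical helper; results follow the interpreter's Unicode version.
    """
    return unicodedata.category(chr(cp)) in LETTER_DIGIT_CATEGORIES
```

Note that membership in this category alone does not make a code point PVALID; the other categories are applied in the order given in Section 3.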
The categories used in this rule are:
- Ll - Lowercase_Letter
- Lu - Uppercase_Letter
- Lo - Other_Letter
- Nd - Decimal_Number
- Lm - Modifier_Letter
- Mn - Nonspacing_Mark
- Mc - Spacing_Mark
2.2. Unstable (B)
B: toNFKC(toCaseFold(toNFKC(cp))) != cp
This category is used to group the characters that are not stable under NFKC normalization and case folding. In general, these code points are not suitable for use in IDN.
The toCaseFold() operation is defined in Section 3.13 of The Unicode Standard [Unicode5].
The toNFKC() operation returns the code point in normalization form KC. For more information, see Section 5 of Unicode Standard Annex #15 [TR15].
It should be noted that NFKC is used, although NFC is used in the "IDNA Protocol" document [IDNA2008-protocol].
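As a rough sketch, the stability test in rule B might look as follows in Python. The name `is_unstable` is hypothetical, `str.casefold()` stands in for toCaseFold(), and the result depends on the Unicode version built into the interpreter rather than on Unicode 5.1.

```python
import unicodedata


def is_unstable(cp: int) -> bool:
    """Sketch of rule B: toNFKC(toCaseFold(toNFKC(cp))) != cp.

    Hypothetical helper; uses the interpreter's own Unicode data.
    """
    ch = chr(cp)
    # Inner toNFKC(), then case folding of the normalized form.
    folded = unicodedata.normalize("NFKC", ch).casefold()
    # Outer toNFKC(), compared against the original code point.
    return unicodedata.normalize("NFKC", folded) != ch
```

Under this sketch, uppercase letters and U+00DF LATIN SMALL LETTER SHARP S are unstable (the latter because it case-folds to "ss"), which is why U+00DF needs an entry in the Exceptions category to be PVALID.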
2.3. IgnorableProperties (C)
C: Default_Ignorable_Code_Point(cp) = True or
   White_Space(cp) = True or
   Noncharacter_Code_Point(cp) = True
This category is used to group code points that are not recommended for use in identifiers. In general, these code points are not suitable for use in IDN.
The definition for Default_Ignorable_Code_Point can be found in DerivedCoreProperties.txt [1] and is at the time of Unicode 5.1:
   Other_Default_Ignorable_Code_Point
   + Cf (Format characters)
   + Variation_Selector
   - White_Space
   - FFF9..FFFB (Annotation Characters)
   - 0600..0603, 06DD, 070F (exceptional Cf characters that should be visible)
2.4. IgnorableBlocks (D)
D: Block(cp) is in {Combining Diacritical Marks for Symbols, Musical Symbols, Ancient Greek Musical Notation}
This category is used to identify code points that are not useful in mnemonics or that are otherwise impractical for IDN use. In general, these code points are not suitable for use in IDN.
The definition of blocks can be found in Blocks.txt [2].
2.5. LDH (E)
E: cp is in {002D, 0030..0039, 0061..007A}
This category is used in the second step to preserve the traditional "hostname" (LDH) characters ('-', 0-9 and a-z). In general, these code points are suitable for use in IDN. Note that there are other rules regarding the code point U+002D HYPHEN-MINUS that are specified in the IDNA Protocol Specification [IDNA2008-protocol].
2.6. Exceptions (F)
F: cp is in {00B7, 00DF, 0375, 03C2, 05F3, 05F4, 0640, 0660,
0661, 0662, 0663, 0664, 0665, 0666, 0667, 0668,
0669, 06F0, 06F1, 06F2, 06F3, 06F4, 06F5, 06F6,
06F7, 06F8, 06F9, 06FD, 06FE, 07FA, 0F0B, 3007,
302E, 302F, 3031, 3032, 3033, 3034, 3035, 303B,
30FB}
This category explicitly lists code points for which the category cannot be assigned using only the core property values that exist in the Unicode standard. The values are according to the table below:
00DF; PVALID # LATIN SMALL LETTER SHARP S
03C2; PVALID # GREEK SMALL LETTER FINAL SIGMA
06FD; PVALID # ARABIC SIGN SINDHI AMPERSAND
06FE; PVALID # ARABIC SIGN SINDHI POSTPOSITION MEN
0F0B; PVALID # TIBETAN MARK INTERSYLLABIC TSHEG
3007; PVALID # IDEOGRAPHIC NUMBER ZERO
00B7; CONTEXTO # MIDDLE DOT
0375; CONTEXTO # GREEK LOWER NUMERAL SIGN (KERAIA)
05F3; CONTEXTO # HEBREW PUNCTUATION GERESH
05F4; CONTEXTO # HEBREW PUNCTUATION GERSHAYIM
30FB; CONTEXTO # KATAKANA MIDDLE DOT
0660; CONTEXTO # ARABIC-INDIC DIGIT ZERO
0661; CONTEXTO # ARABIC-INDIC DIGIT ONE
0662; CONTEXTO # ARABIC-INDIC DIGIT TWO
0663; CONTEXTO # ARABIC-INDIC DIGIT THREE
0664; CONTEXTO # ARABIC-INDIC DIGIT FOUR
0665; CONTEXTO # ARABIC-INDIC DIGIT FIVE
0666; CONTEXTO # ARABIC-INDIC DIGIT SIX
0667; CONTEXTO # ARABIC-INDIC DIGIT SEVEN
0668; CONTEXTO # ARABIC-INDIC DIGIT EIGHT
0669; CONTEXTO # ARABIC-INDIC DIGIT NINE
06F0; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT ZERO
06F1; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT ONE
06F2; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT TWO
06F3; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT THREE
06F4; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT FOUR
06F5; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT FIVE
06F6; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT SIX
06F7; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT SEVEN
06F8; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT EIGHT
06F9; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT NINE
0640; DISALLOWED # ARABIC TATWEEL
07FA; DISALLOWED # NKO LAJANYALAN
302E; DISALLOWED # HANGUL SINGLE DOT TONE MARK
302F; DISALLOWED # HANGUL DOUBLE DOT TONE MARK
3031; DISALLOWED # VERTICAL KANA REPEAT MARK
3032; DISALLOWED # VERTICAL KANA REPEAT WITH VOICED SOUND MARK
3033; DISALLOWED # VERTICAL KANA REPEAT MARK UPPER HALF
3034; DISALLOWED # VERTICAL KANA REPEAT WITH VOICED SOUND MARK UPPER HALF
3035; DISALLOWED # VERTICAL KANA REPEAT MARK LOWER HALF
303B; DISALLOWED # VERTICAL IDEOGRAPHIC ITERATION MARK
2.7. BackwardCompatible (G)
G: cp is in {}
This category includes the code points whose property values in versions of Unicode after 5.1 have changed in such a way that the derived property value would no longer be PVALID or DISALLOWED. If changes are made in future versions of Unicode so that code points might change property value from PVALID or DISALLOWED, then this table can be updated to hold special exception values so that the property values for those code points stay stable.
2.8. JoinControl (H)
H: Join_Control(cp) = True
This category consists of Join Control characters, which are not in LetterDigits (Section 2.1) but are still required in IDN labels under some circumstances.
2.9. OldHangulJamo (I)
I: Hangul_Syllable_Type(cp) is in {L, V, T}
This category consists of all conjoining Hangul Jamo (Leading Jamo, Vowel Jamo, and Trailing Jamo).
Elimination of conjoining Hangul Jamos from the set of PVALID characters results in restricting the set of Korean PVALID characters just to preformed, modern Hangul syllable characters. Old Hangul syllables, which must be spelled with sequences of conjoining Hangul Jamos, are not PVALID for IDNs.
2.10. Unassigned (J)
This category consists of code points in the Unicode character set that are not (yet) assigned. It should be noted that Unicode distinguishes between 'unassigned code points' and 'unassigned characters'. The unassigned code points are all but (Cn - Noncharacters), while the unassigned *characters* are all but (Cn + Cs).
3. Calculation of the Derived Property
As described above (Section 1) and in more detail in the "IDNA Protocol" document [IDNA2008-protocol], possible values of the IDN property are:
- PVALID
- CONTEXTJ
- CONTEXTO
- DISALLOWED
- UNASSIGNED
The algorithm to calculate the value of the derived property is as follows. If the name of a rule (such as Exceptions) is used, it denotes the set of codepoints that the rule defines, while the same name used as a function call (such as Exceptions(cp)) denotes the value that cp has in the Exceptions table.
If .cp. .in. Exceptions Then Exceptions(cp);
Else If .cp. .in. BackwardCompatible Then BackwardCompatible(cp);
Else If .cp. .in. Unassigned Then UNASSIGNED;
Else If .cp. .in. LDH Then PVALID;
Else If .cp. .in. JoinControl Then CONTEXTJ;
Else If .cp. .in. Unstable Then DISALLOWED;
Else If .cp. .in. IgnorableProperties Then DISALLOWED;
Else If .cp. .in. IgnorableBlocks Then DISALLOWED;
Else If .cp. .in. OldHangulJamo Then DISALLOWED;
Else If .cp. .in. LetterDigits Then PVALID;
Else DISALLOWED;
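The ordering of the tests above can be illustrated with a short Python sketch. It is not a conforming implementation: all function names are hypothetical, `EXCEPTIONS` holds only a few rows of the Section 2.6 table, and the Unassigned, IgnorableProperties, IgnorableBlocks and OldHangulJamo tests are caller-supplied predicates because the Python standard library does not expose the Unicode data they need. The defaults for those predicates always return False, so results for code points in those categories are wrong unless real predicates are passed in. Property values come from the interpreter's Unicode version, not necessarily Unicode 5.1.

```python
import unicodedata
from typing import Callable, Dict

Predicate = Callable[[int], bool]

# A few rows of the Exceptions (F) table, for illustration only.
EXCEPTIONS: Dict[int, str] = {
    0x00DF: "PVALID",
    0x03C2: "PVALID",
    0x00B7: "CONTEXTO",
    0x0660: "CONTEXTO",
    0x0640: "DISALLOWED",
}
# BackwardCompatible (G) is empty in this document.
BACKWARD_COMPATIBLE: Dict[int, str] = {}


def _never(cp: int) -> bool:
    """Placeholder predicate (assumption): matches no code point."""
    return False


def is_ldh(cp: int) -> bool:
    return cp == 0x2D or 0x30 <= cp <= 0x39 or 0x61 <= cp <= 0x7A


def is_join_control(cp: int) -> bool:
    # The two Join Control characters named in Section 1.
    return cp in (0x200C, 0x200D)


def is_unstable(cp: int) -> bool:
    ch = chr(cp)
    folded = unicodedata.normalize("NFKC", ch).casefold()
    return unicodedata.normalize("NFKC", folded) != ch


def is_letter_digit(cp: int) -> bool:
    cats = {"Ll", "Lu", "Lo", "Nd", "Lm", "Mn", "Mc"}
    return unicodedata.category(chr(cp)) in cats


def derived_property(
    cp: int,
    is_unassigned: Predicate = _never,
    is_ignorable_property: Predicate = _never,
    is_ignorable_block: Predicate = _never,
    is_old_hangul_jamo: Predicate = _never,
) -> str:
    """Apply the Section 3 tests in order; the first match wins."""
    if cp in EXCEPTIONS:
        return EXCEPTIONS[cp]
    if cp in BACKWARD_COMPATIBLE:
        return BACKWARD_COMPATIBLE[cp]
    if is_unassigned(cp):
        return "UNASSIGNED"
    if is_ldh(cp):
        return "PVALID"
    if is_join_control(cp):
        return "CONTEXTJ"
    if is_unstable(cp):
        return "DISALLOWED"
    if is_ignorable_property(cp):
        return "DISALLOWED"
    if is_ignorable_block(cp):
        return "DISALLOWED"
    if is_old_hangul_jamo(cp):
        return "DISALLOWED"
    if is_letter_digit(cp):
        return "PVALID"
    return "DISALLOWED"
```

Because the tests are evaluated in order, an Exceptions entry overrides everything else: U+00DF is unstable under rule B, yet `derived_property(0x00DF)` yields PVALID.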
4. Codepoints
The Categories and Rules defined in Section 2 and Section 3 apply to all Unicode code points. The table in Appendix B shows, for illustrative purposes, the consequences of the categories and classification rules, and the resulting property values.
The list of code points that can be found in Appendix B is non-normative. Section 2 and Section 3 are normative.
5. IANA Considerations
5.1. IDNA derived property value registry
IANA is to keep a list of the derived property values for the versions of Unicode that are released after (and including) version 5.1. The derived property value is to be calculated according to the specifications in Section 2 and Section 3 and not by copying the non-normative table found in Appendix B. Changes to the rules, including BackwardCompatible (Section 2.7) (a set that is empty at the release of this document), require IETF Review, as described in [RFC5226].
5.2. IDNA Context Registry
For characters that are defined in the IDNA Character Registry list as CONTEXTO or CONTEXTJ, and that therefore require a contextual rule, IANA will create and maintain a list of approved contextual rules.
Additions or changes to these rules require IETF Review, as described in [RFC5226].
A table from which that registry can be initialized, and some further discussion, appears in Appendix A.
6. Security Considerations
The security issues associated with this work are discussed in [IDNA2008-protocol].
7. Acknowledgements
This document would not have been possible to produce without input from many people. The main contributors are (in alphabetical order) Harald Alvestrand, Vint Cerf, Tina Dam, Mark Davis, Gihan Dias, Mouhammet Diop, Michael Everson, Asmus Freytag, Debbie Garside, Paul Hoffman, Kent Karlsson, Cary Karp, Jaeyoun Kim, John Klensin, Olaf Kolkman, Gervase Markham, Ram Mohan, Lisa Moore, Yngve Pettersen, Erik van der Poel, Hualin Qian, Rick Reed, Pete Resnick, Lakmal Silva, Michel Suignard, Andrew Sullivan, Wil Tan, Kenneth Whistler, Chris Wright and Yoshiro Yoneya.
Appendix A. Contextual Rules Registry
As discussed in Section 5.2 and in the IANA Considerations section of [IDNA2008-rationale], a registry is needed for the rules that define the contexts in which particular PROTOCOL-VALID characters, namely those associated with a requirement for Contextual Information, are permitted. These rules are expressed as tests on the label in which the characters appear (all, or any part of, the label may be tested).
The grammatical rules are expressed in pseudo code. The conventions used for that pseudo code are explained here.
Each rule is constructed as a Boolean expression that evaluates to either True or False. A simple "True;" or "False;" rule sets the default result value for the rule set. Subsequent conditional rules that evaluate to True or False may re-set the result value.
A special value "Undefined" is used to deal with any error conditions, such as an attempt to test a character before the start of a label or after the end of a label. If any term of a rule evaluates to Undefined, further evaluation of the rule immediately terminates, as the result value of the rule will itself be Undefined.
cp represents the codepoint to be tested.
FirstChar is a special term which denotes the first codepoint in a label.
LastChar is a special term which denotes the last codepoint in a label.
.eq. represents the equality relation.
A .eq. B evaluates to True if A equals B.
.is. represents checking position in a label.
A .is. B evaluates to True if A and B have the same position in the same label.
.ne. represents the non-equality relation.
A .ne. B evaluates to True if A is not equal to B.
.in. represents the set inclusion relation.
A .in. B evaluates to True if A is a member of the set B.
A functional notation, Function_Name(cp), is used to express either string positions within a label, Boolean character property tests of a codepoint, or a regular expression match. When such function names refer to Boolean character property tests, the function names use the exact Unicode character property name for the property in question, and "cp" is evaluated as the Unicode value of the codepoint to be tested, rather than as its position in the label. When such function names refer to string positions within a label, "cp" is evaluated as its position in the label.
RegExpMatch(X) takes as its parameter X a schematic regular expression consisting of a mix of Unicode character property values and literal Unicode codepoints.
Script(cp) returns the value of the Unicode Script property, as defined in Scripts.txt in the Unicode Character Database.
Canonical_Combining_Class(cp) returns the value of the Unicode Canonical_Combining_Class property, as defined in UnicodeData.txt in the Unicode Character Database.
Before(cp) returns the codepoint of the character immediately preceding cp in logical order in the string representing the label.
Before(FirstChar) evaluates to Undefined.
After(cp) returns the codepoint of the character immediately following cp in logical order in the string representing the label.
After(LastChar) evaluates to Undefined.
Note that "Before" and "After" do not refer to the visual display order of the character in a label, which may be reversed or otherwise modified by the bidirectional algorithm for labels including characters from scripts written right-to-left. Instead, 'Before' and 'After' refer to the network order of the character in the label.
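A minimal sketch of Before() and After() in Python, assuming the label is held as a string in logical (network) order and positions are zero-based indices into it. The function names and the use of `None` to model the special value Undefined are choices made for this illustration only.

```python
from typing import Optional


def before(label: str, pos: int) -> Optional[str]:
    """Character immediately preceding position pos, or None (Undefined).

    Before(FirstChar) is Undefined, so pos == 0 yields None.
    """
    if 0 < pos < len(label):
        return label[pos - 1]
    return None


def after(label: str, pos: int) -> Optional[str]:
    """Character immediately following position pos, or None (Undefined).

    After(LastChar) is Undefined, so the last position yields None.
    """
    if 0 <= pos < len(label) - 1:
        return label[pos + 1]
    return None
```

The explicit bounds checks matter in Python, where a negative index such as `label[-1]` would otherwise silently wrap around to the end of the string.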
The clauses "Then True" and "Then False" imply exit from the pseudo-code routine with the corresponding result.
Repeated evaluation for all characters in a label makes use of the special construct:
For All Characters:
Expression;
End For;
This construct requires repeated evaluation of "Expression" for each codepoint in the label, starting from FirstChar and proceeding to LastChar.
The different fields in the rules are to be interpreted as follows:
Code point:
The codepoint, or codepoints, that this rule is to be applied to.
Normally, this implies that if any of the codepoints in a label is as defined, then the rules should be applied. If the rule set evaluates to True, the codepoint is acceptable as used; if it evaluates to False, it is not.
Overview:
A description of the goal with the rule, in plain English.
Lookup:
True if application of this rule is recommended at lookup time; False otherwise.
Rule Set:
The rule set itself, as described above.
Code point:
U+200C
Overview:
This may occur in a formally cursive script (such as Arabic) in a context where it breaks a cursive connection as required for orthographic rules, as in the Persian language, for example. It also may occur in Indic scripts in a consonant conjunct context (immediately following a virama), to control required display of such conjuncts.
Lookup:
True
Rule Set:
False;
If Canonical_Combining_Class(Before(cp)) .eq. Virama Then True;
If RegExpMatch((Joining_Type:{L,D})(Joining_Type:T)*\u200C
(Joining_Type:T)*(Joining_Type:{R,D})) Then True;
Code point:
U+200D
Overview:
This may occur in Indic scripts in a consonant conjunct context (immediately following a virama), to control required display of such conjuncts.
Lookup:
True
Rule Set:
False;
If Canonical_Combining_Class(Before(cp)) .eq. Virama Then True;
Code point:
U+00B7
Overview:
Between 'l' (U+006C) characters only, used to permit the Catalan character ela geminada to be expressed.
Lookup:
False
Rule Set:
False;
If Before(cp) .eq. U+006C And After(cp) .eq. U+006C Then True;
Code point:
U+0375
Overview:
The script of the following character MUST be Greek.
Lookup:
False
Rule Set:
False;
If Script(After(cp)) .eq. Greek Then True;
Code point:
U+05F3
Overview:
The script of the preceding character MUST be Hebrew.
Lookup:
False
Rule Set:
False;
If Script(Before(cp)) .eq. Hebrew Then True;
Code point:
U+05F4
Overview:
The script of the preceding character MUST be Hebrew.
Lookup:
False
Rule Set:
False;
If Script(Before(cp)) .eq. Hebrew Then True;
Code point:
U+30FB
Overview:
Note that the Script of Katakana Middle Dot is not any of "Hiragana", "Katakana" or "Han". The effect of this rule is to require at least one character in the label to be in one of those scripts.
Lookup:
False
Rule Set:
False;
For All Characters:
If Script(cp) .in. {Hiragana, Katakana, Han} Then True;
End For;
Code point:
0660..0669
Overview:
Cannot be mixed with Extended Arabic-Indic Digits.
Lookup:
False
Rule Set:
True;
For All Characters:
If cp .in. 06F0..06F9 Then False;
End For;
Code point:
06F0..06F9
Overview:
Cannot be mixed with Arabic-Indic Digits.
Lookup:
False
Rule Set:
True;
For All Characters:
If cp .in. 0660..0669 Then False;
End For;
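Several of the rule sets above need only codepoint values or the Canonical_Combining_Class property, so they can be sketched with the Python standard library. The function names are hypothetical, `pos` is a zero-based index into a label held in logical order, and a term that would evaluate to Undefined is treated here as a failed test, which is a simplification of the Undefined handling described earlier. The rules that depend on Script() or Joining_Type are left out because the standard library does not expose those properties.

```python
import unicodedata

VIRAMA_CCC = 9  # numeric Canonical_Combining_Class value named Virama


def zwj_ok(label: str, pos: int) -> bool:
    """U+200D rule: the preceding character must be a virama."""
    if pos == 0:  # Before(FirstChar) is Undefined; treated as failure
        return False
    return unicodedata.combining(label[pos - 1]) == VIRAMA_CCC


def middle_dot_ok(label: str, pos: int) -> bool:
    """U+00B7 rule: only between two 'l' (U+006C) characters."""
    if pos == 0 or pos >= len(label) - 1:  # Undefined neighbour
        return False
    return label[pos - 1] == "\u006C" and label[pos + 1] == "\u006C"


def arabic_indic_digits_ok(label: str) -> bool:
    """0660..0669 rule: no Extended Arabic-Indic Digits in the label."""
    return not any(0x06F0 <= ord(c) <= 0x06F9 for c in label)


def extended_arabic_indic_digits_ok(label: str) -> bool:
    """06F0..06F9 rule: no Arabic-Indic Digits in the label."""
    return not any(0x0660 <= ord(c) <= 0x0669 for c in label)
```

For example, `middle_dot_ok("l\u00B7l", 1)` holds, while a middle dot at the start of a label or next to any other letter does not satisfy the rule.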
If one applies the rules (Section 3) to the code points 0x0000 to 0x10FFFF for Unicode 5.1, the result is as follows.
This list is non-normative and is included only for illustrative purposes. Specifically, what is displayed in the third column is not the formal name of the codepoint (as defined in section 4.8 of The Unicode Standard [Unicode51]). Differences exist, for example, for codepoints that have the codepoint value as part of the name (example: CJK UNIFIED IDEOGRAPH-4E00) and in the naming of Hangul syllables. For many codepoints, what you see is the official name.
0000..002C ; DISALLOWED # <control>..COMMA
002D ; PVALID # HYPHEN-MINUS
002E..002F ; DISALLOWED # FULL STOP..SOLIDUS
0030..0039 ; PVALID # DIGIT ZERO..DIGIT NINE
003A..0060 ; DISALLOWED # COLON..GRAVE ACCENT
0061..007A ; PVALID # LATIN SMALL LETTER A..LATIN SMALL LETTER Z
007B..00B6 ; DISALLOWED # LEFT CURLY BRACKET..PILCROW SIGN
00B7 ; CONTEXTO # MIDDLE DOT
00B8..00DE ; DISALLOWED # CEDILLA..LATIN CAPITAL LETTER THORN
00DF..00F6 ; PVALID # LATIN SMALL LETTER SHARP S..LATIN SMALL LETT
00F7 ; DISALLOWED # DIVISION SIGN
00F8..00FF ; PVALID # LATIN SMALL LETTER O WITH STROKE..LATIN SMAL
0100 ; DISALLOWED # LATIN CAPITAL LETTER A WITH MACRON
0101 ; PVALID # LATIN SMALL LETTER A WITH MACRON
0102 ; DISALLOWED # LATIN CAPITAL LETTER A WITH BREVE
0103 ; PVALID # LATIN SMALL LETTER A WITH BREVE
0104 ; DISALLOWED # LATIN CAPITAL LETTER A WITH OGONEK
0105 ; PVALID # LATIN SMALL LETTER A WITH OGONEK
0106 ; DISALLOWED # LATIN CAPITAL LETTER C WITH ACUTE
0107 ; PVALID # LATIN SMALL LETTER C WITH ACUTE
0108 ; DISALLOWED # LATIN CAPITAL LETTER C WITH CIRCUMFLEX
0109 ; PVALID # LATIN SMALL LETTER C WITH CIRCUMFLEX
010A ; DISALLOWED # LATIN CAPITAL LETTER C WITH DOT ABOVE
010B ; PVALID # LATIN SMALL LETTER C WITH DOT ABOVE
010C ; DISALLOWED # LATIN CAPITAL LETTER C WITH CARON
010D ; PVALID # LATIN SMALL LETTER C WITH CARON
010E ; DISALLOWED # LATIN CAPITAL LETTER D WITH CARON
010F ; PVALID # LATIN SMALL LETTER D WITH CARON
0110 ; DISALLOWED # LATIN CAPITAL LETTER D WITH STROKE
0111 ; PVALID # LATIN SMALL LETTER D WITH STROKE
0112 ; DISALLOWED # LATIN CAPITAL LETTER E WITH MACRON
0113 ; PVALID # LATIN SMALL LETTER E WITH MACRON
0114 ; DISALLOWED # LATIN CAPITAL LETTER E WITH BREVE
0115 ; PVALID # LATIN SMALL LETTER E WITH BREVE
0116 ; DISALLOWED # LATIN CAPITAL LETTER E WITH DOT ABOVE
0117 ; PVALID # LATIN SMALL LETTER E WITH DOT ABOVE
0118 ; DISALLOWED # LATIN CAPITAL LETTER E WITH OGONEK
0119 ; PVALID # LATIN SMALL LETTER E WITH OGONEK
011A ; DISALLOWED # LATIN CAPITAL LETTER E WITH CARON
011B ; PVALID # LATIN SMALL LETTER E WITH CARON
011C ; DISALLOWED # LATIN CAPITAL LETTER G WITH CIRCUMFLEX
011D ; PVALID # LATIN SMALL LETTER G WITH CIRCUMFLEX
011E ; DISALLOWED # LATIN CAPITAL LETTER G WITH BREVE
011F ; PVALID # LATIN SMALL LETTER G WITH BREVE
0120 ; DISALLOWED # LATIN CAPITAL LETTER G WITH DOT ABOVE
0121 ; PVALID # LATIN SMALL LETTER G WITH DOT ABOVE
0122 ; DISALLOWED # LATIN CAPITAL LETTER G WITH CEDILLA
0123 ; PVALID # LATIN SMALL LETTER G WITH CEDILLA
0124 ; DISALLOWED # LATIN CAPITAL LETTER H WITH CIRCUMFLEX
0125 ; PVALID # LATIN SMALL LETTER H WITH CIRCUMFLEX
0126 ; DISALLOWED # LATIN CAPITAL LETTER H WITH STROKE
0127 ; PVALID # LATIN SMALL LETTER H WITH STROKE
0128 ; DISALLOWED # LATIN CAPITAL LETTER I WITH TILDE
0129 ; PVALID # LATIN SMALL LETTER I WITH TILDE
012A ; DISALLOWED # LATIN CAPITAL LETTER I WITH MACRON
012B ; PVALID # LATIN SMALL LETTER I WITH MACRON
012C ; DISALLOWED # LATIN CAPITAL LETTER I WITH BREVE
012D ; PVALID # LATIN SMALL LETTER I WITH BREVE
012E ; DISALLOWED # LATIN CAPITAL LETTER I WITH OGONEK
012F ; PVALID # LATIN SMALL LETTER I WITH OGONEK
0130 ; DISALLOWED # LATIN CAPITAL LETTER I WITH DOT ABOVE
0131 ; PVALID # LATIN SMALL LETTER DOTLESS I
0132..0134 ; DISALLOWED # LATIN CAPITAL LIGATURE IJ..LATIN CAPITAL LET
0135 ; PVALID # LATIN SMALL LETTER J WITH CIRCUMFLEX
0136 ; DISALLOWED # LATIN CAPITAL LETTER K WITH CEDILLA
0137..0138 ; PVALID # LATIN SMALL LETTER K WITH CEDILLA..LATIN SMA
0139 ; DISALLOWED # LATIN CAPITAL LETTER L WITH ACUTE
013A ; PVALID # LATIN SMALL LETTER L WITH ACUTE
013B ; DISALLOWED # LATIN CAPITAL LETTER L WITH CEDILLA
013C ; PVALID # LATIN SMALL LETTER L WITH CEDILLA
013D ; DISALLOWED # LATIN CAPITAL LETTER L WITH CARON
013E ; PVALID # LATIN SMALL LETTER L WITH CARON
013F..0141 ; DISALLOWED # LATIN CAPITAL LETTER L WITH MIDDLE DOT..LATI
0142 ; PVALID # LATIN SMALL LETTER L WITH STROKE
0143 ; DISALLOWED # LATIN CAPITAL LETTER N WITH ACUTE
0144 ; PVALID # LATIN SMALL LETTER N WITH ACUTE
0145 ; DISALLOWED # LATIN CAPITAL LETTER N WITH CEDILLA
0146 ; PVALID # LATIN SMALL LETTER N WITH CEDILLA
0147 ; DISALLOWED # LATIN CAPITAL LETTER N WITH CARON
0148 ; PVALID # LATIN SMALL LETTER N WITH CARON
0149..014A ; DISALLOWED # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE.
014B ; PVALID # LATIN SMALL LETTER ENG
014C ; DISALLOWED # LATIN CAPITAL LETTER O WITH MACRON
014D ; PVALID # LATIN SMALL LETTER O WITH MACRON
014E ; DISALLOWED # LATIN CAPITAL LETTER O WITH BREVE
014F ; PVALID # LATIN SMALL LETTER O WITH BREVE
0150 ; DISALLOWED # LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
0151 ; PVALID # LATIN SMALL LETTER O WITH DOUBLE ACUTE
0152 ; DISALLOWED # LATIN CAPITAL LIGATURE OE
0153 ; PVALID # LATIN SMALL LIGATURE OE
0154 ; DISALLOWED # LATIN CAPITAL LETTER R WITH ACUTE
0155 ; PVALID # LATIN SMALL LETTER R WITH ACUTE
0156 ; DISALLOWED # LATIN CAPITAL LETTER R WITH CEDILLA
0157 ; PVALID # LATIN SMALL LETTER R WITH CEDILLA
0158 ; DISALLOWED # LATIN CAPITAL LETTER R WITH CARON
0159 ; PVALID # LATIN SMALL LETTER R WITH CARON
015A ; DISALLOWED # LATIN CAPITAL LETTER S WITH ACUTE
015B ; PVALID # LATIN SMALL LETTER S WITH ACUTE
015C ; DISALLOWED # LATIN CAPITAL LETTER S WITH CIRCUMFLEX
015D ; PVALID # LATIN SMALL LETTER S WITH CIRCUMFLEX
015E ; DISALLOWED # LATIN CAPITAL LETTER S WITH CEDILLA
015F ; PVALID # LATIN SMALL LETTER S WITH CEDILLA
0160 ; DISALLOWED # LATIN CAPITAL LETTER S WITH CARON
0161 ; PVALID # LATIN SMALL LETTER S WITH CARON
0162 ; DISALLOWED # LATIN CAPITAL LETTER T WITH CEDILLA
0163 ; PVALID # LATIN SMALL LETTER T WITH CEDILLA
0164 ; DISALLOWED # LATIN CAPITAL LETTER T WITH CARON
0165 ; PVALID # LATIN SMALL LETTER T WITH CARON
0166 ; DISALLOWED # LATIN CAPITAL LETTER T WITH STROKE
0167 ; PVALID # LATIN SMALL LETTER T WITH STROKE
0168 ; DISALLOWED # LATIN CAPITAL LETTER U WITH TILDE
0169 ; PVALID # LATIN SMALL LETTER U WITH TILDE
016A ; DISALLOWED # LATIN CAPITAL LETTER U WITH MACRON
016B ; PVALID # LATIN SMALL LETTER U WITH MACRON
016C ; DISALLOWED # LATIN CAPITAL LETTER U WITH BREVE
016D ; PVALID # LATIN SMALL LETTER U WITH BREVE
016E ; DISALLOWED # LATIN CAPITAL LETTER U WITH RING ABOVE
016F ; PVALID # LATIN SMALL LETTER U WITH RING ABOVE
0170 ; DISALLOWED # LATIN CAPITAL LETTER U WITH DOUBLE ACUTE
0171 ; PVALID # LATIN SMALL LETTER U WITH DOUBLE ACUTE
0172 ; DISALLOWED # LATIN CAPITAL LETTER U WITH OGONEK
0173 ; PVALID # LATIN SMALL LETTER U WITH OGONEK
0174 ; DISALLOWED # LATIN CAPITAL LETTER W WITH CIRCUMFLEX
0175 ; PVALID # LATIN SMALL LETTER W WITH CIRCUMFLEX
0176 ; DISALLOWED # LATIN CAPITAL LETTER Y WITH CIRCUMFLEX
0177 ; PVALID # LATIN SMALL LETTER Y WITH CIRCUMFLEX
0178..0179 ; DISALLOWED # LATIN CAPITAL LETTER Y WITH DIAERESIS..LATIN
017A ; PVALID # LATIN SMALL LETTER Z WITH ACUTE
017B ; DISALLOWED # LATIN CAPITAL LETTER Z WITH DOT ABOVE
017C ; PVALID # LATIN SMALL LETTER Z WITH DOT ABOVE
017D ; DISALLOWED # LATIN CAPITAL LETTER Z WITH CARON
017E ; PVALID # LATIN SMALL LETTER Z WITH CARON
017F ; DISALLOWED # LATIN SMALL LETTER LONG S
0180 ; PVALID # LATIN SMALL LETTER B WITH STROKE
0181..0182 ; DISALLOWED # LATIN CAPITAL LETTER B WITH HOOK..LATIN CAPI
0183 ; PVALID # LATIN SMALL LETTER B WITH TOPBAR
0184 ; DISALLOWED # LATIN CAPITAL LETTER TONE SIX
0185 ; PVALID # LATIN SMALL LETTER TONE SIX
0186..0187 ; DISALLOWED # LATIN CAPITAL LETTER OPEN O..LATIN CAPITAL L
0188 ; PVALID # LATIN SMALL LETTER C WITH HOOK
0189..018B ; DISALLOWED # LATIN CAPITAL LETTER AFRICAN D..LATIN CAPITA
018C..018D ; PVALID # LATIN SMALL LETTER D WITH TOPBAR..LATIN SMAL
018E..0191 ; DISALLOWED # LATIN CAPITAL LETTER REVERSED E..LATIN CAPIT
0192 ; PVALID # LATIN SMALL LETTER F WITH HOOK
0193..0194 ; DISALLOWED # LATIN CAPITAL LETTER G WITH HOOK..LATIN CAPI
0195 ; PVALID # LATIN SMALL LETTER HV
0196..0198 ; DISALLOWED # LATIN CAPITAL LETTER IOTA..LATIN CAPITAL LET
0199..019B ; PVALID # LATIN SMALL LETTER K WITH HOOK..LATIN SMALL
019C..019D ; DISALLOWED # LATIN CAPITAL LETTER TURNED M..LATIN CAPITAL
019E ; PVALID # LATIN SMALL LETTER N WITH LONG RIGHT LEG
019F..01A0 ; DISALLOWED # LATIN CAPITAL LETTER O WITH MIDDLE TILDE..LA
01A1 ; PVALID # LATIN SMALL LETTER O WITH HORN
01A2 ; DISALLOWED # LATIN CAPITAL LETTER OI
01A3 ; PVALID # LATIN SMALL LETTER OI
01A4 ; DISALLOWED # LATIN CAPITAL LETTER P WITH HOOK
01A5 ; PVALID # LATIN SMALL LETTER P WITH HOOK
01A6..01A7 ; DISALLOWED # LATIN LETTER YR..LATIN CAPITAL LETTER TONE T
01A8 ; PVALID # LATIN SMALL LETTER TONE TWO
01A9 ; DISALLOWED # LATIN CAPITAL LETTER ESH
01AA..01AB ; PVALID # LATIN LETTER REVERSED ESH LOOP..LATIN SMALL
01AC ; DISALLOWED # LATIN CAPITAL LETTER T WITH HOOK
01AD ; PVALID # LATIN SMALL LETTER T WITH HOOK
01AE..01AF ; DISALLOWED # LATIN CAPITAL LETTER T WITH RETROFLEX HOOK..
01B0 ; PVALID # LATIN SMALL LETTER U WITH HORN
01B1..01B3 ; DISALLOWED # LATIN CAPITAL LETTER UPSILON..LATIN CAPITAL
01B4 ; PVALID # LATIN SMALL LETTER Y WITH HOOK
01B5 ; DISALLOWED # LATIN CAPITAL LETTER Z WITH STROKE
01B6 ; PVALID # LATIN SMALL LETTER Z WITH STROKE
01B7..01B8 ; DISALLOWED # LATIN CAPITAL LETTER EZH..LATIN CAPITAL LETT
01B9..01BB ; PVALID # LATIN SMALL LETTER EZH REVERSED..LATIN LETTE
01BC ; DISALLOWED # LATIN CAPITAL LETTER TONE FIVE
01BD..01C3 ; PVALID # LATIN SMALL LETTER TONE FIVE..LATIN LETTER R
01C4..01CD ; DISALLOWED # LATIN CAPITAL LETTER DZ WITH CARON..LATIN CA
01CE ; PVALID # LATIN SMALL LETTER A WITH CARON
01CF ; DISALLOWED # LATIN CAPITAL LETTER I WITH CARON
01D0 ; PVALID # LATIN SMALL LETTER I WITH CARON
01D1 ; DISALLOWED # LATIN CAPITAL LETTER O WITH CARON
01D2 ; PVALID # LATIN SMALL LETTER O WITH CARON
01D3 ; DISALLOWED # LATIN CAPITAL LETTER U WITH CARON
01D4 ; PVALID # LATIN SMALL LETTER U WITH CARON
01D5 ; DISALLOWED # LATIN CAPITAL LETTER U WITH DIAERESIS AND MA
01D6 ; PVALID # LATIN SMALL LETTER U WITH DIAERESIS AND MACR
01D7 ; DISALLOWED # LATIN CAPITAL LETTER U WITH DIAERESIS AND AC
01D8 ; PVALID # LATIN SMALL LETTER U WITH DIAERESIS AND ACUT
01D9 ; DISALLOWED # LATIN CAPITAL LETTER U WITH DIAERESIS AND CA
01DA ; PVALID # LATIN SMALL LETTER U WITH DIAERESIS AND CARO
01DB ; DISALLOWED # LATIN CAPITAL LETTER U WITH DIAERESIS AND GR
01DC..01DD ; PVALID # LATIN SMALL LETTER U WITH DIAERESIS AND GRAV
01DE ; DISALLOWED # LATIN CAPITAL LETTER A WITH DIAERESIS AND MA
01DF ; PVALID # LATIN SMALL LETTER A WITH DIAERESIS AND MACR
01E0 ; DISALLOWED # LATIN CAPITAL LETTER A WITH DOT ABOVE AND MA
01E1 ; PVALID # LATIN SMALL LETTER A WITH DOT ABOVE AND MACR
01E2 ; DISALLOWED # LATIN CAPITAL LETTER AE WITH MACRON
01E3 ; PVALID # LATIN SMALL LETTER AE WITH MACRON
01E4 ; DISALLOWED # LATIN CAPITAL LETTER G WITH STROKE
01E5 ; PVALID # LATIN SMALL LETTER G WITH STROKE
01E6 ; DISALLOWED # LATIN CAPITAL LETTER G WITH CARON
01E7 ; PVALID # LATIN SMALL LETTER G WITH CARON
01E8 ; DISALLOWED # LATIN CAPITAL LETTER K WITH CARON
01E9 ; PVALID # LATIN SMALL LETTER K WITH CARON
01EA ; DISALLOWED # LATIN CAPITAL LETTER O WITH OGONEK
01EB ; PVALID # LATIN SMALL LETTER O WITH OGONEK
01EC ; DISALLOWED # LATIN CAPITAL LETTER O WITH OGONEK AND MACRO
01ED ; PVALID # LATIN SMALL LETTER O WITH OGONEK AND MACRON
01EE ; DISALLOWED # LATIN CAPITAL LETTER EZH WITH CARON
01EF..01F0 ; PVALID # LATIN SMALL LETTER EZH WITH CARON..LATIN SMA
01F1..01F4 ; DISALLOWED # LATIN CAPITAL LETTER DZ..LATIN CAPITAL LETTE
01F5 ; PVALID # LATIN SMALL LETTER G WITH ACUTE
01F6..01F8 ; DISALLOWED # LATIN CAPITAL LETTER HWAIR..LATIN CAPITAL LE
01F9 ; PVALID # LATIN SMALL LETTER N WITH GRAVE
01FA ; DISALLOWED # LATIN CAPITAL LETTER A WITH RING ABOVE AND A
01FB ; PVALID # LATIN SMALL LETTER A WITH RING ABOVE AND ACU
01FC ; DISALLOWED # LATIN CAPITAL LETTER AE WITH ACUTE
01FD ; PVALID # LATIN SMALL LETTER AE WITH ACUTE
01FE ; DISALLOWED # LATIN CAPITAL LETTER O WITH STROKE AND ACUTE
01FF ; PVALID # LATIN SMALL LETTER O WITH STROKE AND ACUTE
0200 ; DISALLOWED # LATIN CAPITAL LETTER A WITH DOUBLE GRAVE
0201 ; PVALID # LATIN SMALL LETTER A WITH DOUBLE GRAVE
0202 ; DISALLOWED # LATIN CAPITAL LETTER A WITH INVERTED BREVE
0203 ; PVALID # LATIN SMALL LETTER A WITH INVERTED BREVE
0204 ; DISALLOWED # LATIN CAPITAL LETTER E WITH DOUBLE GRAVE
0205 ; PVALID # LATIN SMALL LETTER E WITH DOUBLE GRAVE
0206 ; DISALLOWED # LATIN CAPITAL LETTER E WITH INVERTED BREVE
0207 ; PVALID # LATIN SMALL LETTER E WITH INVERTED BREVE
0208 ; DISALLOWED # LATIN CAPITAL LETTER I WITH DOUBLE GRAVE
== == 0209 ; PVALID # LATIN SMALL LETTER I WITH DOUBLE GRAVE == == 020A ; DISALLOWED # LATIN CAPITAL LETTER I WITH INVERTED BREVE == == 020B ; PVALID # LATIN SMALL LETTER I WITH INVERTED BREVE == == 020C ; DISALLOWED # LATIN CAPITAL LETTER O WITH DOUBLE GRAVE == == 020D ; PVALID # LATIN SMALL LETTER O WITH DOUBLE GRAVE == == 020E ; DISALLOWED # LATIN CAPITAL LETTER O WITH INVERTED BREVE == == 020F ; PVALID # LATIN SMALL LETTER O WITH INVERTED BREVE == == 0210 ; DISALLOWED # LATIN CAPITAL LETTER R WITH DOUBLE GRAVE == == 0211 ; PVALID # LATIN SMALL LETTER R WITH DOUBLE GRAVE == == 0212 ; DISALLOWED # LATIN CAPITAL LETTER R WITH INVERTED BREVE == == 0213 ; PVALID # LATIN SMALL LETTER R WITH INVERTED BREVE == == 0214 ; DISALLOWED # LATIN CAPITAL LETTER U WITH DOUBLE GRAVE == == 0215 ; PVALID # LATIN SMALL LETTER U WITH DOUBLE GRAVE == == 0216 ; DISALLOWED # LATIN CAPITAL LETTER U WITH INVERTED BREVE == == 0217 ; PVALID # LATIN SMALL LETTER U WITH INVERTED BREVE == == 0218 ; DISALLOWED # LATIN CAPITAL LETTER S WITH COMMA BELOW == == 0219 ; PVALID # LATIN SMALL LETTER S WITH COMMA BELOW == == 021A ; DISALLOWED # LATIN CAPITAL LETTER T WITH COMMA BELOW == == 021B ; PVALID # LATIN SMALL LETTER T WITH COMMA BELOW == == 021C ; DISALLOWED # LATIN CAPITAL LETTER YOGH == == 021D ; PVALID # LATIN SMALL LETTER YOGH == == 021E ; DISALLOWED # LATIN CAPITAL LETTER H WITH CARON == == 021F ; PVALID # LATIN SMALL LETTER H WITH CARON == == 0220 ; DISALLOWED # LATIN CAPITAL LETTER N WITH LONG RIGHT LEG == == 0221 ; PVALID # LATIN SMALL LETTER D WITH CURL == == 0222 ; DISALLOWED # LATIN CAPITAL LETTER OU == == 0223 ; PVALID # LATIN SMALL LETTER OU == == 0224 ; DISALLOWED # LATIN CAPITAL LETTER Z WITH HOOK == == 0225 ; PVALID # LATIN SMALL LETTER Z WITH HOOK == == 0226 ; DISALLOWED # LATIN CAPITAL LETTER A WITH DOT ABOVE == == 0227 ; PVALID # LATIN SMALL LETTER A WITH DOT ABOVE == == 0228 ; DISALLOWED # LATIN CAPITAL LETTER E WITH CEDILLA == == 0229 ; PVALID # LATIN SMALL LETTER E WITH 
CEDILLA == == 022A ; DISALLOWED # LATIN CAPITAL LETTER O WITH DIAERESIS AND MA == == 022B ; PVALID # LATIN SMALL LETTER O WITH DIAERESIS AND MACR == == 022C ; DISALLOWED # LATIN CAPITAL LETTER O WITH TILDE AND MACRON == == 022D ; PVALID # LATIN SMALL LETTER O WITH TILDE AND MACRON == == 022E ; DISALLOWED # LATIN CAPITAL LETTER O WITH DOT ABOVE == == 022F ; PVALID # LATIN SMALL LETTER O WITH DOT ABOVE == == 0230 ; DISALLOWED # LATIN CAPITAL LETTER O WITH DOT ABOVE AND MA == == 0231 ; PVALID # LATIN SMALL LETTER O WITH DOT ABOVE AND MACR == == 0232 ; DISALLOWED # LATIN CAPITAL LETTER Y WITH MACRON == === 0233..0239 ; PVALID # LATIN SMALL LETTER Y WITH MACRON..LATIN SMAL === === 023A..023B ; DISALLOWED # LATIN CAPITAL LETTER A WITH STROKE..LATIN CA === == 023C ; PVALID # LATIN SMALL LETTER C WITH STROKE == === 023D..023E ; DISALLOWED # LATIN CAPITAL LETTER L WITH BAR..LATIN CAPIT === === 023F..0240 ; PVALID # LATIN SMALL LETTER S WITH SWASH TAIL..LATIN === == 0241 ; DISALLOWED # LATIN CAPITAL LETTER GLOTTAL STOP == == 0242 ; PVALID # LATIN SMALL LETTER GLOTTAL STOP == === 0243..0246 ; DISALLOWED # LATIN CAPITAL LETTER B WITH STROKE..LATIN CA === == 0247 ; PVALID # LATIN SMALL LETTER E WITH STROKE == == 0248 ; DISALLOWED # LATIN CAPITAL LETTER J WITH STROKE == == 0249 ; PVALID # LATIN SMALL LETTER J WITH STROKE == == 024A ; DISALLOWED # LATIN CAPITAL LETTER SMALL Q WITH HOOK TAIL == == 024B ; PVALID # LATIN SMALL LETTER Q WITH HOOK TAIL == == 024C ; DISALLOWED # LATIN CAPITAL LETTER R WITH STROKE == == 024D ; PVALID # LATIN SMALL LETTER R WITH STROKE == == 024E ; DISALLOWED # LATIN CAPITAL LETTER Y WITH STROKE == === 024F..02AF ; PVALID # LATIN SMALL LETTER Y WITH STROKE..LATIN SMAL === === 02B0..02B8 ; DISALLOWED # MODIFIER LETTER SMALL H..MODIFIER LETTER SMA === === 02B9..02C1 ; PVALID # MODIFIER LETTER PRIME..MODIFIER LETTER REVER === === 02C2..02C5 ; DISALLOWED # MODIFIER LETTER LEFT ARROWHEAD..MODIFIER LET === === 02C6..02D1 ; PVALID # MODIFIER LETTER CIRCUMFLEX 
ACCENT..MODIFIER === === 02D2..02EB ; DISALLOWED # MODIFIER LETTER CENTRED RIGHT HALF RING..MOD === == 02EC ; PVALID # MODIFIER LETTER VOICING == == 02ED ; DISALLOWED # MODIFIER LETTER UNASPIRATED == == 02EE ; PVALID # MODIFIER LETTER DOUBLE APOSTROPHE == === 02EF..02FF ; DISALLOWED # MODIFIER LETTER LOW DOWN ARROWHEAD..MODIFIER === === 0300..033F ; PVALID # COMBINING GRAVE ACCENT..COMBINING DOUBLE OVE === === 0340..0341 ; DISALLOWED # COMBINING GRAVE TONE MARK..COMBINING ACUTE T === == 0342 ; PVALID # COMBINING GREEK PERISPOMENI == === 0343..0345 ; DISALLOWED # COMBINING GREEK KORONIS..COMBINING GREEK YPO === === 0346..034E ; PVALID # COMBINING BRIDGE ABOVE..COMBINING UPWARDS AR === == 034F ; DISALLOWED # COMBINING GRAPHEME JOINER == === 0350..036F ; PVALID # COMBINING RIGHT ARROWHEAD ABOVE..COMBINING L === == 0370 ; DISALLOWED # GREEK CAPITAL LETTER HETA == == 0371 ; PVALID # GREEK SMALL LETTER HETA == == 0372 ; DISALLOWED # GREEK CAPITAL LETTER ARCHAIC SAMPI == == 0373 ; PVALID # GREEK SMALL LETTER ARCHAIC SAMPI == == 0374 ; DISALLOWED # GREEK NUMERAL SIGN == == 0375 ; CONTEXTO # GREEK LOWER NUMERAL SIGN == == 0376 ; DISALLOWED # GREEK CAPITAL LETTER PAMPHYLIAN DIGAMMA == == 0377 ; PVALID # GREEK SMALL LETTER PAMPHYLIAN DIGAMMA == === 0378..0379 ; UNASSIGNED # <reserved>..<reserved> === == 037A ; DISALLOWED # GREEK YPOGEGRAMMENI == === 037B..037D ; PVALID # GREEK SMALL REVERSED LUNATE SIGMA SYMBOL..GR === == 037E ; DISALLOWED # GREEK QUESTION MARK == === 037F..0383 ; UNASSIGNED # <reserved>..<reserved> === === 0384..038A ; DISALLOWED # GREEK TONOS..GREEK CAPITAL LETTER IOTA WITH === == 038B ; UNASSIGNED # <reserved> == == 038C ; DISALLOWED # GREEK CAPITAL LETTER OMICRON WITH TONOS == == 038D ; UNASSIGNED # <reserved> == === 038E..038F ; DISALLOWED # GREEK CAPITAL LETTER UPSILON WITH TONOS..GRE === == 0390 ; PVALID # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND T == === 0391..03A1 ; DISALLOWED # GREEK CAPITAL LETTER ALPHA..GREEK CAPITAL LE === == 03A2 ; UNASSIGNED 
# <reserved> == === 03A3..03AB ; DISALLOWED # GREEK CAPITAL LETTER SIGMA..GREEK CAPITAL LE === === 03AC..03CE ; PVALID # GREEK SMALL LETTER ALPHA WITH TONOS..GREEK S === === 03CF..03D6 ; DISALLOWED # GREEK CAPITAL KAI SYMBOL..GREEK PI SYMBOL === == 03D7 ; PVALID # GREEK KAI SYMBOL == == 03D8 ; DISALLOWED # GREEK LETTER ARCHAIC KOPPA == == 03D9 ; PVALID # GREEK SMALL LETTER ARCHAIC KOPPA == == 03DA ; DISALLOWED # GREEK LETTER STIGMA == == 03DB ; PVALID # GREEK SMALL LETTER STIGMA == == 03DC ; DISALLOWED # GREEK LETTER DIGAMMA == == 03DD ; PVALID # GREEK SMALL LETTER DIGAMMA == == 03DE ; DISALLOWED # GREEK LETTER KOPPA == == 03DF ; PVALID # GREEK SMALL LETTER KOPPA == == 03E0 ; DISALLOWED # GREEK LETTER SAMPI == == 03E1 ; PVALID # GREEK SMALL LETTER SAMPI == == 03E2 ; DISALLOWED # COPTIC CAPITAL LETTER SHEI == == 03E3 ; PVALID # COPTIC SMALL LETTER SHEI == == 03E4 ; DISALLOWED # COPTIC CAPITAL LETTER FEI == == 03E5 ; PVALID # COPTIC SMALL LETTER FEI == == 03E6 ; DISALLOWED # COPTIC CAPITAL LETTER KHEI == == 03E7 ; PVALID # COPTIC SMALL LETTER KHEI == == 03E8 ; DISALLOWED # COPTIC CAPITAL LETTER HORI == == 03E9 ; PVALID # COPTIC SMALL LETTER HORI == == 03EA ; DISALLOWED # COPTIC CAPITAL LETTER GANGIA == == 03EB ; PVALID # COPTIC SMALL LETTER GANGIA == == 03EC ; DISALLOWED # COPTIC CAPITAL LETTER SHIMA == == 03ED ; PVALID # COPTIC SMALL LETTER SHIMA == == 03EE ; DISALLOWED # COPTIC CAPITAL LETTER DEI == == 03EF ; PVALID # COPTIC SMALL LETTER DEI == === 03F0..03F2 ; DISALLOWED # GREEK KAPPA SYMBOL..GREEK LUNATE SIGMA SYMBO === == 03F3 ; PVALID # GREEK LETTER YOT == === 03F4..03F7 ; DISALLOWED # GREEK CAPITAL THETA SYMBOL..GREEK CAPITAL LE === == 03F8 ; PVALID # GREEK SMALL LETTER SHO == === 03F9..03FA ; DISALLOWED # GREEK CAPITAL LUNATE SIGMA SYMBOL..GREEK CAP === === 03FB..03FC ; PVALID # GREEK SMALL LETTER SAN..GREEK RHO WITH STROK === === 03FD..042F ; DISALLOWED # GREEK CAPITAL REVERSED LUNATE SIGMA SYMBOL.. 
=== === 0430..045F ; PVALID # CYRILLIC SMALL LETTER A..CYRILLIC SMALL LETT === == 0460 ; DISALLOWED # CYRILLIC CAPITAL LETTER OMEGA == == 0461 ; PVALID # CYRILLIC SMALL LETTER OMEGA == == 0462 ; DISALLOWED # CYRILLIC CAPITAL LETTER YAT == == 0463 ; PVALID # CYRILLIC SMALL LETTER YAT == == 0464 ; DISALLOWED # CYRILLIC CAPITAL LETTER IOTIFIED E == == 0465 ; PVALID # CYRILLIC SMALL LETTER IOTIFIED E == == 0466 ; DISALLOWED # CYRILLIC CAPITAL LETTER LITTLE YUS == == 0467 ; PVALID # CYRILLIC SMALL LETTER LITTLE YUS == == 0468 ; DISALLOWED # CYRILLIC CAPITAL LETTER IOTIFIED LITTLE YUS == == 0469 ; PVALID # CYRILLIC SMALL LETTER IOTIFIED LITTLE YUS == == 046A ; DISALLOWED # CYRILLIC CAPITAL LETTER BIG YUS == == 046B ; PVALID # CYRILLIC SMALL LETTER BIG YUS == == 046C ; DISALLOWED # CYRILLIC CAPITAL LETTER IOTIFIED BIG YUS == == 046D ; PVALID # CYRILLIC SMALL LETTER IOTIFIED BIG YUS == == 046E ; DISALLOWED # CYRILLIC CAPITAL LETTER KSI == == 046F ; PVALID # CYRILLIC SMALL LETTER KSI == == 0470 ; DISALLOWED # CYRILLIC CAPITAL LETTER PSI == == 0471 ; PVALID # CYRILLIC SMALL LETTER PSI == == 0472 ; DISALLOWED # CYRILLIC CAPITAL LETTER FITA == == 0473 ; PVALID # CYRILLIC SMALL LETTER FITA == == 0474 ; DISALLOWED # CYRILLIC CAPITAL LETTER IZHITSA == == 0475 ; PVALID # CYRILLIC SMALL LETTER IZHITSA == == 0476 ; DISALLOWED # CYRILLIC CAPITAL LETTER IZHITSA WITH DOUBLE == == 0477 ; PVALID # CYRILLIC SMALL LETTER IZHITSA WITH DOUBLE GR == == 0478 ; DISALLOWED # CYRILLIC CAPITAL LETTER UK == == 0479 ; PVALID # CYRILLIC SMALL LETTER UK == == 047A ; DISALLOWED # CYRILLIC CAPITAL LETTER ROUND OMEGA == == 047B ; PVALID # CYRILLIC SMALL LETTER ROUND OMEGA == == 047C ; DISALLOWED # CYRILLIC CAPITAL LETTER OMEGA WITH TITLO == == 047D ; PVALID # CYRILLIC SMALL LETTER OMEGA WITH TITLO == == 047E ; DISALLOWED # CYRILLIC CAPITAL LETTER OT == == 047F ; PVALID # CYRILLIC SMALL LETTER OT == == 0480 ; DISALLOWED # CYRILLIC CAPITAL LETTER KOPPA == == 0481 ; PVALID # CYRILLIC SMALL LETTER KOPPA == 
== 0482 ; DISALLOWED # CYRILLIC THOUSANDS SIGN ==
=== 0483..0487 ; PVALID # COMBINING CYRILLIC TITLO..COMBINING CYRILLIC ===
=== 0488..048A ; DISALLOWED # COMBINING CYRILLIC HUNDRED THOUSANDS SIGN..C ===
== 048B ; PVALID # CYRILLIC SMALL LETTER SHORT I WITH TAIL ==
== 048C ; DISALLOWED # CYRILLIC CAPITAL LETTER SEMISOFT SIGN ==
== 048D ; PVALID # CYRILLIC SMALL LETTER SEMISOFT SIGN ==
== 048E ; DISALLOWED # CYRILLIC CAPITAL LETTER ER WITH TICK ==
== 048F ; PVALID # CYRILLIC SMALL LETTER ER WITH TICK ==
== 0490 ; DISALLOWED # CYRILLIC CAPITAL LETTER GHE WITH UPTURN ==
== 0491 ; PVALID # CYRILLIC SMALL LETTER GHE WITH UPTURN ==
== 0492 ; DISALLOWED # CYRILLIC CAPITAL LETTER GHE WITH STROKE ==
== 0493 ; PVALID # CYRILLIC SMALL LETTER GHE WITH STROKE ==
== 0494 ; DISALLOWED # CYRILLIC CAPITAL LETTER GHE WITH MIDDLE HOOK ==
== 0495 ; PVALID # CYRILLIC SMALL LETTER GHE WITH MIDDLE HOOK ==
== 0496 ; DISALLOWED # CYRILLIC CAPITAL LETTER ZHE WITH DESCENDER ==
== 0497 ; PVALID # CYRILLIC SMALL LETTER ZHE WITH DESCENDER ==
== 0498 ; DISALLOWED # CYRILLIC CAPITAL LETTER ZE WITH DESCENDER ==
== 0499 ; PVALID # CYRILLIC SMALL LETTER ZE WITH DESCENDER ==
== 049A ; DISALLOWED # CYRILLIC CAPITAL LETTER KA WITH DESCENDER ==
== 049B ; PVALID # CYRILLIC SMALL LETTER KA WITH DESCENDER ==
== 049C ; DISALLOWED # CYRILLIC CAPITAL LETTER KA WITH VERTICAL STR ==
== 049D ; PVALID # CYRILLIC SMALL LETTER KA WITH VERTICAL STROK ==
== 049E ; DISALLOWED # CYRILLIC CAPITAL LETTER KA WITH STROKE ==
== 049F ; PVALID # CYRILLIC SMALL LETTER KA WITH STROKE ==
== 04A0 ; DISALLOWED # CYRILLIC CAPITAL LETTER BASHKIR KA ==
== 04A1 ; PVALID # CYRILLIC SMALL LETTER BASHKIR KA ==
== 04A2 ; DISALLOWED # CYRILLIC CAPITAL LETTER EN WITH DESCENDER ==
== 04A3 ; PVALID # CYRILLIC SMALL LETTER EN WITH DESCENDER ==
== 04A4 ; DISALLOWED # CYRILLIC CAPITAL LIGATURE EN GHE ==
== 04A5 ; PVALID # CYRILLIC SMALL LIGATURE EN GHE ==
== 04A6 ; DISALLOWED # CYRILLIC CAPITAL LETTER PE WITH MIDDLE HOOK ==
== 04A7 ; PVALID # CYRILLIC SMALL LETTER PE WITH MIDDLE HOOK ==
== 04A8 ; DISALLOWED # CYRILLIC CAPITAL LETTER ABKHASIAN HA ==
== 04A9 ; PVALID # CYRILLIC SMALL LETTER ABKHASIAN HA ==
== 04AA ; DISALLOWED # CYRILLIC CAPITAL LETTER ES WITH DESCENDER ==
== 04AB ; PVALID # CYRILLIC SMALL LETTER ES WITH DESCENDER ==
== 04AC ; DISALLOWED # CYRILLIC CAPITAL LETTER TE WITH DESCENDER ==
== 04AD ; PVALID # CYRILLIC SMALL LETTER TE WITH DESCENDER ==
== 04AE ; DISALLOWED # CYRILLIC CAPITAL LETTER STRAIGHT U ==
== 04AF ; PVALID # CYRILLIC SMALL LETTER STRAIGHT U ==
== 04B0 ; DISALLOWED # CYRILLIC CAPITAL LETTER STRAIGHT U WITH STRO ==
== 04B1 ; PVALID # CYRILLIC SMALL LETTER STRAIGHT U WITH STROKE ==
== 04B2 ; DISALLOWED # CYRILLIC CAPITAL LETTER HA WITH DESCENDER ==
== 04B3 ; PVALID # CYRILLIC SMALL LETTER HA WITH DESCENDER ==
== 04B4 ; DISALLOWED # CYRILLIC CAPITAL LIGATURE TE TSE ==
== 04B5 ; PVALID # CYRILLIC SMALL LIGATURE TE TSE ==
== 04B6 ; DISALLOWED # CYRILLIC CAPITAL LETTER CHE WITH DESCENDER ==
== 04B7 ; PVALID # CYRILLIC SMALL LETTER CHE WITH DESCENDER ==
== 04B8 ; DISALLOWED # CYRILLIC CAPITAL LETTER CHE WITH VERTICAL ST ==
== 04B9 ; PVALID # CYRILLIC SMALL LETTER CHE WITH VERTICAL STRO ==
== 04BA ; DISALLOWED # CYRILLIC CAPITAL LETTER SHHA ==
== 04BB ; PVALID # CYRILLIC SMALL LETTER SHHA ==
== 04BC ; DISALLOWED # CYRILLIC CAPITAL LETTER ABKHASIAN CHE ==
== 04BD ; PVALID # CYRILLIC SMALL LETTER ABKHASIAN CHE ==
== 04BE ; DISALLOWED # CYRILLIC CAPITAL LETTER ABKHASIAN CHE WITH D ==
== 04BF ; PVALID # CYRILLIC SMALL LETTER ABKHASIAN CHE WITH DES ==
=== 04C0..04C1 ; DISALLOWED # CYRILLIC LETTER PALOCHKA..CYRILLIC CAPITAL L ===
== 04C2 ; PVALID # CYRILLIC SMALL LETTER ZHE WITH BREVE ==
== 04C3 ; DISALLOWED # CYRILLIC CAPITAL LETTER KA WITH HOOK ==
== 04C4 ; PVALID # CYRILLIC SMALL LETTER KA WITH HOOK ==
== 04C5 ; DISALLOWED # CYRILLIC CAPITAL LETTER EL WITH TAIL ==
== 04C6 ; PVALID # CYRILLIC SMALL LETTER EL WITH TAIL ==
== 04C7 ; DISALLOWED # CYRILLIC CAPITAL LETTER EN WITH HOOK ==
== 04C8 ; PVALID # CYRILLIC SMALL LETTER EN WITH HOOK ==
== 04C9 ; DISALLOWED # CYRILLIC CAPITAL LETTER EN WITH TAIL ==
== 04CA ; PVALID # CYRILLIC SMALL LETTER EN WITH TAIL ==
== 04CB ; DISALLOWED # CYRILLIC CAPITAL LETTER KHAKASSIAN CHE ==
== 04CC ; PVALID # CYRILLIC SMALL LETTER KHAKASSIAN CHE ==
== 04CD ; DISALLOWED # CYRILLIC CAPITAL LETTER EM WITH TAIL ==
=== 04CE..04CF ; PVALID # CYRILLIC SMALL LETTER EM WITH TAIL..CYRILLIC ===
== 04D0 ; DISALLOWED # CYRILLIC CAPITAL LETTER A WITH BREVE ==
== 04D1 ; PVALID # CYRILLIC SMALL LETTER A WITH BREVE ==
== 04D2 ; DISALLOWED # CYRILLIC CAPITAL LETTER A WITH DIAERESIS ==
== 04D3 ; PVALID # CYRILLIC SMALL LETTER A WITH DIAERESIS ==
== 04D4 ; DISALLOWED # CYRILLIC CAPITAL LIGATURE A IE ==
== 04D5 ; PVALID # CYRILLIC SMALL LIGATURE A IE ==
== 04D6 ; DISALLOWED # CYRILLIC CAPITAL LETTER IE WITH BREVE ==
== 04D7 ; PVALID # CYRILLIC SMALL LETTER IE WITH BREVE ==
== 04D8 ; DISALLOWED # CYRILLIC CAPITAL LETTER SCHWA ==
== 04D9 ; PVALID # CYRILLIC SMALL LETTER SCHWA ==
== 04DA ; DISALLOWED # CYRILLIC CAPITAL LETTER SCHWA WITH DIAERESIS ==
== 04DB ; PVALID # CYRILLIC SMALL LETTER SCHWA WITH DIAERESIS ==
== 04DC ; DISALLOWED # CYRILLIC CAPITAL LETTER ZHE WITH DIAERESIS ==
== 04DD ; PVALID # CYRILLIC SMALL LETTER ZHE WITH DIAERESIS ==
== 04DE ; DISALLOWED # CYRILLIC CAPITAL LETTER ZE WITH DIAERESIS ==
== 04DF ; PVALID # CYRILLIC SMALL LETTER ZE WITH DIAERESIS ==
== 04E0 ; DISALLOWED # CYRILLIC CAPITAL LETTER ABKHASIAN DZE ==
== 04E1 ; PVALID # CYRILLIC SMALL LETTER ABKHASIAN DZE ==
== 04E2 ; DISALLOWED # CYRILLIC CAPITAL LETTER I WITH MACRON ==
== 04E3 ; PVALID # CYRILLIC SMALL LETTER I WITH MACRON ==
== 04E4 ; DISALLOWED # CYRILLIC CAPITAL LETTER I WITH DIAERESIS ==
== 04E5 ; PVALID # CYRILLIC SMALL LETTER I WITH DIAERESIS ==
== 04E6 ; DISALLOWED # CYRILLIC CAPITAL LETTER O WITH DIAERESIS ==
== 04E7 ; PVALID # CYRILLIC SMALL LETTER O WITH DIAERESIS ==
== 04E8 ; DISALLOWED # CYRILLIC CAPITAL LETTER BARRED O ==
== 04E9 ; PVALID # CYRILLIC SMALL LETTER BARRED O ==
== 04EA ; DISALLOWED # CYRILLIC CAPITAL LETTER BARRED O WITH DIAERE ==
== 04EB ; PVALID # CYRILLIC SMALL LETTER BARRED O WITH DIAERESI ==
== 04EC ; DISALLOWED # CYRILLIC CAPITAL LETTER E WITH DIAERESIS ==
== 04ED ; PVALID # CYRILLIC SMALL LETTER E WITH DIAERESIS ==
== 04EE ; DISALLOWED # CYRILLIC CAPITAL LETTER U WITH MACRON ==
== 04EF ; PVALID # CYRILLIC SMALL LETTER U WITH MACRON ==
== 04F0 ; DISALLOWED # CYRILLIC CAPITAL LETTER U WITH DIAERESIS ==
== 04F1 ; PVALID # CYRILLIC SMALL LETTER U WITH DIAERESIS ==
== 04F2 ; DISALLOWED # CYRILLIC CAPITAL LETTER U WITH DOUBLE ACUTE ==
== 04F3 ; PVALID # CYRILLIC SMALL LETTER U WITH DOUBLE ACUTE ==
== 04F4 ; DISALLOWED # CYRILLIC CAPITAL LETTER CHE WITH DIAERESIS ==
== 04F5 ; PVALID # CYRILLIC SMALL LETTER CHE WITH DIAERESIS ==
== 04F6 ; DISALLOWED # CYRILLIC CAPITAL LETTER GHE WITH DESCENDER ==
== 04F7 ; PVALID # CYRILLIC SMALL LETTER GHE WITH DESCENDER ==
== 04F8 ; DISALLOWED # CYRILLIC CAPITAL LETTER YERU WITH DIAERESIS ==
== 04F9 ; PVALID # CYRILLIC SMALL LETTER YERU WITH DIAERESIS ==
== 04FA ; DISALLOWED # CYRILLIC CAPITAL LETTER GHE WITH STROKE AND ==
== 04FB ; PVALID # CYRILLIC SMALL LETTER GHE WITH STROKE AND HO ==
== 04FC ; DISALLOWED # CYRILLIC CAPITAL LETTER HA WITH HOOK ==
== 04FD ; PVALID # CYRILLIC SMALL LETTER HA WITH HOOK ==
== 04FE ; DISALLOWED # CYRILLIC CAPITAL LETTER HA WITH STROKE ==
== 04FF ; PVALID # CYRILLIC SMALL LETTER HA WITH STROKE ==
== 0500 ; DISALLOWED # CYRILLIC CAPITAL LETTER KOMI DE ==
== 0501 ; PVALID # CYRILLIC SMALL LETTER KOMI DE ==
== 0502 ; DISALLOWED # CYRILLIC CAPITAL LETTER KOMI DJE ==
== 0503 ; PVALID # CYRILLIC SMALL LETTER KOMI DJE ==
== 0504 ; DISALLOWED # CYRILLIC CAPITAL LETTER KOMI ZJE ==
== 0505 ; PVALID # CYRILLIC SMALL LETTER KOMI ZJE ==
== 0506 ; DISALLOWED # CYRILLIC CAPITAL LETTER KOMI DZJE ==
== 0507 ; PVALID # CYRILLIC SMALL LETTER KOMI DZJE ==
== 0508 ; DISALLOWED # CYRILLIC CAPITAL LETTER KOMI LJE ==
== 0509 ; PVALID # CYRILLIC SMALL LETTER KOMI LJE ==
== 050A ; DISALLOWED # CYRILLIC CAPITAL LETTER KOMI NJE ==
== 050B ; PVALID # CYRILLIC SMALL LETTER KOMI NJE ==
== 050C ; DISALLOWED # CYRILLIC CAPITAL LETTER KOMI SJE ==
== 050D ; PVALID # CYRILLIC SMALL LETTER KOMI SJE ==
== 050E ; DISALLOWED # CYRILLIC CAPITAL LETTER KOMI TJE ==
== 050F ; PVALID # CYRILLIC SMALL LETTER KOMI TJE ==
== 0510 ; DISALLOWED # CYRILLIC CAPITAL LETTER REVERSED ZE ==
== 0511 ; PVALID # CYRILLIC SMALL LETTER REVERSED ZE ==
== 0512 ; DISALLOWED # CYRILLIC CAPITAL LETTER EL WITH HOOK ==
== 0513 ; PVALID # CYRILLIC SMALL LETTER EL WITH HOOK ==
== 0514 ; DISALLOWED # CYRILLIC CAPITAL LETTER LHA ==
== 0515 ; PVALID # CYRILLIC SMALL LETTER LHA ==
== 0516 ; DISALLOWED # CYRILLIC CAPITAL LETTER RHA ==
== 0517 ; PVALID # CYRILLIC SMALL LETTER RHA ==
== 0518 ; DISALLOWED # CYRILLIC CAPITAL LETTER YAE ==
== 0519 ; PVALID # CYRILLIC SMALL LETTER YAE ==
== 051A ; DISALLOWED # CYRILLIC CAPITAL LETTER QA ==
== 051B ; PVALID # CYRILLIC SMALL LETTER QA ==
== 051C ; DISALLOWED # CYRILLIC CAPITAL LETTER WE ==
== 051D ; PVALID # CYRILLIC SMALL LETTER WE ==
== 051E ; DISALLOWED # CYRILLIC CAPITAL LETTER ALEUT KA ==
== 051F ; PVALID # CYRILLIC SMALL LETTER ALEUT KA ==
== 0520 ; DISALLOWED # CYRILLIC CAPITAL LETTER EL WITH MIDDLE HOOK ==
== 0521 ; PVALID # CYRILLIC SMALL LETTER EL WITH MIDDLE HOOK ==
== 0522 ; DISALLOWED # CYRILLIC CAPITAL LETTER EN WITH MIDDLE HOOK ==
== 0523 ; PVALID # CYRILLIC SMALL LETTER EN WITH MIDDLE HOOK ==
=== 0524..0530 ; UNASSIGNED # <reserved>..<reserved> ===
=== 0531..0556 ; DISALLOWED # ARMENIAN CAPITAL LETTER AYB..ARMENIAN CAPITA ===
=== 0557..0558 ; UNASSIGNED # <reserved>..<reserved> ===
== 0559 ; PVALID # ARMENIAN MODIFIER LETTER LEFT HALF RING ==
=== 055A..055F ; DISALLOWED # ARMENIAN APOSTROPHE..ARMENIAN ABBREVIATION M ===
== 0560 ; UNASSIGNED # <reserved> ==
=== 0561..0586 ; PVALID # ARMENIAN SMALL LETTER AYB..ARMENIAN SMALL LE ===
== 0587 ; DISALLOWED # ARMENIAN SMALL LIGATURE ECH YIWN ==
== 0588 ; UNASSIGNED # <reserved> ==
=== 0589..058A ; DISALLOWED # ARMENIAN FULL STOP..ARMENIAN HYPHEN ===
=== 058B..0590 ; UNASSIGNED # <reserved>..<reserved> ===
=== 0591..05BD ; PVALID # HEBREW ACCENT ETNAHTA..HEBREW POINT METEG ===
== 05BE ; DISALLOWED # HEBREW PUNCTUATION MAQAF ==
== 05BF ; PVALID # HEBREW POINT RAFE ==
== 05C0 ; DISALLOWED # HEBREW PUNCTUATION PASEQ ==
=== 05C1..05C2 ; PVALID # HEBREW POINT SHIN DOT..HEBREW POINT SIN DOT ===
== 05C3 ; DISALLOWED # HEBREW PUNCTUATION SOF PASUQ ==
=== 05C4..05C5 ; PVALID # HEBREW MARK UPPER DOT..HEBREW MARK LOWER DOT ===
== 05C6 ; DISALLOWED # HEBREW PUNCTUATION NUN HAFUKHA ==
== 05C7 ; PVALID # HEBREW POINT QAMATS QATAN ==
=== 05C8..05CF ; UNASSIGNED # <reserved>..<reserved> ===
=== 05D0..05EA ; PVALID # HEBREW LETTER ALEF..HEBREW LETTER TAV ===
=== 05EB..05EF ; UNASSIGNED # <reserved>..<reserved> ===
=== 05F0..05F2 ; PVALID # HEBREW LIGATURE YIDDISH DOUBLE VAV..HEBREW L ===
=== 05F3..05F4 ; CONTEXTO # HEBREW PUNCTUATION GERESH..HEBREW PUNCTUATIO ===
=== 05F5..05FF ; UNASSIGNED # <reserved>..<reserved> ===
=== 0600..0603 ; DISALLOWED # ARABIC NUMBER SIGN..ARABIC SIGN SAFHA ===
=== 0604..0605 ; UNASSIGNED # <reserved>..<reserved> ===
=== 0606..060F ; DISALLOWED # ARABIC-INDIC CUBE ROOT..ARABIC SIGN MISRA ===
=== 0610..061A ; PVALID # ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM..AR ===
== 061B ; DISALLOWED # ARABIC SEMICOLON ==
=== 061C..061D ; UNASSIGNED # <reserved>..<reserved> ===
=== 061E..061F ; DISALLOWED # ARABIC TRIPLE DOT PUNCTUATION MARK..ARABIC Q ===
== 0620 ; UNASSIGNED # <reserved> ==
=== 0621..063F ; PVALID # ARABIC LETTER HAMZA..ARABIC LETTER FARSI YEH ===
== 0640 ; DISALLOWED # ARABIC TATWEEL ==
=== 0641..065E ; PVALID # ARABIC LETTER FEH..ARABIC FATHA WITH TWO DOT ===
== 065F ; UNASSIGNED # <reserved> ==
=== 0660..0669 ; CONTEXTO # ARABIC-INDIC DIGIT ZERO..ARABIC-INDIC DIGIT ===
=== 066A..066D ; DISALLOWED # ARABIC PERCENT SIGN..ARABIC FIVE POINTED STA ===
=== 066E..0674 ; PVALID # ARABIC LETTER DOTLESS BEH..ARABIC LETTER HIG ===
=== 0675..0678 ; DISALLOWED # ARABIC LETTER HIGH HAMZA ALEF..ARABIC LETTER ===
=== 0679..06D3 ; PVALID # ARABIC LETTER TTEH..ARABIC LETTER YEH BARREE ===
== 06D4 ; DISALLOWED # ARABIC FULL STOP ==
=== 06D5..06DC ; PVALID # ARABIC LETTER AE..ARABIC SMALL HIGH SEEN ===
=== 06DD..06DE ; DISALLOWED # ARABIC END OF AYAH..ARABIC START OF RUB EL H ===
=== 06DF..06E8 ; PVALID # ARABIC SMALL HIGH ROUNDED ZERO..ARABIC SMALL ===
== 06E9 ; DISALLOWED # ARABIC PLACE OF SAJDAH ==
=== 06EA..06EF ; PVALID # ARABIC EMPTY CENTRE LOW STOP..ARABIC LETTER ===
=== 06F0..06F9 ; CONTEXTO # EXTENDED ARABIC-INDIC DIGIT ZERO..EXTENDED A ===
=== 06FA..06FF ; PVALID # ARABIC LETTER SHEEN WITH DOT BELOW..ARABIC L ===
=== 0700..070D ; DISALLOWED # SYRIAC END OF PARAGRAPH..SYRIAC HARKLEAN AST ===
== 070E ; UNASSIGNED # <reserved> ==
== 070F ; DISALLOWED # SYRIAC ABBREVIATION MARK ==
=== 0710..074A ; PVALID # SYRIAC LETTER ALAPH..SYRIAC BARREKH ===
=== 074B..074C ; UNASSIGNED # <reserved>..<reserved> ===
=== 074D..07B1 ; PVALID # SYRIAC LETTER SOGDIAN ZHAIN..THAANA LETTER N ===
=== 07B2..07BF ; UNASSIGNED # <reserved>..<reserved> ===
=== 07C0..07F5 ; PVALID # NKO DIGIT ZERO..NKO LOW TONE APOSTROPHE ===
=== 07F6..07FA ; DISALLOWED # NKO SYMBOL OO DENNEN..NKO LAJANYALAN ===
=== 07FB..0900 ; UNASSIGNED # <reserved>..<reserved> ===
=== 0901..0939 ; PVALID # DEVANAGARI SIGN CANDRABINDU..DEVANAGARI LETT ===
=== 093A..093B ; UNASSIGNED # <reserved>..<reserved> ===
=== 093C..094D ; PVALID # DEVANAGARI SIGN NUKTA..DEVANAGARI SIGN VIRAM ===
=== 094E..094F ; UNASSIGNED # <reserved>..<reserved> ===
=== 0950..0954 ; PVALID # DEVANAGARI OM..DEVANAGARI ACUTE ACCENT ===
=== 0955..0957 ; UNASSIGNED # <reserved>..<reserved> ===
=== 0958..095F ; DISALLOWED # DEVANAGARI LETTER QA..DEVANAGARI LETTER YYA ===
=== 0960..0963 ; PVALID # DEVANAGARI LETTER VOCALIC RR..DEVANAGARI VOW ===
=== 0964..0965 ; DISALLOWED # DEVANAGARI DANDA..DEVANAGARI DOUBLE DANDA ===
=== 0966..096F ; PVALID # DEVANAGARI DIGIT ZERO..DEVANAGARI DIGIT NINE ===
== 0970 ; DISALLOWED # DEVANAGARI ABBREVIATION SIGN ==
=== 0971..0972 ; PVALID # DEVANAGARI SIGN HIGH SPACING DOT..DEVANAGARI ===
=== 0973..097A ; UNASSIGNED # <reserved>..<reserved> ===
=== 097B..097F ; PVALID # DEVANAGARI LETTER GGA..DEVANAGARI LETTER BBA ===
== 0980 ; UNASSIGNED # <reserved> ==
=== 0981..0983 ; PVALID # BENGALI SIGN CANDRABINDU..BENGALI SIGN VISAR ===
== 0984 ; UNASSIGNED # <reserved> ==
=== 0985..098C ; PVALID # BENGALI LETTER A..BENGALI LETTER VOCALIC L ===
=== 098D..098E ; UNASSIGNED # <reserved>..<reserved> ===
=== 098F..0990 ; PVALID # BENGALI LETTER E..BENGALI LETTER AI ===
=== 0991..0992 ; UNASSIGNED # <reserved>..<reserved> ===
=== 0993..09A8 ; PVALID # BENGALI LETTER O..BENGALI LETTER NA ===
== 09A9 ; UNASSIGNED # <reserved> ==
=== 09AA..09B0 ; PVALID # BENGALI LETTER PA..BENGALI LETTER RA ===
== 09B1 ; UNASSIGNED # <reserved> ==
== 09B2 ; PVALID # BENGALI LETTER LA ==
=== 09B3..09B5 ; UNASSIGNED # <reserved>..<reserved> ===
=== 09B6..09B9 ; PVALID # BENGALI LETTER SHA..BENGALI LETTER HA ===
=== 09BA..09BB ; UNASSIGNED # <reserved>..<reserved> ===
=== 09BC..09C4 ; PVALID # BENGALI SIGN NUKTA..BENGALI VOWEL SIGN VOCAL ===
=== 09C5..09C6 ; UNASSIGNED # <reserved>..<reserved> ===
=== 09C7..09C8 ; PVALID # BENGALI VOWEL SIGN E..BENGALI VOWEL SIGN AI ===
=== 09C9..09CA ; UNASSIGNED # <reserved>..<reserved> ===
=== 09CB..09CE ; PVALID # BENGALI VOWEL SIGN O..BENGALI LETTER KHANDA ===
=== 09CF..09D6 ; UNASSIGNED # <reserved>..<reserved> ===
== 09D7 ; PVALID # BENGALI AU LENGTH MARK ==
=== 09D8..09DB ; UNASSIGNED # <reserved>..<reserved> ===
=== 09DC..09DD ; DISALLOWED # BENGALI LETTER RRA..BENGALI LETTER RHA ===
== 09DE ; UNASSIGNED # <reserved> ==
== 09DF ; DISALLOWED # BENGALI LETTER YYA ==
=== 09E0..09E3 ; PVALID # BENGALI LETTER VOCALIC RR..BENGALI VOWEL SIG ===
=== 09E4..09E5 ; UNASSIGNED # <reserved>..<reserved> ===
=== 09E6..09F1 ; PVALID # BENGALI DIGIT ZERO..BENGALI LETTER RA WITH L ===
=== 09F2..09FA ; DISALLOWED # BENGALI RUPEE MARK..BENGALI ISSHAR ===
=== 09FB..0A00 ; UNASSIGNED # <reserved>..<reserved> ===
=== 0A01..0A03 ; PVALID # GURMUKHI SIGN ADAK BINDI..GURMUKHI SIGN VISA ===
== 0A04 ; UNASSIGNED # <reserved> ==
=== 0A05..0A0A ; PVALID # GURMUKHI LETTER A..GURMUKHI LETTER UU ===
=== 0A0B..0A0E ; UNASSIGNED # <reserved>..<reserved> ===
=== 0A0F..0A10 ; PVALID # GURMUKHI LETTER EE..GURMUKHI LETTER AI ===
=== 0A11..0A12 ; UNASSIGNED # <reserved>..<reserved> ===
=== 0A13..0A28 ; PVALID # GURMUKHI LETTER OO..GURMUKHI LETTER NA ===
== 0A29 ; UNASSIGNED # <reserved> ==
=== 0A2A..0A30 ; PVALID # GURMUKHI LETTER PA..GURMUKHI LETTER RA ===
== 0A31 ; UNASSIGNED # <reserved> ==
== 0A32 ; PVALID # GURMUKHI LETTER LA ==
== 0A33 ; DISALLOWED # GURMUKHI LETTER LLA ==
== 0A34 ; UNASSIGNED # <reserved> ==
== 0A35 ; PVALID # GURMUKHI LETTER VA ==
== 0A36 ; DISALLOWED # GURMUKHI LETTER SHA ==
== 0A37 ; UNASSIGNED # <reserved> ==
=== 0A38..0A39 ; PVALID # GURMUKHI LETTER SA..GURMUKHI LETTER HA ===
=== 0A3A..0A3B ; UNASSIGNED # <reserved>..<reserved> ===
== 0A3C ; PVALID # GURMUKHI SIGN NUKTA ==
== 0A3D ; UNASSIGNED # <reserved> ==
=== 0A3E..0A42 ; PVALID # GURMUKHI VOWEL SIGN AA..GURMUKHI VOWEL SIGN ===
=== 0A43..0A46 ; UNASSIGNED # <reserved>..<reserved> ===
=== 0A47..0A48 ; PVALID # GURMUKHI VOWEL SIGN EE..GURMUKHI VOWEL SIGN ===
=== 0A49..0A4A ; UNASSIGNED # <reserved>..<reserved> ===
=== 0A4B..0A4D ; PVALID # GURMUKHI VOWEL SIGN OO..GURMUKHI SIGN VIRAMA ===
=== 0A4E..0A50 ; UNASSIGNED # <reserved>..<reserved> ===
== 0A51 ; PVALID # GURMUKHI SIGN UDAAT ==
=== 0A52..0A58 ; UNASSIGNED # <reserved>..<reserved> ===
=== 0A59..0A5B ; DISALLOWED # GURMUKHI LETTER KHHA..GURMUKHI LETTER ZA ===
== 0A5C ; PVALID # GURMUKHI LETTER RRA ==
== 0A5D ; UNASSIGNED # <reserved> ==
== 0A5E ; DISALLOWED # GURMUKHI LETTER FA ==
=== 0A5F..0A65 ; UNASSIGNED # <reserved>..<reserved> ===
=== 0A66..0A75 ; PVALID # GURMUKHI DIGIT ZERO..GURMUKHI SIGN YAKASH ===
=== 0A76..0A80 ; UNASSIGNED # <reserved>..<reserved> ===
=== 0A81..0A83 ; PVALID # GUJARATI SIGN CANDRABINDU..GUJARATI SIGN VIS ===
== 0A84 ; UNASSIGNED # <reserved> ==
=== 0A85..0A8D ; PVALID # GUJARATI LETTER A..GUJARATI VOWEL CANDRA E ===
== 0A8E ; UNASSIGNED # <reserved> ==
=== 0A8F..0A91 ; PVALID # GUJARATI LETTER E..GUJARATI VOWEL CANDRA O ===
== 0A92 ; UNASSIGNED # <reserved> ==
=== 0A93..0AA8 ; PVALID # GUJARATI LETTER O..GUJARATI LETTER NA ===
== 0AA9 ; UNASSIGNED # <reserved> ==
=== 0AAA..0AB0 ; PVALID # GUJARATI LETTER PA..GUJARATI LETTER RA ===
== 0AB1 ; UNASSIGNED # <reserved> ==
=== 0AB2..0AB3 ; PVALID # GUJARATI LETTER LA..GUJARATI LETTER LLA ===
== 0AB4 ; UNASSIGNED # <reserved> ==
=== 0AB5..0AB9 ; PVALID # GUJARATI LETTER VA..GUJARATI LETTER HA ===
=== 0ABA..0ABB ; UNASSIGNED # <reserved>..<reserved> ===
=== 0ABC..0AC5 ; PVALID # GUJARATI SIGN NUKTA..GUJARATI VOWEL SIGN CAN ===
== 0AC6 ; UNASSIGNED # <reserved> ==
=== 0AC7..0AC9 ; PVALID # GUJARATI VOWEL SIGN E..GUJARATI VOWEL SIGN C ===
== 0ACA ; UNASSIGNED # <reserved> ==
=== 0ACB..0ACD ; PVALID # GUJARATI VOWEL SIGN O..GUJARATI SIGN VIRAMA ===
=== 0ACE..0ACF ; UNASSIGNED # <reserved>..<reserved> ===
== 0AD0 ; PVALID # GUJARATI OM ==
=== 0AD1..0ADF ; UNASSIGNED # <reserved>..<reserved> ===
=== 0AE0..0AE3 ; PVALID # GUJARATI LETTER VOCALIC RR..GUJARATI VOWEL S ===
=== 0AE4..0AE5 ; UNASSIGNED # <reserved>..<reserved> ===
=== 0AE6..0AEF ; PVALID # GUJARATI DIGIT ZERO..GUJARATI DIGIT NINE ===
== 0AF0 ; UNASSIGNED # <reserved> ==
== 0AF1 ; DISALLOWED # GUJARATI RUPEE SIGN ==
=== 0AF2..0B00 ; UNASSIGNED # <reserved>..<reserved> ===
=== 0B01..0B03 ; PVALID # ORIYA SIGN CANDRABINDU..ORIYA SIGN VISARGA ===
== 0B04 ; UNASSIGNED # <reserved> ==
=== 0B05..0B0C ; PVALID # ORIYA LETTER A..ORIYA LETTER VOCALIC L ===
=== 0B0D..0B0E ; UNASSIGNED # <reserved>..<reserved> ===
=== 0B0F..0B10 ; PVALID # ORIYA LETTER E..ORIYA LETTER AI ===
=== 0B11..0B12 ; UNASSIGNED # <reserved>..<reserved> ===
=== 0B13..0B28 ; PVALID # ORIYA LETTER O..ORIYA LETTER NA ===
== 0B29 ; UNASSIGNED # <reserved> ==
=== 0B2A..0B30 ; PVALID # ORIYA LETTER PA..ORIYA LETTER RA ===
== 0B31 ; UNASSIGNED # <reserved> ==
=== 0B32..0B33 ; PVALID # ORIYA LETTER LA..ORIYA LETTER LLA ===
== 0B34 ; UNASSIGNED # <reserved> ==
=== 0B35..0B39 ; PVALID # ORIYA LETTER VA..ORIYA LETTER HA ===
=== 0B3A..0B3B ; UNASSIGNED # <reserved>..<reserved> ===
=== 0B3C..0B44 ; PVALID # ORIYA SIGN NUKTA..ORIYA VOWEL SIGN VOCALIC R ===
=== 0B45..0B46 ; UNASSIGNED # <reserved>..<reserved> ===
=== 0B47..0B48 ; PVALID # ORIYA VOWEL SIGN E..ORIYA VOWEL SIGN AI ===
=== 0B49..0B4A ; UNASSIGNED # <reserved>..<reserved> ===
=== 0B4B..0B4D ; PVALID # ORIYA VOWEL SIGN O..ORIYA SIGN VIRAMA ===
=== 0B4E..0B55 ; UNASSIGNED # <reserved>..<reserved> ===
=== 0B56..0B57 ; PVALID # ORIYA AI LENGTH MARK..ORIYA AU LENGTH MARK ===
=== 0B58..0B5B ; UNASSIGNED # <reserved>..<reserved> ===
=== 0B5C..0B5D ; DISALLOWED # ORIYA LETTER RRA..ORIYA LETTER RHA ===
== 0B5E ; UNASSIGNED # <reserved> ==
=== 0B5F..0B63 ; PVALID # ORIYA LETTER YYA..ORIYA VOWEL SIGN VOCALIC L ===
=== 0B64..0B65 ; UNASSIGNED # <reserved>..<reserved> ===
=== 0B66..0B6F ; PVALID # ORIYA DIGIT ZERO..ORIYA DIGIT NINE ===
== 0B70 ; DISALLOWED # ORIYA ISSHAR ==
== 0B71 ; PVALID # ORIYA LETTER WA ==
=== 0B72..0B81 ; UNASSIGNED # <reserved>..<reserved> ===
=== 0B82..0B83 ; PVALID # TAMIL SIGN ANUSVARA..TAMIL SIGN VISARGA ===
== 0B84 ; UNASSIGNED # <reserved> ==
=== 0B85..0B8A ; PVALID # TAMIL LETTER A..TAMIL LETTER UU ===
=== 0B8B..0B8D ; UNASSIGNED # <reserved>..<reserved> ===
=== 0B8E..0B90 ; PVALID # TAMIL LETTER E..TAMIL LETTER AI ===
== 0B91 ; UNASSIGNED # <reserved> ==
=== 0B92..0B95 ; PVALID # TAMIL LETTER O..TAMIL LETTER KA ===
=== 0B96..0B98 ; UNASSIGNED # <reserved>..<reserved> ===
=== 0B99..0B9A ; PVALID # TAMIL LETTER NGA..TAMIL LETTER CA ===
== 0B9B ; UNASSIGNED # <reserved> ==
== 0B9C ; PVALID # TAMIL LETTER JA ==
== 0B9D ; UNASSIGNED # <reserved> ==
=== 0B9E..0B9F ; PVALID # TAMIL LETTER NYA..TAMIL LETTER 
TTA === === 0BA0..0BA2 ; UNASSIGNED # <reserved>..<reserved> === === 0BA3..0BA4 ; PVALID # TAMIL LETTER NNA..TAMIL LETTER TA === === 0BA5..0BA7 ; UNASSIGNED # <reserved>..<reserved> === === 0BA8..0BAA ; PVALID # TAMIL LETTER NA..TAMIL LETTER PA === === 0BAB..0BAD ; UNASSIGNED # <reserved>..<reserved> === === 0BAE..0BB9 ; PVALID # TAMIL LETTER MA..TAMIL LETTER HA === === 0BBA..0BBD ; UNASSIGNED # <reserved>..<reserved> === === 0BBE..0BC2 ; PVALID # TAMIL VOWEL SIGN AA..TAMIL VOWEL SIGN UU === === 0BC3..0BC5 ; UNASSIGNED # <reserved>..<reserved> === === 0BC6..0BC8 ; PVALID # TAMIL VOWEL SIGN E..TAMIL VOWEL SIGN AI === == 0BC9 ; UNASSIGNED # <reserved> == === 0BCA..0BCD ; PVALID # TAMIL VOWEL SIGN O..TAMIL SIGN VIRAMA === === 0BCE..0BCF ; UNASSIGNED # <reserved>..<reserved> === == 0BD0 ; PVALID # TAMIL OM == === 0BD1..0BD6 ; UNASSIGNED # <reserved>..<reserved> === == 0BD7 ; PVALID # TAMIL AU LENGTH MARK == === 0BD8..0BE5 ; UNASSIGNED # <reserved>..<reserved> === === 0BE6..0BEF ; PVALID # TAMIL DIGIT ZERO..TAMIL DIGIT NINE === === 0BF0..0BFA ; DISALLOWED # TAMIL NUMBER TEN..TAMIL NUMBER SIGN === === 0BFB..0C00 ; UNASSIGNED # <reserved>..<reserved> === === 0C01..0C03 ; PVALID # TELUGU SIGN CANDRABINDU..TELUGU SIGN VISARGA === == 0C04 ; UNASSIGNED # <reserved> == === 0C05..0C0C ; PVALID # TELUGU LETTER A..TELUGU LETTER VOCALIC L === == 0C0D ; UNASSIGNED # <reserved> == === 0C0E..0C10 ; PVALID # TELUGU LETTER E..TELUGU LETTER AI === == 0C11 ; UNASSIGNED # <reserved> == === 0C12..0C28 ; PVALID # TELUGU LETTER O..TELUGU LETTER NA === == 0C29 ; UNASSIGNED # <reserved> == === 0C2A..0C33 ; PVALID # TELUGU LETTER PA..TELUGU LETTER LLA === == 0C34 ; UNASSIGNED # <reserved> == === 0C35..0C39 ; PVALID # TELUGU LETTER VA..TELUGU LETTER HA === === 0C3A..0C3C ; UNASSIGNED # <reserved>..<reserved> === === 0C3D..0C44 ; PVALID # TELUGU SIGN AVAGRAHA..TELUGU VOWEL SIGN VOCA === == 0C45 ; UNASSIGNED # <reserved> == === 0C46..0C48 ; PVALID # TELUGU VOWEL SIGN E..TELUGU VOWEL SIGN AI === == 
0C49 ; UNASSIGNED # <reserved> == === 0C4A..0C4D ; PVALID # TELUGU VOWEL SIGN O..TELUGU SIGN VIRAMA === === 0C4E..0C54 ; UNASSIGNED # <reserved>..<reserved> === === 0C55..0C56 ; PVALID # TELUGU LENGTH MARK..TELUGU AI LENGTH MARK === == 0C57 ; UNASSIGNED # <reserved> == === 0C58..0C59 ; PVALID # TELUGU LETTER TSA..TELUGU LETTER DZA === === 0C5A..0C5F ; UNASSIGNED # <reserved>..<reserved> === === 0C60..0C63 ; PVALID # TELUGU LETTER VOCALIC RR..TELUGU VOWEL SIGN === === 0C64..0C65 ; UNASSIGNED # <reserved>..<reserved> === === 0C66..0C6F ; PVALID # TELUGU DIGIT ZERO..TELUGU DIGIT NINE === === 0C70..0C77 ; UNASSIGNED # <reserved>..<reserved> === === 0C78..0C7F ; DISALLOWED # TELUGU FRACTION DIGIT ZERO FOR ODD POWERS OF === === 0C80..0C81 ; UNASSIGNED # <reserved>..<reserved> === === 0C82..0C83 ; PVALID # KANNADA SIGN ANUSVARA..KANNADA SIGN VISARGA === == 0C84 ; UNASSIGNED # <reserved> == === 0C85..0C8C ; PVALID # KANNADA LETTER A..KANNADA LETTER VOCALIC L === == 0C8D ; UNASSIGNED # <reserved> == === 0C8E..0C90 ; PVALID # KANNADA LETTER E..KANNADA LETTER AI === == 0C91 ; UNASSIGNED # <reserved> == === 0C92..0CA8 ; PVALID # KANNADA LETTER O..KANNADA LETTER NA === == 0CA9 ; UNASSIGNED # <reserved> == === 0CAA..0CB3 ; PVALID # KANNADA LETTER PA..KANNADA LETTER LLA === == 0CB4 ; UNASSIGNED # <reserved> == === 0CB5..0CB9 ; PVALID # KANNADA LETTER VA..KANNADA LETTER HA === === 0CBA..0CBB ; UNASSIGNED # <reserved>..<reserved> === === 0CBC..0CC4 ; PVALID # KANNADA SIGN NUKTA..KANNADA VOWEL SIGN VOCAL === == 0CC5 ; UNASSIGNED # <reserved> == === 0CC6..0CC8 ; PVALID # KANNADA VOWEL SIGN E..KANNADA VOWEL SIGN AI === == 0CC9 ; UNASSIGNED # <reserved> == === 0CCA..0CCD ; PVALID # KANNADA VOWEL SIGN O..KANNADA SIGN VIRAMA === === 0CCE..0CD4 ; UNASSIGNED # <reserved>..<reserved> === === 0CD5..0CD6 ; PVALID # KANNADA LENGTH MARK..KANNADA AI LENGTH MARK === === 0CD7..0CDD ; UNASSIGNED # <reserved>..<reserved> === == 0CDE ; PVALID # KANNADA LETTER FA == == 0CDF ; UNASSIGNED # <reserved> 
== === 0CE0..0CE3 ; PVALID # KANNADA LETTER VOCALIC RR..KANNADA VOWEL SIG === === 0CE4..0CE5 ; UNASSIGNED # <reserved>..<reserved> === === 0CE6..0CEF ; PVALID # KANNADA DIGIT ZERO..KANNADA DIGIT NINE === == 0CF0 ; UNASSIGNED # <reserved> == === 0CF1..0CF2 ; DISALLOWED # KANNADA SIGN JIHVAMULIYA..KANNADA SIGN UPADH === === 0CF3..0D01 ; UNASSIGNED # <reserved>..<reserved> === === 0D02..0D03 ; PVALID # MALAYALAM SIGN ANUSVARA..MALAYALAM SIGN VISA === == 0D04 ; UNASSIGNED # <reserved> == === 0D05..0D0C ; PVALID # MALAYALAM LETTER A..MALAYALAM LETTER VOCALIC === == 0D0D ; UNASSIGNED # <reserved> == === 0D0E..0D10 ; PVALID # MALAYALAM LETTER E..MALAYALAM LETTER AI === == 0D11 ; UNASSIGNED # <reserved> == === 0D12..0D28 ; PVALID # MALAYALAM LETTER O..MALAYALAM LETTER NA === == 0D29 ; UNASSIGNED # <reserved> == === 0D2A..0D39 ; PVALID # MALAYALAM LETTER PA..MALAYALAM LETTER HA === === 0D3A..0D3C ; UNASSIGNED # <reserved>..<reserved> === === 0D3D..0D44 ; PVALID # MALAYALAM SIGN AVAGRAHA..MALAYALAM VOWEL SIG === == 0D45 ; UNASSIGNED # <reserved> == === 0D46..0D48 ; PVALID # MALAYALAM VOWEL SIGN E..MALAYALAM VOWEL SIGN === == 0D49 ; UNASSIGNED # <reserved> == === 0D4A..0D4D ; PVALID # MALAYALAM VOWEL SIGN O..MALAYALAM SIGN VIRAM === === 0D4E..0D56 ; UNASSIGNED # <reserved>..<reserved> === == 0D57 ; PVALID # MALAYALAM AU LENGTH MARK == === 0D58..0D5F ; UNASSIGNED # <reserved>..<reserved> === === 0D60..0D63 ; PVALID # MALAYALAM LETTER VOCALIC RR..MALAYALAM VOWEL === === 0D64..0D65 ; UNASSIGNED # <reserved>..<reserved> === === 0D66..0D6F ; PVALID # MALAYALAM DIGIT ZERO..MALAYALAM DIGIT NINE === === 0D70..0D75 ; DISALLOWED # MALAYALAM NUMBER TEN..MALAYALAM FRACTION THR === === 0D76..0D78 ; UNASSIGNED # <reserved>..<reserved> === == 0D79 ; DISALLOWED # MALAYALAM DATE MARK == === 0D7A..0D7F ; PVALID # MALAYALAM LETTER CHILLU NN..MALAYALAM LETTER === === 0D80..0D81 ; UNASSIGNED # <reserved>..<reserved> === === 0D82..0D83 ; PVALID # SINHALA SIGN ANUSVARAYA..SINHALA SIGN VISARG === == 
0D84 ; UNASSIGNED # <reserved> == === 0D85..0D96 ; PVALID # SINHALA LETTER AYANNA..SINHALA LETTER AUYANN === === 0D97..0D99 ; UNASSIGNED # <reserved>..<reserved> === === 0D9A..0DB1 ; PVALID # SINHALA LETTER ALPAPRAANA KAYANNA..SINHALA L === == 0DB2 ; UNASSIGNED # <reserved> == === 0DB3..0DBB ; PVALID # SINHALA LETTER SANYAKA DAYANNA..SINHALA LETT === == 0DBC ; UNASSIGNED # <reserved> == == 0DBD ; PVALID # SINHALA LETTER DANTAJA LAYANNA == === 0DBE..0DBF ; UNASSIGNED # <reserved>..<reserved> === === 0DC0..0DC6 ; PVALID # SINHALA LETTER VAYANNA..SINHALA LETTER FAYAN === === 0DC7..0DC9 ; UNASSIGNED # <reserved>..<reserved> === == 0DCA ; PVALID # SINHALA SIGN AL-LAKUNA == === 0DCB..0DCE ; UNASSIGNED # <reserved>..<reserved> === === 0DCF..0DD4 ; PVALID # SINHALA VOWEL SIGN AELA-PILLA..SINHALA VOWEL === == 0DD5 ; UNASSIGNED # <reserved> == == 0DD6 ; PVALID # SINHALA VOWEL SIGN DIGA PAA-PILLA == == 0DD7 ; UNASSIGNED # <reserved> == === 0DD8..0DDF ; PVALID # SINHALA VOWEL SIGN GAETTA-PILLA..SINHALA VOW === === 0DE0..0DF1 ; UNASSIGNED # <reserved>..<reserved> === === 0DF2..0DF3 ; PVALID # SINHALA VOWEL SIGN DIGA GAETTA-PILLA..SINHAL === == 0DF4 ; DISALLOWED # SINHALA PUNCTUATION KUNDDALIYA == === 0DF5..0E00 ; UNASSIGNED # <reserved>..<reserved> === === 0E01..0E32 ; PVALID # THAI CHARACTER KO KAI..THAI CHARACTER SARA A === == 0E33 ; DISALLOWED # THAI CHARACTER SARA AM == === 0E34..0E3A ; PVALID # THAI CHARACTER SARA I..THAI CHARACTER PHINTH === === 0E3B..0E3E ; UNASSIGNED # <reserved>..<reserved> === == 0E3F ; DISALLOWED # THAI CURRENCY SYMBOL BAHT == === 0E40..0E4E ; PVALID # THAI CHARACTER SARA E..THAI CHARACTER YAMAKK === == 0E4F ; DISALLOWED # THAI CHARACTER FONGMAN == === 0E50..0E59 ; PVALID # THAI DIGIT ZERO..THAI DIGIT NINE === === 0E5A..0E5B ; DISALLOWED # THAI CHARACTER ANGKHANKHU..THAI CHARACTER KH === === 0E5C..0E80 ; UNASSIGNED # <reserved>..<reserved> === === 0E81..0E82 ; PVALID # LAO LETTER KO..LAO LETTER KHO SUNG === == 0E83 ; UNASSIGNED # <reserved> == == 
0E84 ; PVALID # LAO LETTER KHO TAM == === 0E85..0E86 ; UNASSIGNED # <reserved>..<reserved> === === 0E87..0E88 ; PVALID # LAO LETTER NGO..LAO LETTER CO === == 0E89 ; UNASSIGNED # <reserved> == == 0E8A ; PVALID # LAO LETTER SO TAM == === 0E8B..0E8C ; UNASSIGNED # <reserved>..<reserved> === == 0E8D ; PVALID # LAO LETTER NYO == === 0E8E..0E93 ; UNASSIGNED # <reserved>..<reserved> === === 0E94..0E97 ; PVALID # LAO LETTER DO..LAO LETTER THO TAM === == 0E98 ; UNASSIGNED # <reserved> == === 0E99..0E9F ; PVALID # LAO LETTER NO..LAO LETTER FO SUNG === == 0EA0 ; UNASSIGNED # <reserved> == === 0EA1..0EA3 ; PVALID # LAO LETTER MO..LAO LETTER LO LING === == 0EA4 ; UNASSIGNED # <reserved> == == 0EA5 ; PVALID # LAO LETTER LO LOOT == == 0EA6 ; UNASSIGNED # <reserved> == == 0EA7 ; PVALID # LAO LETTER WO == === 0EA8..0EA9 ; UNASSIGNED # <reserved>..<reserved> === === 0EAA..0EAB ; PVALID # LAO LETTER SO SUNG..LAO LETTER HO SUNG === == 0EAC ; UNASSIGNED # <reserved> == === 0EAD..0EB2 ; PVALID # LAO LETTER O..LAO VOWEL SIGN AA === == 0EB3 ; DISALLOWED # LAO VOWEL SIGN AM == === 0EB4..0EB9 ; PVALID # LAO VOWEL SIGN I..LAO VOWEL SIGN UU === == 0EBA ; UNASSIGNED # <reserved> == === 0EBB..0EBD ; PVALID # LAO VOWEL SIGN MAI KON..LAO SEMIVOWEL SIGN N === === 0EBE..0EBF ; UNASSIGNED # <reserved>..<reserved> === === 0EC0..0EC4 ; PVALID # LAO VOWEL SIGN E..LAO VOWEL SIGN AI === == 0EC5 ; UNASSIGNED # <reserved> == == 0EC6 ; PVALID # LAO KO LA == == 0EC7 ; UNASSIGNED # <reserved> == === 0EC8..0ECD ; PVALID # LAO TONE MAI EK..LAO NIGGAHITA === === 0ECE..0ECF ; UNASSIGNED # <reserved>..<reserved> === === 0ED0..0ED9 ; PVALID # LAO DIGIT ZERO..LAO DIGIT NINE === === 0EDA..0EDB ; UNASSIGNED # <reserved>..<reserved> === === 0EDC..0EDD ; DISALLOWED # LAO HO NO..LAO HO MO === === 0EDE..0EFF ; UNASSIGNED # <reserved>..<reserved> === == 0F00 ; PVALID # TIBETAN SYLLABLE OM == === 0F01..0F0A ; DISALLOWED # TIBETAN MARK GTER YIG MGO TRUNCATED A..TIBET === == 0F0B ; PVALID # TIBETAN MARK INTERSYLLABIC TSHEG == 
=== 0F0C..0F17 ; DISALLOWED # TIBETAN MARK DELIMITER TSHEG BSTAR..TIBETAN === === 0F18..0F19 ; PVALID # TIBETAN ASTROLOGICAL SIGN -KHYUD PA..TIBETAN === === 0F1A..0F1F ; DISALLOWED # TIBETAN SIGN RDEL DKAR GCIG..TIBETAN SIGN RD === === 0F20..0F29 ; PVALID # TIBETAN DIGIT ZERO..TIBETAN DIGIT NINE === === 0F2A..0F34 ; DISALLOWED # TIBETAN DIGIT HALF ONE..TIBETAN MARK BSDUS R === == 0F35 ; PVALID # TIBETAN MARK NGAS BZUNG NYI ZLA == == 0F36 ; DISALLOWED # TIBETAN MARK CARET -DZUD RTAGS BZHI MIG CAN == == 0F37 ; PVALID # TIBETAN MARK NGAS BZUNG SGOR RTAGS == == 0F38 ; DISALLOWED # TIBETAN MARK CHE MGO == == 0F39 ; PVALID # TIBETAN MARK TSA -PHRU == === 0F3A..0F3D ; DISALLOWED # TIBETAN MARK GUG RTAGS GYON..TIBETAN MARK AN === === 0F3E..0F42 ; PVALID # TIBETAN SIGN YAR TSHES..TIBETAN LETTER GA === == 0F43 ; DISALLOWED # TIBETAN LETTER GHA == === 0F44..0F47 ; PVALID # TIBETAN LETTER NGA..TIBETAN LETTER JA === == 0F48 ; UNASSIGNED # <reserved> == === 0F49..0F4C ; PVALID # TIBETAN LETTER NYA..TIBETAN LETTER DDA === == 0F4D ; DISALLOWED # TIBETAN LETTER DDHA == === 0F4E..0F51 ; PVALID # TIBETAN LETTER NNA..TIBETAN LETTER DA === == 0F52 ; DISALLOWED # TIBETAN LETTER DHA == === 0F53..0F56 ; PVALID # TIBETAN LETTER NA..TIBETAN LETTER BA === == 0F57 ; DISALLOWED # TIBETAN LETTER BHA == === 0F58..0F5B ; PVALID # TIBETAN LETTER MA..TIBETAN LETTER DZA === == 0F5C ; DISALLOWED # TIBETAN LETTER DZHA == === 0F5D..0F68 ; PVALID # TIBETAN LETTER WA..TIBETAN LETTER A === == 0F69 ; DISALLOWED # TIBETAN LETTER KSSA == === 0F6A..0F6C ; PVALID # TIBETAN LETTER FIXED-FORM RA..TIBETAN LETTER === === 0F6D..0F70 ; UNASSIGNED # <reserved>..<reserved> === === 0F71..0F72 ; PVALID # TIBETAN VOWEL SIGN AA..TIBETAN VOWEL SIGN I === == 0F73 ; DISALLOWED # TIBETAN VOWEL SIGN II == == 0F74 ; PVALID # TIBETAN VOWEL SIGN U == === 0F75..0F79 ; DISALLOWED # TIBETAN VOWEL SIGN UU..TIBETAN VOWEL SIGN VO === === 0F7A..0F80 ; PVALID # TIBETAN VOWEL SIGN E..TIBETAN VOWEL SIGN REV === == 0F81 ; DISALLOWED # 
TIBETAN VOWEL SIGN REVERSED II == === 0F82..0F84 ; PVALID # TIBETAN SIGN NYI ZLA NAA DA..TIBETAN MARK HA === == 0F85 ; DISALLOWED # TIBETAN MARK PALUTA == === 0F86..0F8B ; PVALID # TIBETAN SIGN LCI RTAGS..TIBETAN SIGN GRU MED === === 0F8C..0F8F ; UNASSIGNED # <reserved>..<reserved> === === 0F90..0F92 ; PVALID # TIBETAN SUBJOINED LETTER KA..TIBETAN SUBJOIN === == 0F93 ; DISALLOWED # TIBETAN SUBJOINED LETTER GHA == === 0F94..0F97 ; PVALID # TIBETAN SUBJOINED LETTER NGA..TIBETAN SUBJOI === == 0F98 ; UNASSIGNED # <reserved> == === 0F99..0F9C ; PVALID # TIBETAN SUBJOINED LETTER NYA..TIBETAN SUBJOI === == 0F9D ; DISALLOWED # TIBETAN SUBJOINED LETTER DDHA == === 0F9E..0FA1 ; PVALID # TIBETAN SUBJOINED LETTER NNA..TIBETAN SUBJOI === == 0FA2 ; DISALLOWED # TIBETAN SUBJOINED LETTER DHA == === 0FA3..0FA6 ; PVALID # TIBETAN SUBJOINED LETTER NA..TIBETAN SUBJOIN === == 0FA7 ; DISALLOWED # TIBETAN SUBJOINED LETTER BHA == === 0FA8..0FAB ; PVALID # TIBETAN SUBJOINED LETTER MA..TIBETAN SUBJOIN === == 0FAC ; DISALLOWED # TIBETAN SUBJOINED LETTER DZHA == === 0FAD..0FB8 ; PVALID # TIBETAN SUBJOINED LETTER WA..TIBETAN SUBJOIN === == 0FB9 ; DISALLOWED # TIBETAN SUBJOINED LETTER KSSA == === 0FBA..0FBC ; PVALID # TIBETAN SUBJOINED LETTER FIXED-FORM WA..TIBE === == 0FBD ; UNASSIGNED # <reserved> == === 0FBE..0FC5 ; DISALLOWED # TIBETAN KU RU KHA..TIBETAN SYMBOL RDO RJE === == 0FC6 ; PVALID # TIBETAN SYMBOL PADMA GDAN == === 0FC7..0FCC ; DISALLOWED # TIBETAN SYMBOL RDO RJE RGYA GRAM..TIBETAN SY === == 0FCD ; UNASSIGNED # <reserved> == === 0FCE..0FD4 ; DISALLOWED # TIBETAN SIGN RDEL NAG RDEL DKAR..TIBETAN MAR === === 0FD5..0FFF ; UNASSIGNED # <reserved>..<reserved> === === 1000..1049 ; PVALID # MYANMAR LETTER KA..MYANMAR DIGIT NINE === === 104A..104F ; DISALLOWED # MYANMAR SIGN LITTLE SECTION..MYANMAR SYMBOL === === 1050..1099 ; PVALID # MYANMAR LETTER SHA..MYANMAR SHAN DIGIT NINE === === 109A..109D ; UNASSIGNED # <reserved>..<reserved> === === 109E..10C5 ; DISALLOWED # MYANMAR SYMBOL SHAN 
ONE..GEORGIAN CAPITAL LE === === 10C6..10CF ; UNASSIGNED # <reserved>..<reserved> === === 10D0..10FA ; PVALID # GEORGIAN LETTER AN..GEORGIAN LETTER AIN === === 10FB..10FC ; DISALLOWED # GEORGIAN PARAGRAPH SEPARATOR..MODIFIER LETTE === === 10FD..10FF ; UNASSIGNED # <reserved>..<reserved> === === 1100..1159 ; DISALLOWED # HANGUL CHOSEONG KIYEOK..HANGUL CHOSEONG YEOR === === 115A..115E ; UNASSIGNED # <reserved>..<reserved> === === 115F..11A2 ; DISALLOWED # HANGUL CHOSEONG FILLER..HANGUL JUNGSEONG SSA === === 11A3..11A7 ; UNASSIGNED # <reserved>..<reserved> === === 11A8..11F9 ; DISALLOWED # HANGUL JONGSEONG KIYEOK..HANGUL JONGSEONG YE === === 11FA..11FF ; UNASSIGNED # <reserved>..<reserved> === === 1200..1248 ; PVALID # ETHIOPIC SYLLABLE HA..ETHIOPIC SYLLABLE QWA === == 1249 ; UNASSIGNED # <reserved> == === 124A..124D ; PVALID # ETHIOPIC SYLLABLE QWI..ETHIOPIC SYLLABLE QWE === === 124E..124F ; UNASSIGNED # <reserved>..<reserved> === === 1250..1256 ; PVALID # ETHIOPIC SYLLABLE QHA..ETHIOPIC SYLLABLE QHO === == 1257 ; UNASSIGNED # <reserved> == == 1258 ; PVALID # ETHIOPIC SYLLABLE QHWA == == 1259 ; UNASSIGNED # <reserved> ==
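Each entry in the listing has the form "code point or range ; derived property value # character name(s)". As the Introduction notes, the specification is the algorithm rather than a table, so a listing like this is best treated as a snapshot for one Unicode version. The following is a minimal Python sketch of how such entries could be loaded and queried. The names SAMPLE, parse_table, and lookup are illustrative assumptions and are not defined by IDNA2008; the sample rows are copied from the listing.

```python
import re
from bisect import bisect_right

# Illustrative sample only: a few rows copied from the listing.
SAMPLE = """\
0A01..0A03 ; PVALID # GURMUKHI SIGN ADAK BINDI..GURMUKHI SIGN VISA
0A04 ; UNASSIGNED # <reserved>
0A33 ; DISALLOWED # GURMUKHI LETTER LLA
0E01..0E32 ; PVALID # THAI CHARACTER KO KAI..THAI CHARACTER SARA A
0E33 ; DISALLOWED # THAI CHARACTER SARA AM
"""

# First code point, optional "..last" code point, then the property value.
ENTRY = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)")


def parse_table(text):
    """Return a sorted list of (first, last, value) tuples (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if not match:
            continue  # skip blank or malformed lines
        first = int(match.group(1), 16)
        last = int(match.group(2), 16) if match.group(2) else first
        rows.append((first, last, match.group(3)))
    rows.sort()
    return rows


def lookup(rows, code_point):
    """Return the value of the row covering code_point, or None if no row does."""
    starts = [row[0] for row in rows]
    index = bisect_right(starts, code_point) - 1
    if index >= 0 and rows[index][0] <= code_point <= rows[index][1]:
        return rows[index][2]
    return None


rows = parse_table(SAMPLE)
print(lookup(rows, 0x0E33))  # value listed for THAI CHARACTER SARA AM
```

A lookup result only reports what the listing says about a single code point. As stated in the Introduction, that value alone does not determine whether the character can be used in a given label.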
8. References
8.1. Normative References
[RFC4690] Klensin, J., Faltstrom, P., Karp, C., and IAB, "Review and Recommendations for Internationalized Domain Names (IDNs)", RFC 4690, September 2006.
[TR15] Davis, M. and M. Duerst, "Unicode Standard Annex #15, Unicode Normalization Forms, an integral part of the Unicode Standard", <http://unicode.org/unicode/reports/tr15/>.
[Unicode5] The Unicode Consortium, "The Unicode Standard, Version 5.0.0", Boston, MA, Addison-Wesley ISBN 0-321-48091-0, 2007.
[Unicode51] The Unicode Consortium, "The Unicode Standard, Version 5.1.0", Unicode 5.0.0, Boston, MA, Addison-Wesley ISBN 0-321-48091-0, as amended by Unicode 5.1.0, 2008, <http://www.unicode.org/versions/Unicode5.1.0/>.
8.2. Informative References
[IDNA2008-definitions] Klensin, J., Ed., "Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework", June 2009, <http://www.ietf.org/internet-drafts/draft-ietf-idnabis-defs-10.txt>.
[IDNA2008-protocol] Klensin, J., "Internationalizing Domain Names in Applications (IDNA): Protocol", July 2009, <http://www.ietf.org/internet-drafts/draft-ietf-idnabis-protocol-14.txt>.
[IDNA2008-rationale] Klensin, J., Ed., "Internationalized Domain Names for Applications (IDNA): Background, Explanation, and Rationale", June 2009, <http://www.ietf.org/internet-drafts/draft-ietf-idnabis-rationale-10.txt>.
[RFC3454] Hoffman, P. and M. Blanchet, "Preparation of Internationalized Strings ("stringprep")", RFC 3454, December 2002.
[RFC3491] Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN)", RFC 3491, March 2003.
[RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 5226, May 2008.
[1] <http://unicode.org/Public/UNIDATA/DerivedCoreProperties.txt>
