IUCG remarks

From Wikidna.org


(under final group review)

This memo covers the collective remarks from the IUCG/france@large CX-DNS participants. The participants are of three origins:

  • Two newcomers who are motivated by the Last Call opportunity to ensure that they understand everything.
  • Three participants who have been involved, at some stage and in some form, in the IDNABIS debate, and who would like to see the documents finally published.
  • Five participants who are primarily interested in making sure that they can document "IDNAPLUS" as the strictly conformant support of IDNA by the Interplus architecture that the IUCG is working on.

General appreciation

  • The division of the material among the documents seems adequate. However, even if the Mapping memo is not part of the IDNA document set (why?), it is both logical and enlightening to read it before the Protocol parts.
  • The documents are rather confusing because it is impossible to decide whether:
    • they consider IDNA to be part of the DNS or not (we may also be influenced by the ML-DNS stack we work on).
    • they differentiate (and how) between characters and codepoints.
    • they use NFKC or NFC, and what the differences between them are, intrinsically and from an IDNA point of view.
    • they are meant to be a complete set of standards or a partial set of suggestions. This uncertainty results from:
      • non-normative forms being used in places that one would deem normative
      • the constant discussion of Registries' capacities/obligations and the lack of documentation on the tools for executing them and managing the related registration/coding metadata and rules.
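To make the character/codepoint and NFC/NFKC questions concrete for the newcomers among us, here is a minimal sketch using Python's standard unicodedata module. It is only an illustration of the two normalization forms on sample strings we chose ourselves; it does not claim to show which form the document set intends in any given place.

```python
import unicodedata

# One user-perceived character, two possible codepoint sequences (our samples).
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def nfc(text: str) -> str:
    """Canonical composition only."""
    return unicodedata.normalize("NFC", text)


def nfkc(text: str) -> str:
    """Compatibility decomposition followed by canonical composition."""
    return unicodedata.normalize("NFKC", text)


# The same "character" can be one or two codepoints before normalization.
print(len(precomposed), len(decomposed))   # 1 2
print(nfc(decomposed) == precomposed)      # True

# NFKC also replaces compatibility characters, which NFC leaves untouched.
ligature = "\ufb01"                        # LATIN SMALL LIGATURE FI
print(nfc(ligature) == ligature)           # True
print(nfkc(ligature))                      # fi
```

The last two lines show why the choice matters from an IDNA point of view: under NFKC the label text itself changes, while under NFC it does not.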

IDNA Definitions

  • Information on Unicode is scattered throughout the document. Wouldn’t it be much better to describe a clear sequence?
    • what IDNA is,
    • what IDNs are,
    • what IDNA labels are,
    • what they are made of,
    • how Unicode supports them, including NFC, in the same section 2.1.,
    • how a zone manager may impose profiling rules (description, enforcement).
  • Most of the new terms are discussed before being defined. This starts with the confusing "looking them up" in part 1.1.1. (which means resolving, and not just asking about validity or existence), as opposed to "registering". IDNs are introduced in 2.3.2.3., etc. This certainly reflects how difficult it is to define all these terms, but it is still quite confusing. For example, it would be advisable to begin with part 4.4.
  • The different classes of domain names that are discussed only seem to be related to IDNA without an exhaustive presentation of the DNS domain name context. The names are somewhat confusing. The drafts are certainly clear, but they do not reflect a progressive logic of discovery of the nature of a name/label that could be ported to programming functions.
  • References to the lower/uppercase image can be understood by DNS old-timers, but are confusing to newcomers, as the image does not reflect the same functionality and because the lower/uppercase treatment of U-labels and A-labels is not the same.
  • Different keyboards and encodings are discussed, stressing that a DNS resolution calls for a U-label conversion, but nothing obliges local applications to transcode user entries to Unicode when they interoperate at a layer other than the DNS. However, these applications may want to canonicalize these entries in their own way. Interplus supports the idea that an application layer may use an intermediate coding that is neither Unicode nor ASCII. Among other things, this facilitates interoperability with the UTF-8 that Microsoft supports within private networks: the user interface may be common and the underlying machinery either IDNA or UTF-8.
  • 4.1. "Security on the Internet partly relies on the DNS. Thus, any change to the characteristics of the DNS can change the security of much of the Internet." This sentence seems extremely confusing, as IDNA does not change the characteristics of the DNS but is rather built on the fact that they will not be changed.
  • Likewise: "The security of the Internet is compromised if a user entering a single internationalized name is connected to different servers based on different interpretations of the internationalized domain name." The security of the Internet is not compromised; trust in the IDNA proposition, however, might be.
  • The 4.7. Summary might be considered adventurous? Corporations such as Nominum propose services that are supposed to protect the DNS. One of the purposes of ML-DNS is precisely to permit an architectural protection.
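As an illustration of the "progressive logic of discovery" that we would like to be able to port to programming functions, here is a rough sketch of a label classifier. The class names follow our reading of the Definitions draft (NR-LDH, R-LDH, A-label, fake A-label); the function name classify_label and the returned strings are our own, and the sketch only checks label structure with Python's built-in punycode codec. It does not apply the IDNA2008 codepoint tables, so "A-label candidate" means no more than "decodes and re-encodes consistently".

```python
import string

LDH_CHARS = set(string.ascii_letters + string.digits + "-")


def classify_label(label: str) -> str:
    """Coarse, structure-only classification of a single label (no dots)."""
    if not label:
        return "empty"
    if not label.isascii():
        return "non-ASCII (U-label candidate)"
    if set(label) - LDH_CHARS or label.startswith("-") or label.endswith("-"):
        return "ASCII but not LDH"
    lower = label.lower()
    if lower[2:4] != "--":
        return "NR-LDH label"
    if not lower.startswith("xn--"):
        return "R-LDH label (reserved)"
    try:
        # Decode the part after the prefix, then encode it again.
        decoded = lower[4:].encode("ascii").decode("punycode")
        round_trip = decoded.encode("punycode").decode("ascii")
    except UnicodeError:
        return "fake A-label"
    if decoded.isascii() or round_trip != lower[4:]:
        return "fake A-label"
    return "A-label candidate"


for sample in ["example", "ab--cd", "xn--bcher-kva", "b\u00fccher", "-bad"]:
    print(sample, "->", classify_label(sample))
```

Each test narrows the possibilities of the previous one, which is the order of presentation we would find easier to follow in the Definitions document.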

IDNA Rationale

  • 1.3.1. DNS "Name" Terminology. "" would be better read as "orthotypographic", as an orthographic error can be a way to lose some special semantic differences that are due to orthotypographic conventions.
  • 1.3.2. "IDNA-landr" typo?
  • 1.4. "Reduce the dependency on mapping, in order that the pre-mapped forms (which are not valid IDNA labels) tend to appear less often in various contexts, in favor of valid A-labels." calls for the Charter to be revised. Alternatively, it could say: remove dependence on mapping as per a mapping document, in which case this document would include a section on the various ways to ensure DNS security and the barring of some U+codes in some presentations.
  • 1.5. "This model has served the existing applications well, but it requires, with or without internationalized domain names, that users know the exact spelling of the domain names that are to be typed into applications such as web browsers and mail user agents. The introduction of the larger repertoire of characters potentially makes the set of misspellings larger, especially given that in some cases the same appearance, for example on a business card, might visually match several Unicode code points or several sequences of code points." may be read as if the users of these languages were more prone to errors than the users of ASCII languages.
  • "If an application wants to use non-ASCII characters in public DNS domain names, IDNA is the only currently-defined option." IDNA is not a DNS option. It is an application-level way to transcode Unicode domain names into LDH domain names for the convenience of ASCII-oriented international managers. The idea is to obtain the adherence of local users and managers to IDNA, not to impose ASCII on them. DNS is UTF-8 compatible.
  • "IDNA2008 divides all possible Unicode code-points into four categories: PROTOCOL-VALID, CONTEXTUAL RULE REQUIRED, DISALLOWED and UNASSIGNED.
    3.1.1. PROTOCOL-VALID
    Characters identified as "PROTOCOL-VALID" (often abbreviated "PVALID") are permitted in IDNs." Are we talking of code-points or of characters?
  • 3.1.2.1 Not in the TOC
  • 3.1.3 Disallowed "various HEART symbols" - is U+38FA also disallowed? or U+3966?
  • 3.1.3. This is the first time anyone has spoken of NFKC. In IDNA Defs and other cases, it is NFC. Shouldn’t both of them be documented? Shouldn’t someone explain in which specific case each one is used?
  • "The character is an upper-case form or some other form that is mapped to another character by Unicode casefolding." This seems to create a very large mapping scheme that depends on a non-documented Unicode system needing correction (at least when it does not specifically support majuscules). Moreover, are we dealing with characters (that are orthogonal to Unicode) or with codepoints that represent characters and that are subject to Unicode casefolding? The proposition is to: (1) clarify the character/codepoint issue, (2) explain what Unicode case folding is and its limitations, (3) move them to CONTEXTO when these codepoints are both used as upper-cases and as majuscules, (4) explain that majuscules that are supported by upper-cases will be transcoded by punycode.
  • 4.4. Case mapping. One may regret that the current Unicode support of French majuscules, which is perfectly adequate in other circumstances yet inadequate in this case, is not discussed. This would explain the upgrade above.
  • 4.5. "Examples of this are Yiddish, written with an extended Hebrew script, and Dhivehi (the official language of Maldives) which is written in the Thaana script (which is, in turn, derived from the Arabic script)" It seems that some explanation about Yiddish would be welcome so that the language will obtain the same support as Dhivehi and Thaana.
  • 5. "Conversely, lookup applications are expected to reject labels that clearly violate global (protocol) rules (no one has ever seriously claimed that being liberal in what is accepted requires being stupid)." The remark between the parentheses is confusing: it possibly qualifies as "stupid" a behavior that is not recommended, but that is acceptable by the document set.
  • "Application implementors should be aware that where DNS wildcards are used, the ability to successfully resolve a name does not guarantee that it was actually registered." In which terms is this specific to IDNA?
  • 7.6. The Symbol Question. That part actually discusses the Unicode originated difficulties. Yet, the choice of Unicode has not yet been discussed.
  • 9. "Adding languages (or similar context) to IDNs generally, or to DNS matching in particular, would imply context dependent matching in DNS, which would be a very significant change to the DNS protocol itself". This sentence seems confusing. Natural languages are quoted throughout this IDNs document.
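Since we ask above for an explanation of what Unicode case folding is and where its limits lie, here is a small sketch of what it does to a few codepoints. We use Python's built-in str.lower and str.casefold, which we assume follow the case mapping and case folding data of the Unicode version shipped with the interpreter. The sample characters and the helper name fold_report are our own choice and are not taken from the drafts.

```python
def fold_report(text: str) -> tuple[str, str]:
    """Return (lowercased, case-folded) forms of the text."""
    return text.lower(), text.casefold()


samples = {
    "\u00c9": "LATIN CAPITAL LETTER E WITH ACUTE",
    "\u00df": "LATIN SMALL LETTER SHARP S",
    "\u03c2": "GREEK SMALL LETTER FINAL SIGMA",
}

for char, name in samples.items():
    lowered, folded = fold_report(char)
    print(f"{name}: lower={lowered!r} casefold={folded!r}")
```

For the capital E with acute, both operations give the same small letter. For the sharp s and the final sigma, which are already lower-case, lowercasing changes nothing while case folding replaces them with "ss" and with the ordinary sigma. Case folding is therefore a matching operation that can alter lower-case text, not merely a conversion of majuscules.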

IDNA Mapping

  • Not sure that the terminology of "make sense" is adequate or clear.
  • 1. Introduction - This document is supposed to be separate from the IDNA document set. It should then document what the IDNA protocol is. It seems that the IDNA2008 protocols boil down to "DNS domain names are to be expressed in LDH form. IDNA is a commonly agreed upon convention wherein, if they are entered by the user in another form, applications are advised to convert them to UTF in order to filter and map them, as is discussed in the present document, and to transcode them by using the punycode algorithm. Depending on the Registry policy, their registration can be carried out in the UTF and/or the transcoded ASCII form."
  • 2.3. NFC is confirmed, NFKC is not discussed.
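The summary we propose in the Introduction remark (convert the user entry, map it, then transcode it with punycode) can be sketched in a few lines. This is a deliberately simplified illustration under our own assumptions: the mapping step is reduced to lower-casing plus NFC, the function name to_ascii_label is hypothetical, and no filtering against the IDNA2008 tables is performed. It is not the mapping specified in the Mapping memo.

```python
import unicodedata

ACE_PREFIX = "xn--"


def to_ascii_label(user_label: str) -> str:
    """Map a user-entered label and transcode it to an LDH form (simplified)."""
    # Step 1 (assumed mapping): lower-case, then normalize to NFC.
    mapped = unicodedata.normalize("NFC", user_label.lower())
    # Step 2: plain ASCII labels need no transcoding.
    if mapped.isascii():
        return mapped
    # Step 3: punycode transcoding plus the ACE prefix.
    return ACE_PREFIX + mapped.encode("punycode").decode("ascii")


print(to_ascii_label("Example"))          # example
print(to_ascii_label("Bu\u0308cher"))     # xn--bcher-kva
```

The second call starts from a decomposed "u" plus combining diaeresis. Without the NFC step, the same user-visible entry would be transcoded to a different ASCII form, which is why we would like the Mapping memo to say where NFC and NFKC each apply.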

IDNA Protocol

  • As a general comment:
    • The SHOULD/MUST chains may be somewhat awkward. MUSTs are used in a protocol procedure and then an alternative to that procedure is pragmatically considered. It could be of interest to draft a MUST tree to consider which cases are, or are not, covered.
    • There is some confusion as to how the "string" relates to the label and the domain name, and "Label" may be used instead of "U-Label" or sometimes "A-Label". Wouldn’t it be better to review the text and qualify each occurrence of "label", in order to be certain that all the cases are clearly covered?
  • 3.2. "It does not apply to domain name slots which do not use the Letter/Digit/Hyphen (LDH) syntax rules." Confusing. Would some of the DN slots not accept both?
  • 3.2.1. The word CLASS only appears in two sentences of the whole document set: "IDNA applies only to domain names in the NAME and RDATA fields of DNS resource records whose CLASS is IN. See RFC 1034 [RFC1034] for precise definitions of these terms. The application of IDNA to DNS resource records depends entirely on the CLASS of the record, and not on the TYPE except as noted below."
    What about internationalized domain names in a non-IN CLASS?
  • 4. "This section defines the procedure for registering an IDN. The procedure is implementation independent; any sequence of steps that produces exactly the same result for all labels is considered a valid implementation." A procedure does provide but does not define a result?
  • 4.1. The obligation chain reads: "By the time a string enters the IDNA registration process [], it is expected to be in Unicode []", yet "registries [] SHOULD avoid any possible ambiguity by accepting registrations only for A-labels []."
  • 4.3. Registry restriction inheritance is not alluded to.
  • 5. Does this repetition (already in IDNA Rationale) "the presence of wild cards in the DNS might cause a string that is not actually registered in the DNS to be successfully looked up." reflect what the BIDI draft documents slightly differently: "Wildcards create the odd situation where a label is "valid" (can be looked up successfully) without the zone owner knowing that this label exists. So an owner of a zone whose name starts with a digit and contains a wildcard has no way of controlling whether or not names with RTL labels in them are looked up in his zone."?
  • 5.2. The case of a character that is not supported by Unicode is not discussed.
  • 5.4. The use of "U-Labels" in this part instead of "Labels" would probably clarify it.
  • "applying the test is likely to give much better information about the reason for a lookup failure -- information that may be usefully passed to the user when that is feasible -- than DNS resolution failure information alone". Might this lead to the idea that this information could also be carried along with the failure, to better document it?
  • "For all other strings, the lookup application MUST rely on the presence or absence of labels in the DNS to determine the validity of those labels and the validity of the characters they contain". Is it correct to assume that the first "labels" stands for "A-Labels" and the second one stands for "their corresponding U-Labels"?
  • 7. IANA Considerations - There is no commitment from UNICODE to not update those Unicode documents that are accepted as normative in the IDNA documentation set. Should their copy at the time of the publication of this set not be stored by the IANA?
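To illustrate why the version of the Unicode data matters, here is a minimal sketch that records, next to a label, the Unicode version against which its codepoints were examined. It relies on unicodedata.unidata_version from the Python standard library; the function name unicode_snapshot and the layout of the record are our own assumptions and are not proposed by the drafts.

```python
import unicodedata


def unicode_snapshot(label: str) -> dict:
    """Record the codepoints of a label with the Unicode version used."""
    return {
        "unicode_version": unicodedata.unidata_version,
        "code_points": [f"U+{ord(ch):04X}" for ch in label],
        "categories": [unicodedata.category(ch) for ch in label],
    }


print(unicode_snapshot("b\u00fccher"))
```

Two implementations built on different Unicode versions may return different categories for a codepoint that was unassigned in the older version, which is the practical reason for asking where the normative copies are kept.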

IDNA BIDI

  • 1.1. Advisable or not to specify "when U-labels" instead of "labels" ?
  • 1.4. BIDI properties come from Unicode. They might not be complete or could be completed in the future. What then?
  • 2. A replacement for the RFC 3454 BIDI rule: it would probably be good to indicate the order in which the conditions apply.
  • 7. Does that restriction mean that telephone numbers cannot be registered in BIDI zones?
  • 8. IANA considerations. Same remark as in the Protocol case. Moreover, the section above then states: "the determination of validity for any string depends on the Unicode BIDI property values, which are not declared immutable by the Unicode Consortium."
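The dependency on Unicode BIDI property values can be made visible with a short sketch. It lists the Bidi class of each codepoint of a label with unicodedata.bidirectional, and checks a single condition, namely that the first codepoint has class L, R or AL, which is how we read the opening condition of the draft rule. The function names are ours and the remaining conditions of the rule are not implemented.

```python
import unicodedata


def bidi_classes(label: str) -> list[str]:
    """Bidi class of each codepoint, per the interpreter's Unicode data."""
    return [unicodedata.bidirectional(ch) for ch in label]


def starts_with_l_r_or_al(label: str) -> bool:
    """Assumed first condition only: first codepoint is L, R or AL."""
    return bool(label) and bidi_classes(label)[0] in {"L", "R", "AL"}


for sample in ["abc", "123", "\u05d0\u05d1", "\u0627\u0628"]:
    print(repr(sample), bidi_classes(sample), starts_with_l_r_or_al(sample))
```

An all-digit label such as "123" has class EN throughout and fails this first condition, which is what prompts our question about telephone numbers in BIDI zones.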

IDNA Tables

  • 1. Introduction. "In particular, some combinations of allowed code points are not advisable for use in IDNs due to rules specific to a script or class of characters" introduces the concept of a "class of characters", but does not document it. IDNA Rationale 7.1.3 states "Maintain IDNA and Unicode tables that are consistent with regard to versions, i.e., unless the application actually executes the classification rules in [IDNA2008-Tables]", yet the only time "classification (rules)" appears in IDNA Tables is in "4. Code points", as "The Categories and Rules defined in Section 2 and Section 3 apply to all Unicode code points. The table in Appendix B shows, for illustrative purposes, the consequences of the categories and classification rules, and the resulting property values."
    What is a "class of characters"?
  • 1. Introduction ends with " This document is part of a series that, together, constitute a proposal for updating the IDNA standards to resolve issues uncovered in recent years, cover a broader range of scripts, and provide for migration to newer versions of Unicode. See [IDNA2008-rationale] for a broader discussion. " Should this not be removed or edited?
  • 2.1. "For more information, see section 4.5 of The Unicode Standard [Unicode5]." Is it also the case in Unicode 5.1? Shouldn’t this document be stored by the IANA?
  • 2.2. NFKC or NFC?
  • 2.10. "It should be noted that Unicode distinguishes between 'unassigned code points' and 'unassigned characters'". Can the differences (nature and in relation to IDNA) between the characters and codepoints be explained here?
  • 5. IANA consideration. It is suggested that IANA should retain online copies of the version of external documents that are normatively referenced in the IETF documents.
  • "A table from which that registry can be initialized, and some further discussion, appears in Appendix A. " - Who is to decide and maintain the table and according to which rules/procedures?
  • Appendix A. As a comment, we do not understand, from the kind of logic presented, why:
    • Tamil digits cannot be made subject to a rule and added to CONTEXTO?
    • The same for French majuscules?
    • The same for any zone specific restriction?
It seems to be implied that the logic should be the same on the sending and receiving ends. The receiving end only decodes what the sending end chose to encode in its own context. That context needs to be considered and supported. If my application is in Tamil or French, it knows it and can be required to proceed accordingly.
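To show the kind of rule we have in mind for the Tamil digits question, here is a purely hypothetical sketch of a context rule written as a function. The rule itself (Tamil digits are acceptable only if the label contains no ASCII digits) is our own invention for the sake of the example and does not appear in the Tables draft; only the codepoint range U+0BE6 to U+0BEF for the Tamil digits is taken from Unicode.

```python
import unicodedata


def is_tamil_digit(ch: str) -> bool:
    """True for TAMIL DIGIT ZERO (U+0BE6) through TAMIL DIGIT NINE (U+0BEF)."""
    return "\u0be6" <= ch <= "\u0bef"


def tamil_digit_rule(label: str) -> bool:
    """Hypothetical rule: do not mix Tamil digits and ASCII digits in a label."""
    has_tamil = any(is_tamil_digit(ch) for ch in label)
    has_ascii = any(ch in "0123456789" for ch in label)
    return not (has_tamil and has_ascii)


print(unicodedata.digit("\u0be7"))          # 1
print(tamil_digit_rule("\u0be7\u0be8"))     # True
print(tamil_digit_rule("1\u0be7"))          # False
```

A zone-specific restriction could be expressed in the same way, as a function of the label evaluated in the context that the sending application already knows.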