IDNA Utils #274

indolering · 2017-03-15T04:31:15Z

Ticket tracking discussion of restoring the URL.domainToASCII and URL.domainToUnicode functions or implementing something new.

Summary

Processing international domain name labels is tricky, slow, and requires large lookup tables. However, browsers already perform this task (typically using the ICU library) and could expose these functions to JavaScript.

The proposal to add this functionality was nixed because no major browser had implemented it. However, Node supports the call (in under 50 lines of code), and a WebKit developer chimed in to say it would be trivial to add.
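For reference, here is a minimal sketch of what the Node call looks like today, using the `domainToASCII` and `domainToUnicode` functions exported by Node's built-in `url` module. The sample domains are illustrative only, and this shows Node's behavior, not a proposed browser API.

```javascript
// Node.js only: these functions live on the built-in 'url' module,
// not on the global URL class.
const { domainToASCII, domainToUnicode } = require('url');

// Unicode domain -> Punycode (ASCII) form.
const ascii = domainToASCII('español.com');

// Punycode (ASCII) form -> Unicode domain.
const unicode = domainToUnicode('xn--espaol-zwa.com');

// Invalid input yields an empty string rather than throwing.
const invalid = domainToASCII('xn--iñvalid.com');

console.log(ascii);          // xn--espaol-zwa.com
console.log(unicode);        // español.com
console.log(invalid === ''); // true
```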

One issue is which version of the ToUnicode function should be exposed and whether other utility functions might be needed, such as subdomain comparisons, distinguishing between domains/subdomains/TLD/public suffix, and IP address parsing.


indolering · 2017-03-15T05:26:58Z

My two cents: I'm worried that attaching additional parsing to this feature will result in an implementation hazard.

I was thinking about how to implement domain parsing yesterday and it would be convenient if the URL object contained that information. However, I would assume that the frequent updates to the Public Suffix List would make it hard to maintain compatibility between browsers and versions.

The size and complexity of an IP parsing library is nowhere near that of Stringprep, Nameprep, and Punycode. But if it's easy to do, IP parsing is a reasonable request to make of the standard library for a web-centric programming language.

WRT which version of ToUnicode ... make it configurable 🤷‍♀️?

annevk · 2017-03-15T08:24:15Z

@mikewest any new thoughts on all this?

I'm mainly asking the other questions since I wonder whether we should introduce a URLHost object rather than a couple of one-off utility methods.

(I don't think we want ToUnicode to be configurable. Each extra bit of API surface just leads to lots of bugs. Better to start out small.)

rmisev · 2017-03-15T10:13:04Z

A host parser could also be added to the URLHost utility collection; see #218 (comment) and #218 (comment).

annevk · 2017-03-15T10:19:01Z

const host = new URLHost(rawInput)
host.toString() // probably ASCII, as per usual
host.unicode() // ToUnicode?
host.type // "ipv4", "ipv6", "domain"

Alternatively you could make ToUnicode an argument to toString() somehow, similar to https://tc39.github.io/ecma262/#sec-number.prototype.tostring. Not sure if that's a precedent to follow however.
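To make the shape of that proposal concrete, here is a rough emulation in Node. Everything about it is an assumption for illustration: `URLHostSketch` is a hypothetical name, no browser or Node release ships a `URLHost` class, and the classification leans on Node's `net.isIP` and `url.domainToASCII` rather than the URL Standard's host parser. In particular, it only recognizes canonical dotted-decimal IPv4 and treats everything else as a domain.

```javascript
const { domainToASCII, domainToUnicode } = require('url');
const { isIP } = require('net');

// Hypothetical stand-in for the proposed URLHost; not a real platform API.
class URLHostSketch {
  constructor(rawInput) {
    // IPv6 hosts are written in brackets inside URLs; strip them before testing.
    const unbracketed = rawInput.replace(/^\[(.*)\]$/, '$1');
    const family = isIP(unbracketed); // 4, 6, or 0 (not an IP address)

    if (family === 6) {
      this.type = 'ipv6';
      this.ascii = `[${unbracketed}]`;
    } else if (family === 4) {
      this.type = 'ipv4';
      this.ascii = unbracketed;
    } else {
      const ascii = domainToASCII(rawInput);
      if (ascii === '') {
        throw new TypeError(`Invalid host: ${rawInput}`);
      }
      this.type = 'domain';
      this.ascii = ascii;
    }
  }

  // ASCII serialization, matching "probably ASCII, as per usual".
  toString() {
    return this.ascii;
  }

  // Unicode form for domains; IP addresses are returned unchanged.
  unicode() {
    return this.type === 'domain' ? domainToUnicode(this.ascii) : this.ascii;
  }
}

const host = new URLHostSketch('español.com');
console.log(host.toString()); // xn--espaol-zwa.com
console.log(host.unicode());  // español.com
console.log(host.type);       // domain
```

A single class like this keeps the serialization and the host type together, which is the argument for it over a couple of one-off utility methods.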

indolering · 2017-03-15T19:13:33Z

> I don't think we want ToUnicode to be configurable. Each extra bit of API surface just leads to lots of bugs. Better to start out small.

Well, which "version" of ToUnicode do we want, the standard ICU implementation or what the browser URL bar displays?

> Alternatively you could make ToUnicode an argument to toString()

I just don't think that overloading host.toString() is appropriate because the Punycode/Nameprep transform is very specific to DNS.

indolering · 2017-03-15T22:25:11Z

Thinking this over, I think it should output the result of the standard ToUnicode function, as that's easier to standardize across environments (e.g. Node.js).

mikewest · 2017-03-16T05:59:05Z

I do think something like this would be useful, and Node's implementation seems like a reasonable justification for paving the cowpath. If WebKit and Mozilla are also interested, I think Blink would follow suit.

That said, @sleevi had some concerns in #63 (comment). CCing him here.

annevk · 2017-03-16T08:04:01Z

@indolering note that there's no such thing as "standard" ToUnicode. I think we should be using https://url.spec.whatwg.org/#concept-domain-to-unicode which we already use in various places throughout the platform. I don't think we should expose variants, which I think was @sleevi's concern in that other thread. (Also note that our host parser is very specific to DNS already, since it already involves Punycode/Nameprep due to ToASCII which is applied on input.)
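The point that ToASCII is already applied on input can be observed with the existing `URL` class. The sketch below assumes a Node environment (for `url.domainToUnicode`) and uses an illustrative domain; browsers expose the same `URL` behavior but not the `domainToUnicode` helper.

```javascript
const { domainToUnicode } = require('url');

// The URL parser runs the host through ToASCII, so the Unicode
// input comes back in Punycode form.
const parsed = new URL('https://español.com/path');
const asciiHost = parsed.hostname;

// Converting back for display needs a separate step, which is
// what this issue asks to expose.
const displayHost = domainToUnicode(asciiHost);

console.log(asciiHost);   // xn--espaol-zwa.com
console.log(displayHost); // español.com
```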

indolering · 2017-03-16T18:54:27Z

> note that there's no such thing as "standard" ToUnicode.

I'll take your word for it! It's my preference for a single implementation to be shared across browsers and Node. AFAIK, this isn't the case when it comes to what's displayed in the URL bar. But IDNA makes me go cross-eyed, so I'll stop inserting myself.


annevk · 2017-03-31T13:34:25Z

I created a PR for this since we've got interest now from WebKit and Chrome. I'm a little worried about all the incompatibilities we still have with IDNA, but those are also exposed in other ways already.

@achristensen07 I'd appreciate review of #288 from you since you said WebKit would be interested in something like this.

What should be done before landing:

Add examples. If anyone here is willing to contribute some, that'd be great!
Write web-platform-tests. Again, help appreciated. If anyone needs guidance, I'm happy to help.

sleevi · 2017-03-31T13:37:10Z

Yeah, I do want to echo the concerns, and I'll loop @mikewest onto some design docs he may not have been aware of when he expressed support :)


indolering mentioned this issue Mar 15, 2017

Remove URL.domainToASCII and URL.domainToUnicode #63

Closed

annevk added the topic: api label Mar 28, 2017

annevk added a commit that referenced this issue Mar 31, 2017

Expose a URLHost class to JavaScript

29484f1

Also export host parser (already in use by HTML). Fixes #274.

annevk mentioned this issue Mar 31, 2017

Expose a URLHost class to JavaScript #288

Open

4 tasks

This was referenced Apr 21, 2017

Expose origins as ASCII or Unicode whatwg/html#2568

Closed

Expose origin as ASCII or Unicode #297

Closed

annevk added a commit that referenced this issue Jun 7, 2018

Expose a URLHost class to JavaScript

533d5c8

Also export host parser (already in use by HTML). Fixes #274.

annevk mentioned this issue Jul 21, 2018

What is the correct value that should be passed for processing_option? #400

Closed

annevk added the needs implementer interest, topic: idna, and addition/proposal labels Jan 10, 2023
