IDNA Utils #274

Open
indolering opened this issue Mar 15, 2017 · 11 comments
Labels
addition/proposal New features or enhancements needs implementer interest Moving the issue forward requires implementers to express interest topic: api topic: idna

Comments

@indolering

indolering commented Mar 15, 2017

Ticket tracking discussion of restoring the URL.domainToASCII and URL.domainToUnicode functions or implementing something new.

Summary

Processing international domain name labels is tricky, slow, and requires large lookup tables. However, browsers already perform this task (typically using the ICU library) and could expose these functions to JavaScript.

The proposal to add this functionality was nixed because no major browser had implemented it. Node supports the call (<50 lines) and a WebKit developer chimed in saying it would be trivial to add.
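For reference, a minimal sketch of the Node calls mentioned above. `url.domainToASCII` and `url.domainToUnicode` are part of Node's built-in `url` module; the sample domain is only an illustration.

```javascript
const url = require('node:url');

// ToASCII: Unicode labels become Punycode ("xn--") labels.
const ascii = url.domainToASCII('español.com');
console.log(ascii); // xn--espaol-zwa.com

// ToUnicode: the reverse direction.
const unicode = url.domainToUnicode('xn--espaol-zwa.com');
console.log(unicode); // español.com
```

Per the Node documentation, both functions return an empty string when the input is not a valid domain.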

One open question is which version of the ToUnicode function should be exposed. Another is whether other utility functions might be needed, such as subdomain comparisons, distinguishing between domains/subdomains/TLD/public suffix, and IP address parsing.

@indolering
Author

My two cents: I'm worried that attaching additional parsing to this feature will result in an implementation hazard.

I was thinking about how to implement domain parsing yesterday and it would be convenient if the URL object contained that information. However, I would assume that the frequent updates to the Public Suffix List would make it hard to maintain compatibility between browsers and versions.

The size and complexity of an IP parsing library is nowhere near that of Stringprep, Nameprep, and Punycode. But if it's easy to do, IP parsing is a reasonable request to make of the standard library for a web-centric programming language.
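As a rough illustration of how small the IP side is, Node already ships `net.isIP`, which returns 4, 6, or 0. The `classifyAddress` helper below is a hypothetical wrapper for this example only, not part of any proposal.

```javascript
const net = require('node:net');

// Hypothetical helper: classify a host string using Node's built-in check.
function classifyAddress(input) {
  const family = net.isIP(input); // 4, 6, or 0 when not an IP literal
  if (family === 4) return 'ipv4';
  if (family === 6) return 'ipv6';
  return 'other';
}

console.log(classifyAddress('192.0.2.1'));   // ipv4
console.log(classifyAddress('2001:db8::1')); // ipv6
console.log(classifyAddress('example.com')); // other
```

Note that `net.isIP` only recognizes canonical textual forms; it does not cover the looser IPv4 spellings that the URL Standard's host parser accepts.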

WRT which version of ToUnicode ... make it configurable 🤷‍♀️?

@annevk
Member

annevk commented Mar 15, 2017

@mikewest any new thoughts on all this?

I'm mainly asking the other questions since I wonder whether we should introduce a URLHost object rather than a couple of one-off utility methods.

(I don't think we want ToUnicode to be configurable. Each extra bit of API surface just leads to lots of bugs. Better to start out small.)

@rmisev
Member

rmisev commented Mar 15, 2017

A host parser could also be added to the URLHost utility collection; see #218 (comment) and #218 (comment).

@annevk
Member

annevk commented Mar 15, 2017

const host = new URLHost(rawInput)
host.toString() // probably ASCII, as per usual
host.unicode() // ToUnicode?
host.type // "ipv4", "ipv6", "domain"

Alternatively you could make ToUnicode an argument to toString() somehow, similar to https://tc39.github.io/ecma262/#sec-number.prototype.tostring. Not sure if that's a precedent to follow however.
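To make the shape above concrete, here is a rough userland approximation in Node. `URLHostSketch` is a hypothetical name, not a proposed or existing API. It leans on the existing `URL` parser, `net.isIP`, and `url.domainToUnicode`, and its input check is deliberately crude.

```javascript
const net = require('node:net');
const { domainToUnicode } = require('node:url');

// Hypothetical approximation of the URLHost idea; not a real platform API.
class URLHostSketch {
  constructor(rawInput) {
    // Crude guard: reject input that would spill into other URL components.
    const hasDelimiter = /[\/\\?#@]/.test(rawInput);
    const badColon = rawInput.startsWith('[')
      ? !rawInput.endsWith(']')
      : rawInput.includes(':');
    if (hasDelimiter || badColon) {
      throw new TypeError(`Not a bare host: ${rawInput}`);
    }
    // Let the URL parser run its host parser (ToASCII, IP parsing).
    this.hostname = new URL(`http://${rawInput}/`).hostname;
  }

  // ASCII serialization, as the URL parser produced it.
  toString() {
    return this.hostname;
  }

  // Unicode form for domains; IP addresses are returned unchanged.
  unicode() {
    return this.type === 'domain' ? domainToUnicode(this.hostname) : this.hostname;
  }

  get type() {
    if (this.hostname.startsWith('[')) return 'ipv6';
    return net.isIP(this.hostname) === 4 ? 'ipv4' : 'domain';
  }
}

const host = new URLHostSketch('español.com');
console.log(host.toString()); // xn--espaol-zwa.com
console.log(host.unicode());  // español.com
console.log(host.type);       // domain
```

Whether `unicode()` should be a method, a getter, or an argument to `toString()` is exactly the open design question in this thread; the sketch picks a method only to mirror the snippet above.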

@indolering
Author

> I don't think we want ToUnicode to be configurable. Each extra bit of API surface just leads to lots of bugs. Better to start out small.

Well, which "version" of ToUnicode do we want, the standard ICU implementation or what the browser URL bar displays?

> Alternatively you could make ToUnicode an argument to toString()

I just don't think that overloading host.toString() is appropriate because the Punycode/Nameprep transform is very specific to DNS.

@indolering
Author

Thinking this over, I think it should output the standard ToUnicode function, as that's easier to standardize across environments (i.e. Node.js).

@mikewest
Member

I do think something like this would be useful, and Node's implementation seems like a reasonable justification for paving the cowpath. If WebKit and Mozilla are also interested, I think Blink would follow suit.

That said, @sleevi had some concerns in #63 (comment). CCing him here.

@annevk
Member

annevk commented Mar 16, 2017

@indolering note that there's no such thing as "standard" ToUnicode. I think we should be using https://url.spec.whatwg.org/#concept-domain-to-unicode which we already use in various places throughout the platform. I don't think we should expose variants, which I think was @sleevi's concern in that other thread. (Also note that our host parser is very specific to DNS already, since it already involves Punycode/Nameprep due to ToASCII which is applied on input.)
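The point about ToASCII being applied on input can be observed with the existing `URL` API. A small sketch, with illustrative sample hosts:

```javascript
// The URL parser lowercases ASCII hosts on input.
const upper = new URL('https://EXAMPLE.com/');
console.log(upper.hostname); // example.com

// Non-ASCII hosts are run through ToASCII, so the stored host is Punycode.
const idn = new URL('https://español.com/');
console.log(idn.hostname); // xn--espaol-zwa.com
```

The Unicode form is therefore never stored on the `URL` object itself, which is why a separate way to get it back is being discussed here.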

@indolering
Author

> note that there's no such thing as "standard" ToUnicode.

I'll take your word for it! It's my preference for a single implementation to be shared across browsers and Node. AFAIK, this isn't the case when it comes to what's displayed in the URL bar. But IDNA makes me go cross-eyed, so I'll stop inserting myself.

annevk added a commit that referenced this issue Mar 31, 2017
Also export host parser (already in use by HTML).

Fixes #274.
@annevk
Member

annevk commented Mar 31, 2017

I created a PR for this since we've got interest now from WebKit and Chrome. I'm a little worried about all the incompatibilities we still have with IDNA, but those are also exposed in other ways already.

@achristensen07 I'd appreciate review of #288 from you since you said WebKit would be interested in something like this.

What should be done before landing:

  • Add examples. If anyone here is willing to contribute some, that'd be great!
  • Write web-platform-tests. Again, help appreciated. If anyone needs guidance, I'm happy to help.

@sleevi

sleevi commented Mar 31, 2017

Yeah, I do want to echo the concerns, and I'll loop @mikewest onto some design docs he may not have been aware of when he expressed support :)

annevk added a commit that referenced this issue Jun 7, 2018
Also export host parser (already in use by HTML).

Fixes #274.
@annevk annevk added needs implementer interest Moving the issue forward requires implementers to express interest topic: idna addition/proposal New features or enhancements labels Jan 10, 2023
Development

No branches or pull requests

5 participants