This system seems pretty weird to me. I was wondering, can that clash with a "no...

xg15 · on March 17, 2024

IDNs and Punycode were basically bolted-on extensions to DNS that were added after DNS was already widely deployed. Because there was no "proper" extension mechanism available, it was a design requirement that they could be implemented "on top" of standard DNS without having to change any of the underlying components. So I think most of the DNS infrastructure can be (and is) still completely unaware that IDNs and Punycode exist.

Actually, I wonder what happens if you take a "normal" (i.e. non-IDN, ASCII-only) domain and encode it as Punycode. Should the encoded and non-encoded domains be considered identical or separate? (For purposes of DNS resolution, origin separation, etc.)

Identical would be more intuitive and would match the behavior of domain names with non-ASCII characters. On the other hand, this would require reworking ALL non-Punycode-aware DNS software, which I doubt is possible.

So this seems like a tricky thing to get right.

Doxin · on March 18, 2024

Python has idna-encoding built in these days, so I figured I'd do a quick check to see what happens:

    >>> "foo".encode("idna")
    b'foo'
    >>> "fooé".encode("idna")
    b'xn--foo-dma'

So indeed a punycode'd ASCII domain would remain unchanged, by the looks of it.

There's also the "punycode" encoding available, but that does something subtly different that's not quite how domains get encoded:

    >>> "foo".encode("punycode")
    b'foo-'
    >>> "fooé".encode("punycode")
    b'foo-dma'
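
To make the difference between the two codecs concrete, here is a rough sketch of how the domain form relates to raw Punycode: each dot-separated label is handled on its own, ASCII-only labels pass through untouched, and everything else gets the "xn--" prefix in front of its Punycode form. The `to_ace` helper and `ACE_PREFIX` constant are hypothetical names for illustration only, and the sketch deliberately skips the nameprep/normalisation step that the real "idna" codec performs, so it should only agree with it for already-normalised input.

```python
ACE_PREFIX = b"xn--"  # hypothetical constant name; the prefix itself is the standard one


def to_ace(domain: str) -> bytes:
    """Simplified per-label encoding (assumption: input is already normalised)."""
    labels = []
    for label in domain.split("."):
        if label.isascii():
            # ASCII-only labels are left as they are, with no prefix added
            labels.append(label.encode("ascii"))
        else:
            # Non-ASCII labels: raw Punycode with the "xn--" prefix in front
            labels.append(ACE_PREFIX + label.encode("punycode"))
    return b".".join(labels)


print(to_ace("foo"))           # b'foo'
print(to_ace("fooé.example"))  # b'xn--foo-dma.example'
```

For these inputs the result matches what `.encode("idna")` returns, which is why the plain "punycode" codec on its own (no prefix, trailing "-" for pure ASCII) is not what ends up in a domain name.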

ttepasse · on March 19, 2024

According to the current Python documentation, the 'idna' encoding in Python only does IDNA 2003, not IDNA 2008:

https://docs.python.org/3.12/library/codecs.html#module-enco...

They recommend the third-party 'idna' module for this:

https://pypi.org/project/idna/

IDNA 2003 is a particular annoyance of mine: the IDNA 2003 algorithm didn't encode the German 'ß' character, or rather encoded it 'wrongly', through overeager use of Unicode normalisation in the nameprep step. The browser makers then stood still for a long time and didn't upgrade to IDNA 2008, which fixed that bug among other things. The WHATWG, in its self-appointed role as stenographer of the browser cartel, didn't change its weird URL spec. But that seems to have changed in recent years. Of course the original sin of IDNA was making it client-side. :/
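
The 'ß' problem is easy to see with the standard-library codec. This is only a sketch using the built-in "idna" and "punycode" codecs; the variable names are made up for illustration. The second value is just the raw Punycode of the label with the prefix glued on, shown for comparison with the form that keeps the 'ß'; it is not the output of a full IDNA 2008 implementation.

```python
label = "straße"

# Built-in codec (IDNA 2003): nameprep folds 'ß' to 'ss' before encoding,
# so the label ends up pure ASCII and never gets an "xn--" prefix.
idna2003 = label.encode("idna")
print(idna2003)  # b'strasse'

# Raw Punycode of the unmodified label, with the ACE prefix added by hand.
# This keeps 'ß' as a distinct character instead of folding it away.
prefixed = b"xn--" + label.encode("punycode")
print(prefixed)  # b'xn--strae-oqa'
```

So under the built-in codec "straße" and "strasse" collapse into the same name, while an encoding that preserves 'ß' gives a different one.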

lathiat · on March 17, 2024

I mean, yeah, but the odds of someone using "xn--" at the start of a domain are pretty small. The double dash is pretty uncommon.