Hacker News

Can you find a website that uses a non-UTF-8 Unicode encoding? (Actually now that I think about it, UTF-16 probably makes sense for non-Latin languages and might well be common. Does anyone have any insights here?)


> Actually now that I think about it, UTF-16 probably makes sense for non-Latin languages

It actually doesn't, because all the HTML markup is ASCII, and UTF-16 spends two bytes on every ASCII character where UTF-8 spends one.
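
A rough way to check this, as a sketch only: the page below is a made-up sample, and `encoded_sizes` is a hypothetical helper, not anything from a real site. The Japanese text alone is smaller in UTF-16, but once it is wrapped in ASCII markup the whole page comes out larger.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for text."""
    # "utf-16-le" is used so that no byte-order mark is counted.
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

# Hypothetical sample: short Japanese text and the same text inside markup.
body_text = "こんにちは、世界"
page = (
    "<html><head><title>例</title></head>"
    '<body><p class="greeting">' + body_text + "</p></body></html>"
)

text_utf8, text_utf16 = encoded_sizes(body_text)
page_utf8, page_utf16 = encoded_sizes(page)

print("text only:", text_utf8, "bytes UTF-8 vs", text_utf16, "bytes UTF-16")
print("full page:", page_utf8, "bytes UTF-8 vs", page_utf16, "bytes UTF-16")
```

How the balance tips on a real page depends on its ratio of markup to text, so treat this as an illustration rather than a measurement.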



ISO-8859-n were pretty common in Europe, because they were the default charset on Windows. Just checked the web pages of two major Danish newspapers: they use ISO-8859-1. UTF-8 is getting more widespread though.


They aren't the default Windows charsets. Windows code pages tend to be close, with differences here and there that are impossible to sniff for.
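
To illustrate with one pair (Windows-1252 versus ISO-8859-1), here is a small sketch. The byte string is an invented sample. The two charsets disagree in the 0x80-0x9F range, and both decodes succeed, so the decoder itself gives no hint about which one was intended.

```python
# Invented sample bytes: 0x93/0x94 are curly quotes and 0x80 is the euro sign
# in Windows-1252, but all three are C1 control characters in ISO-8859-1.
raw = b"\x93quoted\x94 \x80 5"

as_cp1252 = raw.decode("cp1252")
as_latin1 = raw.decode("latin-1")

# Neither call raises; the results simply differ.
print(repr(as_cp1252))
print(repr(as_latin1))
```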


ISO-8859-n is not Unicode.


ISO-8859-n is not ASCII either, so if your software is not Unicode-aware (or encoding-aware) at all, things will break.
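
A minimal sketch of that breakage, using a Danish sample phrase chosen for illustration: the Latin-1 bytes decode neither as ASCII nor as UTF-8, so software has to be told the encoding. `decodes_as` is a hypothetical helper.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if data is valid in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

text = "Rødgrød med fløde"  # sample text; ø is byte 0xF8 in ISO-8859-1
latin1_bytes = text.encode("latin-1")

print(decodes_as(latin1_bytes, "ascii"))    # 0xF8 is outside 7-bit ASCII
print(decodes_as(latin1_bytes, "utf-8"))    # 0xF8 is never valid in UTF-8
print(decodes_as(latin1_bytes, "latin-1"))  # every byte is valid Latin-1
```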


They are considered Unicode encodings.


No, they are not.


http://www.w3.org/TR/html4/charset.html considers ISO-8859-n character encodings:

"Commonly used character encodings on the Web include ISO-8859-1 (also referred to as "Latin-1"; usable for most Western European languages), ISO-8859-5 ..."


ISO-8859-5 has NEVER been a common encoding on the Web.

KOI8-R was, then Windows-1251. Now it's often UTF-8.
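
For a sense of why the choice mattered, here is a sketch with a sample Russian word (picked for illustration). Each legacy encoding maps the same letters to different bytes, so reading the bytes with the wrong charset yields readable-looking but wrong Cyrillic instead of an error.

```python
word = "Привет"  # sample word, six Cyrillic letters

# Python codec names for the encodings mentioned above, plus UTF-8.
encodings = ["koi8_r", "cp1251", "iso8859_5", "utf-8"]
encoded = {name: word.encode(name) for name in encodings}

for name, data in encoded.items():
    print(f"{name:10} {len(data):2} bytes  {data.hex(' ')}")

# Misreading KOI8-R bytes as Windows-1251 does not fail; it produces mojibake.
mojibake = encoded["koi8_r"].decode("cp1251")
print(mojibake)
```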


But Latin-1 is not a Unicode encoding.
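
One way to see the distinction, as a sketch: Latin-1 can only represent code points U+0000 through U+00FF, whereas UTF-8 can encode any Unicode string. `fits_in_latin1` is a hypothetical helper and the sample strings are arbitrary.

```python
def fits_in_latin1(text: str) -> bool:
    """Return True if every character of text exists in ISO-8859-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

samples = ["æøå", "€", "Привет"]  # arbitrary sample strings
for s in samples:
    # UTF-8 always succeeds on these; Latin-1 only when all code points <= U+00FF.
    print(s, fits_in_latin1(s), len(s.encode("utf-8")))
```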



