
Absolutely none of that is correct.


A lot of people seem to agree with you, but I believe the HTML 4 spec supports my point: http://www.w3.org/TR/html4/charset.html

HTML is defined to use Unicode as the document character set. But the characters can be represented as byte streams using different encodings, UTF-8 being one encoding, ISO-8859-1 being another encoding.

> The "charset" parameter identifies a character encoding, which is a method of converting a sequence of bytes into a sequence of characters.

A lot of people seem to confuse Unicode with the UTF-encodings.
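The distinction can be illustrated with a minimal Python sketch (Python used here only for illustration; the thread itself shows no code): the same sequence of Unicode characters can be serialized as bytes under different encodings, and decoding each byte stream with its own encoding recovers the same characters.

```python
# Unicode is the character set; UTF-8 and ISO-8859-1 are two different
# ways of representing those characters as bytes.
text = "café"

utf8_bytes = text.encode("utf-8")         # 5 bytes: é becomes 0xC3 0xA9
latin1_bytes = text.encode("iso-8859-1")  # 4 bytes: é becomes 0xE9

# The byte streams differ, but both decode to the same Unicode characters.
assert utf8_bytes != latin1_bytes
assert utf8_bytes.decode("utf-8") == latin1_bytes.decode("iso-8859-1") == text
```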


> I believe the HTML 4 spec supports my point

That document is badly written; a more reasonable way to interpret it is to conclude that user agents will use some form of Unicode internally, after converting from whatever character encoding the received document used. That is, indeed, a very reasonable way to design your software, but it doesn't make Latin-1 (for example) a Unicode encoding by any reasonable standard.
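The internal-Unicode reading of the spec amounts to something like the following hypothetical sketch (the function name and structure are assumptions, not anything from the spec): the user agent decodes the received byte stream using its declared charset, and everything downstream operates on Unicode characters.

```python
# Hypothetical sketch of a user agent's first processing step:
# decode the wire bytes using the declared "charset" parameter,
# so the parser only ever sees Unicode characters.
def decode_document(body: bytes, declared_charset: str) -> str:
    return body.decode(declared_charset)

# Two different wire encodings of the same document content both
# yield the same internal Unicode string.
assert decode_document(b"caf\xe9", "iso-8859-1") == "café"
assert decode_document(b"caf\xc3\xa9", "utf-8") == "café"
```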

> A lot of people seem to confuse Unicode with the UTF-encodings.

True. I do not.



