Hacker News

The UTF-8 tricks make me very nervous, since I have seen too many attacks that exploit parser confusion. I go with serde for correctness, not speed. I hope this was fuzzed thoroughly with a large corpus of invalid UTF-8 strings.


Luckily, UTF-8's structure is _very_ simple compared to the average parser. That's not to say there can't be bugs, but a UTF-8 parser's internal state should be small, so it can be tested exhaustively.
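To make the "small state, exhaustively testable" point concrete, here is a minimal Rust sketch (Rust because the thread is about serde). `validate_utf8`, `check_one` and `check_exhaustively` are hypothetical names, not from any library. The validator is hand-written from the well-formed byte sequence table in the Unicode Standard (Table 3-7), and `std::str::from_utf8` is used as the reference. It checks every input of up to three bytes, plus every two-byte prefix followed by two continuation bytes so the four-byte lead limits (F0, F4) are exercised without enumerating all 2^32 four-byte strings.

```rust
/// Hypothetical hand-written validator following the Unicode well-formed
/// UTF-8 table. Between sequences it carries no state beyond the index.
fn validate_utf8(bytes: &[u8]) -> bool {
    let mut i = 0;
    while i < bytes.len() {
        let b0 = bytes[i];
        // (number of continuation bytes, allowed range for the second byte)
        let (extra, lo, hi) = match b0 {
            0x00..=0x7F => {
                i += 1;
                continue;
            }
            0xC2..=0xDF => (1, 0x80, 0xBF),
            0xE0 => (2, 0xA0, 0xBF), // rejects overlong 3-byte forms
            0xE1..=0xEC | 0xEE..=0xEF => (2, 0x80, 0xBF),
            0xED => (2, 0x80, 0x9F), // rejects UTF-16 surrogates
            0xF0 => (3, 0x90, 0xBF), // rejects overlong 4-byte forms
            0xF1..=0xF3 => (3, 0x80, 0xBF),
            0xF4 => (3, 0x80, 0x8F), // rejects code points above U+10FFFF
            _ => return false,       // 80..C1 and F5..FF never start a sequence
        };
        if i + extra >= bytes.len() {
            return false; // truncated sequence
        }
        if !(lo..=hi).contains(&bytes[i + 1]) {
            return false;
        }
        for k in 2..=extra {
            if !(0x80..=0xBF).contains(&bytes[i + k]) {
                return false;
            }
        }
        i += extra + 1;
    }
    true
}

/// Panics if the hand-written validator disagrees with the standard library.
fn check_one(input: &[u8]) {
    let reference = std::str::from_utf8(input).is_ok();
    assert_eq!(validate_utf8(input), reference, "disagreement on {:02X?}", input);
}

/// Returns how many inputs were compared.
fn check_exhaustively() -> u64 {
    let mut checked = 0u64;
    check_one(&[]);
    checked += 1;
    for a in 0..=255u8 {
        check_one(&[a]);
        checked += 1;
        for b in 0..=255u8 {
            check_one(&[a, b]);
            // Two-byte prefix plus two continuation bytes covers the
            // second-byte limits of the 4-byte leads.
            check_one(&[a, b, 0x80, 0x80]);
            checked += 2;
            for c in 0..=255u8 {
                check_one(&[a, b, c]);
                checked += 1;
            }
        }
    }
    checked
}

fn main() {
    let n = check_exhaustively();
    println!("validate_utf8 agreed with std::str::from_utf8 on {} inputs", n);
}
```

Longer inputs mostly chain the same boundary-to-boundary steps, which is why short exhaustive coverage catches table mistakes well; it says nothing about buffer handling in a SIMD or otherwise optimized implementation.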


This is the sort of space where I’d like to see a fuzzer.
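A minimal differential-fuzzing sketch in Rust, assuming no fuzzing framework (a real setup would more likely use something like cargo-fuzz). `fast_validate` is a hypothetical stand-in for an optimized path: it skips ASCII eight bytes at a time and hands the rest to the standard library. The tiny xorshift generator is also an assumption, used only to avoid external crates; it biases inputs toward bytes at UTF-8 boundaries, since uniformly random bytes rarely form the interesting multi-byte cases. Every input is compared against `std::str::from_utf8`.

```rust
/// Hypothetical "fast" validator: skips 8 ASCII bytes at a time, then hands
/// the remainder to the standard library.
fn fast_validate(bytes: &[u8]) -> bool {
    const HIGH_BITS: u64 = 0x8080_8080_8080_8080;
    let mut i = 0;
    while i + 8 <= bytes.len() {
        let mut chunk = [0u8; 8];
        chunk.copy_from_slice(&bytes[i..i + 8]);
        if u64::from_le_bytes(chunk) & HIGH_BITS != 0 {
            break; // some byte in this chunk is non-ASCII
        }
        i += 8;
    }
    // Everything before `i` is ASCII, so `i` is a character boundary.
    std::str::from_utf8(&bytes[i..]).is_ok()
}

/// Minimal xorshift64 generator (seed must be nonzero).
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Byte values near UTF-8 boundaries: ASCII edges, continuation limits,
/// overlong leads (C0, C1), the surrogate lead (ED), 4-byte limits, FF.
const INTERESTING: &[u8] = &[
    0x00, 0x2F, 0x7F, 0x80, 0x8F, 0x90, 0x9F, 0xA0, 0xBF, 0xC0, 0xC1, 0xC2,
    0xDF, 0xE0, 0xED, 0xEF, 0xF0, 0xF4, 0xF5, 0xFF,
];

/// Runs `iterations` random inputs; returns how many were valid UTF-8.
fn fuzz(iterations: u32, seed: u64) -> u32 {
    let mut rng = XorShift(seed);
    let mut valid = 0;
    for _ in 0..iterations {
        let len = (rng.next_u64() % 25) as usize;
        let input: Vec<u8> = (0..len)
            .map(|_| {
                let r = rng.next_u64();
                // Half plain ASCII so the 8-byte fast path actually runs.
                if r % 2 == 0 {
                    b'a'
                } else {
                    INTERESTING[(r >> 1) as usize % INTERESTING.len()]
                }
            })
            .collect();
        let expected = std::str::from_utf8(&input).is_ok();
        assert_eq!(fast_validate(&input), expected, "mismatch on {:02X?}", input);
        if expected {
            valid += 1;
        }
    }
    valid
}

fn main() {
    let valid = fuzz(100_000, 0x9E37_79B9_7F4A_7C15);
    println!("100000 inputs agreed with std ({} were valid UTF-8)", valid);
}
```

Counting valid inputs is a cheap sanity check that the generator is not producing only garbage, which would leave the accepting paths untested.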


Can you point to any bugs of this class that come to mind?


https://en.wikipedia.org/wiki/UTF-8#Invalid_sequences_and_er...

> Many of the first UTF-8 decoders would decode these, ignoring incorrect bits and accepting overlong results. Carefully crafted invalid UTF-8 could make them either skip or create ASCII characters such as NUL, slash, or quotes. Invalid UTF-8 has been used to bypass security validations in high-profile products including Microsoft's IIS web server[26] and Apache's Tomcat servlet container.[27] RFC 3629 states "Implementations of the decoding algorithm MUST protect against decoding invalid sequences."
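An illustrative Rust sketch of the overlong-encoding problem described in the quote. `naive_decode` is a hypothetical, deliberately broken decoder: it handles only ASCII and two-byte sequences and never checks whether the two-byte form was actually needed. The request path is made up. A byte-level filter for `../` runs before decoding and finds nothing, yet the naive decoder turns `C0 AF` into `/`, while `std::str::from_utf8` rejects the input.

```rust
/// Hypothetical, deliberately broken decoder (ASCII and 2-byte forms only).
/// It strips the marker bits but never rejects overlong encodings.
fn naive_decode(bytes: &[u8]) -> Option<String> {
    let mut out = String::new();
    let mut i = 0;
    while i < bytes.len() {
        let b0 = bytes[i];
        if b0 < 0x80 {
            out.push(b0 as char);
            i += 1;
        } else if (b0 & 0xE0) == 0xC0 && i + 1 < bytes.len() && (bytes[i + 1] & 0xC0) == 0x80 {
            // Missing check: code points below 0x80 must not use two bytes.
            let cp = ((b0 as u32 & 0x1F) << 6) | (bytes[i + 1] as u32 & 0x3F);
            out.push(std::char::from_u32(cp)?);
            i += 2;
        } else {
            return None; // longer sequences are out of scope for this sketch
        }
    }
    Some(out)
}

fn main() {
    // C0 AF is an overlong encoding of '/' (U+002F).
    let request: &[u8] = b"/files/..\xC0\xAF..\xC0\xAFetc/passwd";

    // A filter on the raw bytes sees no "../" ...
    let filter_blocks = request.windows(3).any(|w| w == &b"../"[..]);
    // ... but the naive decoder produces one anyway.
    let decoded = naive_decode(request).unwrap();
    let strict_ok = std::str::from_utf8(request).is_ok();

    println!("filter blocks: {}", filter_blocks); // false
    println!("naive decode:  {}", decoded);       // /files/../../etc/passwd
    println!("std accepts:   {}", strict_ok);     // false
}
```

The fix is the check RFC 3629 asks for: reject any sequence that decodes to a code point that a shorter form could have encoded, which is exactly what a strict validator's narrowed second-byte ranges and its rejection of C0 and C1 leads do.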



