The sqlite file format spec (https://www.sqlite.org/fileformat.html) is fairly short. I looked at hexdumps of where the db header should be, and a few page headers, and it seemed to line up, but not everything was filled in correctly. Trying to patch the header fields to match the sqlite spec and running it through sqlite3 didn't work, so I figured there were some more structural differences.
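For reference, this is roughly the kind of header check I started with (Python; the field offsets come straight from the sqlite spec, and "dump.db" is just a placeholder name):

    import struct

    with open("dump.db", "rb") as f:
        hdr = f.read(100)  # the sqlite db header is the first 100 bytes

    magic = hdr[0:16]                                # b"SQLite format 3\x00" in a real db
    page_size, = struct.unpack(">H", hdr[16:18])     # big-endian u16
    page_count, = struct.unpack(">I", hdr[28:32])    # size of the database in pages
    text_encoding, = struct.unpack(">I", hdr[56:60]) # 1 = UTF-8 per the spec

    print(magic, page_size, page_count, text_encoding)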
Since the spec is pretty short, I wrote a small parser for the sqlite files, printing out hexdumps of various sections as I went to check against the spec. The cell pointers in the btree leaf pages were missing, so I just tried reading the cells out in order from the cell data section, and that worked (the pointers are really just an optimization to skip over cells when scanning). The table schemas are stored differently, but it's a text format, so it's pretty easy to read (comma-separated columnName:type).
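The cell walk ended up looking something like this. It's a sketch, assuming the cell envelope still matches sqlite's (a payload-length varint followed by a rowid varint), which held up in my case:

    import struct

    def read_varint(buf, i):
        # sqlite-style big-endian varint, up to 9 bytes
        n = 0
        for k in range(8):
            b = buf[i + k]
            n = (n << 7) | (b & 0x7F)
            if not b & 0x80:
                return n, i + k + 1
        return (n << 8) | buf[i + 8], i + 9

    def walk_leaf_cells(page):
        # Page header layout per the sqlite spec: type byte, freeblock
        # pointer, cell count, content-area offset, fragment count.
        if page[0] != 0x0D:  # 0x0D = table b-tree leaf in the spec
            raise ValueError("not a table leaf page")
        cell_count, = struct.unpack(">H", page[3:5])
        content, = struct.unpack(">H", page[5:7])
        # No cell pointer array in this format, so just read the cells
        # back-to-back from the content area (the assumption that worked).
        i = content
        for _ in range(cell_count):
            payload_len, i = read_varint(page, i)
            rowid, i = read_varint(page, i)
            yield rowid, page[i:i + payload_len]
            i += payload_len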
The record format was the part that differed most. Instead of type-tagged values like in sqlite, these records just had a column count and the offsets of the values. The offsets can be 1 or 2 bytes depending on the length of the record, which tripped me up for a bit. I figured most of this out by staring at hexdumps of records. The values have to be interpreted based on the schema of the table the record is in.
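A sketch of the record parsing as I understood it from the hexdumps; the one-byte column count and the exact cutoff for switching to 2-byte offsets are guesses on my part:

    import struct

    def parse_record(payload, schema):
        # Guessed layout: a column count, then one offset per column
        # pointing into the value area. schema is a list of
        # (columnName, type) pairs parsed from the schema text.
        ncols = payload[0]
        wide = len(payload) > 0x100  # assumed cutoff for 2-byte offsets
        if wide:
            offs = [struct.unpack_from("<H", payload, 1 + 2 * k)[0]
                    for k in range(ncols)]
        else:
            offs = list(payload[1:1 + ncols])
        offs.append(len(payload))  # assume the last value runs to the end
        # The values carry no type tags, so just slice them out and let
        # the caller interpret each slice using the table's schema.
        return {name: payload[offs[k]:offs[k + 1]]
                for k, (name, _type) in enumerate(schema)}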
The file format used little-endian for the record values, despite everything else in the file being big-endian. The schema in these files allows array-of-vals and array-of-structs column types that I haven't decoded yet, but they don't seem to appear in any of the important tables, mostly just file-metadata tables. There are some untyped binary columns containing an array of 64-bit floats, which I found by comparing against the csv of the same data.
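The float check was just unpacking the blob as packed little-endian doubles and comparing against the csv:

    import struct

    def decode_f64_blob(blob):
        # Interpret an untyped binary column as packed little-endian
        # doubles (record values are little-endian here even though the
        # page and header structures are big-endian).
        assert len(blob) % 8 == 0
        return list(struct.unpack("<%dd" % (len(blob) // 8), blob))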
Not counting some of the binary columns I haven't decoded yet, the alternate versions of the various sections were still pretty simple, so I was able to figure them out by eyeballing it. There weren't any numeric enums (outside of the ones in the sqlite spec) or flag masks that would have taken a while to decipher.
I had been working on this myself! I haven’t dug into SQLite to this level before, but found the header easy to correct. I got stumped when eyeballing the additional pages of information.
I really enjoyed learning more about b-trees and binary encoding formats by taking this on. My primary exposure to those concepts has come from reading Designing Data-Intensive Applications, so having something concrete to test that knowledge on was fun.
One suggestion for the founders: don't start the timer until the code has loaded, and stop it once you submit. I lost 10-15 seconds to loading the code, and then got a game over because I submitted with 10 seconds left and it took longer than that to actually send.
Yeah... I got the task when the timer had 0:20 left, fixed it in 2 seconds, hit submit, and it said "submitting..." until the timer ran down to 0:00 and then just "Game over". :-/
Yep, great point. We currently sync the time with the server regularly to avoid cheating, and given that the servers are under the HN rush right now, the delay is probably way more significant than it usually is. However, we're trying to figure out a better way to do this to make sure the timer doesn't start counting before the task loads.
Could you also start the timer after the player has acknowledged they've read the problem? At the moment you're unfairly disqualifying players who read more slowly: those for whom English isn't a first language, those with reading difficulties like dyslexia, or even those with partial eyesight.
Timing a person's ability to read English doesn't prove their ability to write JavaScript et al.
Not with dyslexia, no. The problem lies with the syntax and semantics of the language, and since programming languages are logic-based rather than evolved from grunts with complex, non-obvious rules, many people with learning difficulties (including myself) find it massively easier to parse code than English.
To use myself as an example, it took me around 30 seconds just to read and understand the first challenge, yet I was able to solve it in around 10 seconds after that, despite it being in JavaScript (which is probably the weakest of the dozen or so languages I code in). If I were to go head to head with someone, that 30-second read time would be a hell of a handicap.
As for poorer eyesight, yes you'd have a point there, but at least the playing field would be levelled out a little more.
Interesting, I did not know that. I thought most of the problems with dyslexia came from actually being able to read series of letters and numbers; I've heard it described as the letters coming together in a jumble, not in the actual order they appear in the word.
That can be an issue in more severe cases, but even in those instances, most programming languages' syntax isn't so verbose that you can't follow the logic. Plus indentation obviously helps a lot there too.