Hacker News | creationix's comments

Yeah, LuaJIT is one of the use cases I had in mind while working on this. JSON is pretty fast in modern JS engines, but in Lua land JSON kinda sucks and doesn't really match the language without using metatables.

JSON has `null` values with string keys, but Lua doesn't have `null`. It has `nil`, but you can't have a key with a `nil` value; setting a key to `nil` deletes it.

Lua tables are unordered, but JS objects and JSON documents are often ordered, and order often matters.
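To make the contrast concrete, here's a minimal Node snippet showing the two JSON properties that plain Lua tables can't express (null-valued keys and significant key order):

```javascript
// JSON can carry a key whose value is null, and key order is significant.
// JS objects preserve string-key insertion order, and JSON.parse follows it.
const obj = JSON.parse('{"b": null, "a": 2}');

console.log("b" in obj);        // true: the null-valued key survives
console.log(Object.keys(obj));  // [ 'b', 'a' ]: insertion order preserved

// In Lua, `t.b = nil` would delete the key outright, and pairs() iteration
// order over a plain table is unspecified, so a naive decode loses both.
```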

RX, however, matches Lua/LuaJIT extremely well and should out-perform the JS Proxy-based decoder by using metatables. Since it's using metatables anyway due to the lazy parsing, it's trivial to do things like preserve order when calling `pairs` and `ipairs`, and even include keys with associated null values.
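The Proxy-based decoder idea in miniature (a hypothetical sketch of the mechanism only, not the actual `@creationix/rx` implementation; for simplicity it re-parses the raw text on demand rather than doing true lazy byte-offset decoding):

```javascript
// Wrap a raw document in a Proxy that decodes a property only on first
// access, caching the result — the JS analog of Lua's __index metamethod.
function lazy(raw) {
  const cache = new Map();
  return new Proxy({}, {
    get(_target, key) {
      if (!cache.has(key)) {
        cache.set(key, JSON.parse(raw)[key]); // decode on demand
      }
      return cache.get(key);
    },
    has(_target, key) {
      return key in JSON.parse(raw);
    },
  });
}

const doc = lazy('{"a": 1, "b": 2}');
console.log(doc.a); // 1 — decoded only when touched
```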

You can round trip safely in Lua, which is not easy with most JSON implementations.


How does CBOR retain JSON compatibility more than RX?

RX can represent any value JSON can represent. It doesn't even lose key order like some random-access formats do.

In fact, RX is closer to JSON than CBOR.

Take decimals as an example:

JSON numbers are arbitrary-precision numbers written in decimal, which means the text can technically represent any decimal number to full precision.

CBOR stores numbers as binary floats, which are approximations of decimal numbers. This is why they needed to add Decimal Fractions (Tag 4).

RX already stores numbers as a decimal base with a decimal power of 10, so out of the box it matches JSON.
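The binary-float approximation problem is easy to demonstrate in any IEEE-754 language; here it is in Node:

```javascript
// Binary doubles can only approximate most decimal fractions.
const sum = 0.1 + 0.2;

console.log(sum === 0.3);          // false
console.log(sum.toPrecision(17));  // "0.30000000000000004"

// A format that stores the decimal digits themselves (as JSON text does,
// or a decimal-mantissa + power-of-10 layout) round-trips "0.1" exactly
// because it never converts to base-2 at all.
```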


Thanks for the feedback. I've improved the framing to make the purpose/value more clear. What do you think about "RX is a read-only embedded store for JSON-shaped data"?

https://www.npmjs.com/package/@creationix/rx


That benchmark is a fair comparison for a real-world production workload and use case. Sadly I can't share the details, but suffice it to say that the dataset is a huge object with tens of thousands of paths as keys and moderately large objects as values (averaging around 3KB of JSON each), all with slightly different shapes. The usage pattern is reading just a few entries by path and then looking up some properties within those entries.

The benchmark measures (or is supposed to measure) end-to-end parse + lookup.

File size: JSON: 92 MB, RX: 5.1 MB

Request-path lookup: ~47,000x faster

Time to decode the manifest and look up one URL path: JSON: 69 ms, RX: 0.003 ms

Heap allocations: 2.6 million vs. 1: JSON: 2,598,384, RX: 1 (the returned string)


The project framing needs some help, perhaps. JSON is really good at a lot of use cases that this will never replace, but there are cases where JSON is currently used where this is much better: in particular, large unstructured datasets where you only need to read a tiny subset of the data in a single request.

Maybe a better framing would be "NoSQL SQLite"?


I'm happy to hear suggestions. This format was actually the internal .rexc bytecode for Rex (routing expressions), but when I realized it was actually a pretty good standalone format, I renamed it `.rx` for short. I am aware of RxJS, but I think `rx-format` is different enough, and `.rx` file extensions are unique enough, that it's not too confusing.

You're right. Some important differences:

sick is binary; rx is textual (this matters for tooling).

sick has size limits (65,534 max keys, for example; I have real-world rx datasets reaching this size already); rx uses arbitrary-precision variable-length b64 integers, so there are no size limits inherent in the format, just in implementations.

sick does not preserve object key order; rx preserves object key order, but still implements O(log2 N) lookups for object keys.

etc.
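Preserving document order while keeping O(log2 N) lookups is a standard trick; here's one way to do it (my sketch of the general technique, not necessarily how rx lays out its index on the wire):

```javascript
// Store keys in document order, plus a side index of key positions
// sorted by key, so lookups can binary-search without reordering keys.
function buildIndex(keys) {
  return keys
    .map((_, i) => i)
    .sort((a, b) => (keys[a] < keys[b] ? -1 : keys[a] > keys[b] ? 1 : 0));
}

function lookup(keys, index, target) {
  let lo = 0, hi = index.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const k = keys[index[mid]];
    if (k === target) return index[mid]; // position in document order
    if (k < target) lo = mid + 1; else hi = mid - 1;
  }
  return -1;
}

const keys = ["b", "a", "c"];        // document order preserved as-is
const idx = buildIndex(keys);        // [1, 0, 2] — positions sorted by key
console.log(lookup(keys, idx, "c")); // 2
```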


> Does this duplicate the name of keys?

Yes, the format allows objects to be stored with a pointer to a shared schema (either an array of keys or another object that has the desired keys).
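The shared-schema idea in miniature (an illustrative in-memory layout, not the actual rx wire encoding): many same-shaped objects reference one key array instead of each repeating the key strings.

```javascript
// One shared key array for every object of this shape...
const schema = ["name", "size", "hash"];

// ...so each object only stores its values.
const rows = [
  ["a.txt", 120, "abc"],
  ["b.txt", 80, "def"],
];

// Reconstruct plain objects on demand by zipping keys with values.
const objects = rows.map(row =>
  Object.fromEntries(schema.map((key, i) => [key, row[i]]))
);

console.log(objects[1].size); // 80
```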

The current implementation's heuristic for deciding when to use this encoding is pretty close to ideal.


The current format version has the exact same feature set as JSON. I even encode numbers as arbitrary-precision decimals (as JSON does). This is quite different from CBOR, which stores floats in binary as powers of 2.

I could technically add binary data to the format, but then it would lose the nice copy-paste property. With the byte-aware length prefixes, though, it would otherwise just work.


It's not really possible to stay human-readable and get the compression levels and random-access properties I was going for, but it is as friendly to human tooling as possible given the constraints.

>it's not really possible

I find it obvious that your first attempt failed. Try again; you have not even remotely failed enough if you are making the argument that this is kinda readable. Yes, ASCII words are easy to pick out, but you didn't do that; you did the part that makes it all harder.

