As others have mentioned you have lots of options: LMDB, LevelDB/RocksDB, BerkeleyDB. For what it's worth, I spent a long time looking for an embedded key-value store for my current native project since I didn't need the full complexity of SQL. In the end I chose... SQLite.
All of these embedded NoSQL databases seem to be missing critical features. One such feature for my use case is database compaction. Last I checked, an LMDB database file can never shrink. Full compaction of LevelDB is slow and complicated (as I understand it, it essentially defeats the leveled structure that is the whole point of the design). SQLite, meanwhile, supports fast incremental vacuum, which can be triggered manually or automatically.
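For reference, a minimal sketch of SQLite's incremental vacuum using Python's stdlib sqlite3 module (the path and table are placeholders):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "app.db")  # stand-in for your database file
db = sqlite3.connect(path, isolation_level=None)   # autocommit mode, so VACUUM can run

# auto_vacuum must be set before the first table is created,
# or be followed by a full VACUUM to rebuild the file in the new mode.
db.execute("PRAGMA auto_vacuum = INCREMENTAL")
db.execute("VACUUM")

db.execute("CREATE TABLE kv (k BLOB PRIMARY KEY, v BLOB)")
db.execute("INSERT INTO kv VALUES (X'01', zeroblob(500000))")
before = os.path.getsize(path)

# delete data, then reclaim the freed pages and truncate the file
db.execute("DELETE FROM kv")
db.execute("PRAGMA incremental_vacuum").fetchall()
db.close()
after = os.path.getsize(path)
```

With an argument, `PRAGMA incremental_vacuum(N)` reclaims at most N pages, so you can spread the work out instead of pausing for a full compaction.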
SQLite just has everything. Plus the reliability is unmatched. Even if you just need a single table that maps blob keys to blob values, I would still recommend SQLite over any NoSQL database today.
Linux has VFS cache which is very robust and efficient.
Remember that it is typical for programs to read /etc/nsswitch.conf, /etc/resolv.conf and tons of other files at startup time - in the Unix tradition the filesystem is the data source, so the machinery is very well optimized.
The problem with this is that if your records/documents are small, you're wasting huge amounts of space because each file uses a full filesystem block. If you have, say, ten thousand records where each is 200 bytes, a decent database would store that in a bit over 2MB. Storing these as individual files on a filesystem with 4kB blocks will take up at least 40MB. This is a huge amount of wasted space, not to mention slow. (Some filesystems do support tail packing but that won't fully solve the problem.)
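The back-of-the-envelope arithmetic above, spelled out:

```python
records = 10_000
record_size = 200      # bytes per record
block_size = 4096      # common filesystem block size

packed = records * record_size   # densely packed, as a database would store it
on_fs = records * block_size     # one full block minimum per file

print(packed)  # 2 MB of actual data
print(on_fs)   # 40 MB on disk, a 20x blowup
```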
Not to mention all the other problems with this. The filesystem has a complete lack of higher-level features: no transactions, no snapshots, no indexing beyond filenames, no easy robustness guarantees (doing fsync() properly is a lot more complicated than it appears.) Honestly for modern apps the filesystem is just terrible at storing any internal mutable app data.
Once you start writing code to store auxiliary indices, synchronize writes, or pack multiple records per file, well at that point you're just implementing your own database. This might make sense if, say, you have a special way of compressing your data (like git). But generally you're better off using a real embedded database.
For simple stuff I've had success keeping an in-memory data structure as the source of truth, and just persisting the whole thing to a file (JSON or otherwise) via a debounced function. Assuming you only have one process (or at least one main process), you only have to read the file on startup and can be pretty relaxed about your write strategy.
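A minimal sketch of that pattern, assuming a single process; the class name and debounce window are made up, and the write goes through a temp file plus rename so a crash mid-write never leaves a torn file:

```python
import json
import os
import tempfile
import threading

class JsonStore:
    """In-memory dict as the source of truth, flushed to disk after a quiet period."""

    def __init__(self, path, delay=1.0):
        self.path = path
        self.delay = delay          # debounce window in seconds
        self._timer = None
        self._lock = threading.Lock()
        self.data = {}
        if os.path.exists(path):    # read the file once, on startup
            with open(path) as f:
                self.data = json.load(f)

    def set(self, key, value):
        with self._lock:
            self.data[key] = value
            # restart the debounce timer: only the last write in a burst flushes
            if self._timer:
                self._timer.cancel()
            self._timer = threading.Timer(self.delay, self.flush)
            self._timer.start()

    def flush(self):
        with self._lock:
            if self._timer:
                self._timer.cancel()
                self._timer = None
            # write to a temp file, then atomically rename over the old one
            fd, tmp = tempfile.mkstemp(dir=os.path.dirname(self.path) or ".")
            with os.fdopen(fd, "w") as f:
                json.dump(self.data, f)
            os.replace(tmp, self.path)
```

Calling flush() explicitly on shutdown covers the case where the process exits inside the debounce window.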
You'll only be wasting space proportional to the number of objects, and the overhead is much smaller on ext4 (the default fs on most Linux distros), as the sibling comment explains. Most databases are small, and even the larger ones rarely consist of so many tiny records that per-file overhead dominates, so in most cases you won't be wasting huge amounts of space.
Overhead seems to be about 4KiB, at least using default ext4 parameters:
# zfs create -V 100G -b 4096 tank/test && mkfs.ext4 -v /dev/zvol/tank/test && mount /dev/zvol/tank/test /mnt && cd /mnt
# df -k .
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/zd16 102626232 24 97366944 1% /mnt
# for x in `seq 1000000`; do echo $x >$x; done # create 1M tiny files
# df -k .
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/zd16 102626232 4022348 93344620 5% /mnt
# bc -l
(97366944-93344620)*1024/1000000 # free space diff per file
4118.859776
Plus you're wasting an inode per record - a limited resource in ext4 that can only be increased by reformatting. You'd probably run out of inodes long before you run out of space.
This has a large overhead - easily a few kilobytes per file, depending on the filesystem. Poor locality too - your data gets scattered across the entire disk; even on SSD this matters to some extent. It also starts to break down in some ways already at O(100k) files, e.g. globs on the command line stop working. You can work around this by splitting the files across directories, but at that point it's just easier to use a normal database.
Isn't the main feature of nosql supposed to be easy horizontal scalability, the exact opposite of storing everything in a single file?
If you just need a r/w store for some JSON documents in a single file, why not SQLite? You can put arbitrary-length blobs into it. Some SQL will be involved, but you can hide it in a wrapper class tailored to your application in a few dozen lines of code or so.
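A wrapper of roughly that size might look like the following sketch (class and table names are made up), using Python's stdlib sqlite3 and json modules:

```python
import json
import sqlite3

class DocStore:
    """Tiny JSON-document store on top of a single-table SQLite file."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS docs (key TEXT PRIMARY KEY, value BLOB)")

    def put(self, key, obj):
        # serialize the document to a JSON blob; upsert on the key
        self.db.execute(
            "INSERT OR REPLACE INTO docs (key, value) VALUES (?, ?)",
            (key, json.dumps(obj).encode()))
        self.db.commit()

    def get(self, key):
        row = self.db.execute(
            "SELECT value FROM docs WHERE key = ?", (key,)).fetchone()
        return json.loads(row[0]) if row else None

    def delete(self, key):
        self.db.execute("DELETE FROM docs WHERE key = ?", (key,))
        self.db.commit()
```

The application code only ever sees put/get/delete; all the SQL stays inside the class.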
> Isn't the main feature of nosql supposed to be easy horizontal scalability
There's no strict definition of nosql so everyone can choose their own. My personal take (in broad terms) follows:
No, that's not a feature of nosql. Nosql means not relational, which in turn means no guarantees about the relationship between two objects, i.e. no atomicity of access across multiple objects (for either read or write operations). A consequence of this lack of atomicity is that it's easy to store different objects in different places, thus opening up opportunities for horizontal scalability. Caveat: those opportunities can be taken away by other choices you make. If you decide to offer and enforce transactions, you are bringing atomicity back into the system, and thus making horizontal scalability hard again. Or you may decide you want a nosql-in-a-file.
Practically, NoSQL also seems to just mean "Not-SQL" as stuff like Redis is often lumped in it, which is about the opposite of easy horizontal scalability.
I use it in many of my apps, e.g. https://github.com/rochus-keller/CrossLine. It's lean and fast, and supports objects, indices, hierarchical "globals" like ANSI-M and transactions.
NoSQL databases have many different data models, e.g. object, document, graph, and key/value DBs. In a lot of cases you should probably just use something on top of SQLite, but you should say more about your requirements.
I'd add Crux to this... it's document-oriented, but still generates attribute-level indices, and is also Datalog based. It can be used on top of RocksDB, LMDB, or many other (more scalable) backends (Kafka being the canonical one).
It's really awesome, and the team behind it are super responsive and helpful.
It's small yet capable. If you are familiar with MongoDB, you will feel right at home.
It's great for .NET developers as it's written in C#, but since it's .NET Standard 1.3 compatible, you can presumably run it under Ubuntu or macOS or wherever else the new .NET 5 runtime works. I got a C# app running on ARM64 the other day - just saying.
I like sled that is a nice embedded key value store written in Rust: https://sled.rs/
However, it is still in heavy development and a bit of a moving target even if the developers are currently heading toward stabilization of the file format.