As others have mentioned you have lots of options: LMDB, LevelDB/RocksDB, BerkeleyDB. For what it's worth, I spent a long time looking for an embedded key-value store for my current native project since I didn't need the full complexity of SQL. In the end I chose... SQLite.
All of these embedded NoSQL databases seem to be missing critical features. One such feature for my use case is database compaction. Last I checked, an LMDB database file can never shrink. Full compaction of LevelDB is slow and complicated (as I understand it, it essentially defeats the leveled structure that is the whole point of the design). SQLite, meanwhile, supports fast incremental vacuum, which can be triggered manually or automatically.
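For reference, a minimal sketch of SQLite's incremental vacuum using Python's stdlib sqlite3 module (the path and table are placeholders):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "app.db")  # stand-in for your database file
db = sqlite3.connect(path, isolation_level=None)   # autocommit mode, so VACUUM can run

# auto_vacuum must be set before the first table is created,
# or be followed by a full VACUUM to rebuild the file in the new mode.
db.execute("PRAGMA auto_vacuum = INCREMENTAL")
db.execute("VACUUM")

db.execute("CREATE TABLE kv (k BLOB PRIMARY KEY, v BLOB)")
db.execute("INSERT INTO kv VALUES (X'01', zeroblob(500000))")
before = os.path.getsize(path)

# delete data, then reclaim the freed pages and truncate the file
db.execute("DELETE FROM kv")
db.execute("PRAGMA incremental_vacuum").fetchall()
db.close()
after = os.path.getsize(path)
```

With an argument, `PRAGMA incremental_vacuum(N)` reclaims at most N pages, so you can spread the work out instead of pausing for a full compaction.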
SQLite just has everything. Plus the reliability is unmatched. Even if you just need a single table that maps blob keys to blob values, I would still recommend SQLite over any NoSQL database today.
Linux has VFS cache which is very robust and efficient.
Remember that it is typical for programs to read /etc/nsswitch.conf, /etc/resolv.conf and tons of other files at startup time - in the Unix tradition the filesystem is the data source, so the machinery is very well optimized.
The problem with this is that if your records/documents are small, you're wasting huge amounts of space because each file uses a full filesystem block. If you have, say, ten thousand records where each is 200 bytes, a decent database would store that in a bit over 2MB. Storing these as individual files on a filesystem with 4kB blocks will take up at least 40MB. This is a huge amount of wasted space, not to mention slow. (Some filesystems do support tail packing but that won't fully solve the problem.)
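The back-of-the-envelope arithmetic above, spelled out:

```python
records = 10_000
record_size = 200      # bytes per record
block_size = 4096      # common filesystem block size

packed = records * record_size   # densely packed, as a database would store it
on_fs = records * block_size     # one full block minimum per file

print(packed)  # 2 MB of actual data
print(on_fs)   # 40 MB on disk, a 20x blowup
```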
Not to mention all the other problems with this. The filesystem has a complete lack of higher-level features: no transactions, no snapshots, no indexing beyond filenames, no easy robustness guarantees (doing fsync() properly is a lot more complicated than it appears.) Honestly for modern apps the filesystem is just terrible at storing any internal mutable app data.
Once you start writing code to store auxiliary indices, synchronize writes, or pack multiple records per file, well at that point you're just implementing your own database. This might make sense if, say, you have a special way of compressing your data (like git). But generally you're better off using a real embedded database.
For simple stuff I've had success keeping an in-memory data structure as the source of truth, and just persisting the whole thing to a file (JSON or otherwise) via a debounced function. Assuming you only have one process (or at least one main process), you only have to read the file on startup and can be pretty relaxed about your write strategy.
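A minimal sketch of that pattern, assuming a single process; the class name and debounce window are made up, and the write goes through a temp file plus rename so a crash mid-write never leaves a torn file:

```python
import json
import os
import tempfile
import threading

class JsonStore:
    """In-memory dict as the source of truth, flushed to disk after a quiet period."""

    def __init__(self, path, delay=1.0):
        self.path = path
        self.delay = delay          # debounce window in seconds
        self._timer = None
        self._lock = threading.Lock()
        self.data = {}
        if os.path.exists(path):    # read the file once, on startup
            with open(path) as f:
                self.data = json.load(f)

    def set(self, key, value):
        with self._lock:
            self.data[key] = value
            # restart the debounce timer: only the last write in a burst flushes
            if self._timer:
                self._timer.cancel()
            self._timer = threading.Timer(self.delay, self.flush)
            self._timer.start()

    def flush(self):
        with self._lock:
            if self._timer:
                self._timer.cancel()
                self._timer = None
            # write to a temp file, then atomically rename over the old one
            fd, tmp = tempfile.mkstemp(dir=os.path.dirname(self.path) or ".")
            with os.fdopen(fd, "w") as f:
                json.dump(self.data, f)
            os.replace(tmp, self.path)
```

Calling flush() explicitly on shutdown covers the case where the process exits inside the debounce window.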
You'll only be wasting space proportional to the number of objects, and the overhead is much smaller on ext4 (the default fs on most Linux distros), as the sibling comment explains. Most databases are small, and even the larger ones rarely consist of so many tiny records that per-file overhead dominates, so in most cases you won't be wasting huge amounts of space.
Overhead seems to be about 4KiB, at least using default ext4 parameters:
# zfs create -V 100G -b 4096 tank/test && mkfs.ext4 -v /dev/zvol/tank/test && mount /dev/zvol/tank/test /mnt && cd /mnt
# df -k .
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/zd16 102626232 24 97366944 1% /mnt
# for x in `seq 1000000`; do echo $x >$x; done # create 1M tiny files
# df -k .
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/zd16 102626232 4022348 93344620 5% /mnt
# bc -l
(97366944-93344620)*1024/1000000 # free space diff per file
4118.859776
Plus you're wasting an inode per record - a limited resource in ext4 that can only be increased by reformatting. You'd probably run out of inodes long before you run out of space.
This has a large overhead - easily a few kilobytes per file, depending on the filesystem. Poor locality too - your data gets scattered across the entire disk; even on SSD this matters to some extent. It also starts to break down in some ways already at O(100k) files, e.g. globs on the command line stop working. You can work around this by splitting the files across directories, but at that point it's just easier to use a normal database.
Isn't the main feature of nosql supposed to be easy horizontal scalability, the exact opposite of storing everything in a single file?
If you just need a r/w store for some JSON documents in a single file, why not SQLite? You can put arbitrary-length blobs into it. Some SQL will be involved, but you can hide it in a wrapper class tailored to your application in a few dozen lines of code or so.
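A wrapper of roughly that size might look like the following sketch (class and table names are made up), using Python's stdlib sqlite3 and json modules:

```python
import json
import sqlite3

class DocStore:
    """Tiny JSON-document store on top of a single-table SQLite file."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS docs (key TEXT PRIMARY KEY, value BLOB)")

    def put(self, key, obj):
        # serialize the document to a JSON blob; upsert on the key
        self.db.execute(
            "INSERT OR REPLACE INTO docs (key, value) VALUES (?, ?)",
            (key, json.dumps(obj).encode()))
        self.db.commit()

    def get(self, key):
        row = self.db.execute(
            "SELECT value FROM docs WHERE key = ?", (key,)).fetchone()
        return json.loads(row[0]) if row else None

    def delete(self, key):
        self.db.execute("DELETE FROM docs WHERE key = ?", (key,))
        self.db.commit()
```

The application code only ever sees put/get/delete; all the SQL stays inside the class.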
> Isn't the main feature of nosql supposed to be easy horizontal scalability
There's no strict definition of nosql so everyone can choose their own. My personal take (in broad terms) follows:
No, that's not a feature of nosql. Nosql means not relational, which in turn means no guarantees about the relationship between two objects, i.e. no atomicity of access across multiple objects (for either read or write operations). A consequence of this lack of atomicity is that it's easy to store different objects in different places, thus opening up opportunities for horizontal scalability. Caveat: those opportunities can be taken away by other choices you make. If you decide to offer and enforce transactions, you are bringing atomicity back into the system, and thus making horizontal scalability hard again. Or you may decide you want a nosql-in-a-file.
Practically, NoSQL also seems to just mean "Not-SQL" as stuff like Redis is often lumped in it, which is about the opposite of easy horizontal scalability.
I use it in many of my apps, e.g. https://github.com/rochus-keller/CrossLine. It's lean and fast, and supports objects, indices, hierarchical "globals" like ANSI-M and transactions.
NoSQL databases have many different data models, e.g. object, document, graph, and key/value DBs. In a lot of cases you should probably just use something on top of SQLite, but you should say more about your requirements.
I'd add Crux to this... it's document-oriented, but still generates attribute-level indices, and is also Datalog based. It can be used on top of RocksDB, LMDB, or many other (more scalable) backends (Kafka being the canonical one).
It's really awesome, and the team behind it are super responsive and helpful.
It's small yet capable. If you are familiar with MongoDB, you will feel right at home.
It's great for .NET developers as it's written in C#, but since it's .NET Standard 1.3 compatible, you can presumably run it under Ubuntu or macOS or wherever else the new .NET 5 runtime works. I got a C# app running on ARM64 the other day - just saying.
I like sled that is a nice embedded key value store written in Rust: https://sled.rs/
However, it is still in heavy development and a bit of a moving target even if the developers are currently heading toward stabilization of the file format.