That's too bad. The subvolume [0] features were an interesting paradigm. Kind of...

rleigh · on Aug 2, 2017

It's certainly interesting, but if you look at the ZFS design they were inspired by, they got a lot wrong. Some points to consider:

With ZFS, you have a hierarchy of datasets. These inherit properties from their parents, and while the mountpoints can also mimic this hierarchy, the mountpoint property can be set independently. Btrfs couples the two concepts, forcing subvolumes to be in a specific place in the actual filesystem; ZFS datasets, in comparison, are purely metadata and exist for organisation and administration, not for direct use in the filesystem hierarchy.
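To make the inheritance point concrete, here is a minimal sketch of how a property lookup can walk up a dataset hierarchy while the mountpoint stays independently overridable. The `Dataset` class, its fields, and the pool and dataset names are hypothetical and only model the behaviour described above; this is not how ZFS stores its metadata.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Dataset:
    """Toy model of a dataset node (hypothetical, not the real ZFS structure)."""
    name: str
    parent: Optional["Dataset"] = None
    local_props: Dict[str, str] = field(default_factory=dict)

    def get(self, prop: str) -> Optional[str]:
        # Walk up the hierarchy until some ancestor sets the property locally.
        node: Optional["Dataset"] = self
        while node is not None:
            if prop in node.local_props:
                return node.local_props[prop]
            node = node.parent
        return None

    def mountpoint(self) -> Optional[str]:
        # A locally set mountpoint wins; otherwise derive it from the parent's
        # mountpoint plus the last component of this dataset's name.
        if "mountpoint" in self.local_props:
            return self.local_props["mountpoint"]
        if self.parent is None:
            return None
        base = self.parent.mountpoint()
        if base is None:
            return None
        return base.rstrip("/") + "/" + self.name.rsplit("/", 1)[-1]


# Assumed example hierarchy: tank -> tank/home -> tank/home/alice
pool = Dataset("tank", local_props={"mountpoint": "/tank", "compression": "lz4"})
home = Dataset("tank/home", parent=pool, local_props={"mountpoint": "/home"})
alice = Dataset("tank/home/alice", parent=home)

print(alice.get("compression"))  # inherited from the pool
print(alice.mountpoint())        # derived from the overridden parent mountpoint
```

In this sketch the dataset hierarchy (`tank/home/alice`) and the place the data appears in the filesystem (`/home/alice`) are decoupled, which is the distinction being drawn against Btrfs subvolumes.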

ZFS snapshots are read-only, and clones of these snapshots are datasets in the hierarchy. Btrfs snapshots are read-write by default, which in some ways defeats the point of a point-in-time snapshot. You can also make changes to a ZFS clone and later promote it to replace the original dataset. Likewise rollbacks. Btrfs makes no provision for doing either; you have to delete the original and then rename the snapshot, which isn't atomic. ZFS' metadata preserves all relations between datasets, snapshots and clones.

The ZFS way of doing things makes things safe and accessible for system administration. There's no way to confuse the origin of a snapshot because it's tied to a parent dataset. Likewise clones of snapshots, unless you deliberately choose to break the link. The Btrfs way looks superficially nicer, but in practice is much less flexible, and potentially more dangerous since you don't have the ability to audit what came from where and when. Btrfs snapshot performance is also abysmal. ZFS handles snapshots simply by recording the transaction ID, which makes them really lightweight (and it also provides "bookmarks" which are even lighter weight). ZFS keeps the referenced blocks in deadlists, and its performance is excellent (compare how fast snapshot deletion is between the two). ZFS also allows delegating permissions to perform snapshot, clone, rollback etc. to normal users; I'm unaware of Btrfs allowing such delegation--some operations can be performed like snapshotting, but not deletion, while ZFS permits this all to be configured transparently.

tylerjd · on Aug 2, 2017

Are you trying to say that BTRFS is supposed to compete feature-to-feature with ZFS? It's not. https://lwn.net/Articles/342892/

>I had a unique opportunity to take a detailed look at the features missing from Linux, and felt that Btrfs was the best way to solve them.

>From other points of view, they are wildly different: file system architecture, development model, maturity, license, and host operating system, among other things

-------------------------------

>Btrfs snapshots are read-write by default, which in some ways defeats the point of a point-in-time snapshot.

Yes, and they have the option of being read-only for your temporal "in place" snapshots. But if I want to clone a container for instant use (as LXC or Docker does), then the RW snapshots make sense. Btrfs doesn't make a distinction between a Clone and Snapshot, they are one and the same with a flag.

> but in practice is much less flexible

Tell me more how I can mix disks of differing size in RAID on ZFS

> There's no way to confuse the origin of a snapshot because it's tied to a parent dataset

There's no confusing the origin of my snapshots. `btrfs subvolume list -q` shows the ancestral parent as well as the subvolume it's located in, for example:

  ID 6442 gen 50527 top level 751 parent_uuid 0f4442f8-6363-6944-be8d-e2b45d809352 path .snapshots/321/snapshot
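For auditing, a line in that format can be split into fields with a small parser. This is a minimal sketch that assumes every line has exactly the layout of the example above; the function name and the returned dictionary keys are hypothetical, and treating a `-` parent_uuid as "no parent" is an assumption about the tool's output rather than something shown in this thread.

```python
import re
from typing import Dict, Optional

# Matches lines shaped like the example above (layout assumed from that one line).
LINE_RE = re.compile(
    r"ID (?P<id>\d+) gen (?P<gen>\d+) top level (?P<top_level>\d+) "
    r"parent_uuid (?P<parent_uuid>\S+) path (?P<path>.+)"
)


def parse_subvolume_line(line: str) -> Optional[Dict[str, object]]:
    """Return the fields of one listing line, or None if it does not match."""
    m = LINE_RE.fullmatch(line.strip())
    if m is None:
        return None
    parent = m.group("parent_uuid")
    return {
        "id": int(m.group("id")),
        "gen": int(m.group("gen")),
        "top_level": int(m.group("top_level")),
        # Assumption: "-" means the subvolume was not snapshotted from another.
        "parent_uuid": None if parent == "-" else parent,
        "path": m.group("path"),
    }


sample = (
    "  ID 6442 gen 50527 top level 751 "
    "parent_uuid 0f4442f8-6363-6944-be8d-e2b45d809352 path .snapshots/321/snapshot"
)
print(parse_subvolume_line(sample))
```

Grouping the parsed entries by `parent_uuid` would then show which snapshots share an origin.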

> some operations can be performed like snapshotting, but not deletion

See user_subvol_rm_allowed mount option, available since Kernel 3.0

It's like comparing a car and a truck: they both have four wheels, transport passengers and cargo, and have an engine. Just because a truck runs on diesel does not make a car running on gas "wrong". Due to its fundamentally different implementation, the way the filesystem works is also different.

Yes, ZFS has many more features, has been in development longer, and is probably more "production ready" than BTRFS. But ZFS is not GPL compatible. And BTRFS doesn't require its own separate cache that is apart from the normal filesystem cache.

cryptonector · on Aug 2, 2017

Yes, that's what rleigh is saying. It's what I'm saying.

ZFS sets a very very high bar indeed. There are things that could be done better (I've talked about some of those on HN). But pound for pound, it's the best storage stack today and has been for over a decade. ZFS is the benchmark against which all others are to be stacked. There will be applications for which you will find a more performant solution, maybe, but altogether, ZFS has been the last word in filesystems for a long time now.

The most interesting competition, IMO, is from HAMMER. We'll see how that progresses.

rleigh · on Aug 3, 2017

> Are you trying to say that BTRFS is supposed to compete feature-to-feature with ZFS?

Not entirely. Btrfs was designed with the benefit of hindsight, so one would expect the features they did choose to implement to be superior in both design and implementation. Sadly, neither is the case, apart from a few minor exceptions.

> Btrfs doesn't make a distinction between a Clone and Snapshot, they are one and the same with a flag.

Yep, and this is one design choice which on the face of it is straightforward and convenient, but has the side effect of being very inefficient. Because ZFS snapshots are owned by the dataset, AFAIK there's little refcounting overhead; you're just moving blocks to deadlists based on simple transaction ID number comparisons. If you modify a block and its transaction ID is greater than the latest snapshot, you can dispose of it, otherwise you add it to the snapshot deadlist (and also add the new updated block). If you delete a snapshot, you do the same thing: for each block, if the block transaction ID is later than the transaction ID of the previous snapshot, you dispose of it, else you move it to the previous snapshot's deadlist. No refcounting changes except to decrement for disposal. You only start paying the overhead when you create a clone. This makes ZFS snapshots very cheap, and clones a bit more expensive. Btrfs is always expensive as far as I understand.

Your particular uses might not take advantage of this, but it's something to bear in mind.
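The deadlist bookkeeping described above can be sketched as a toy model. This follows the description in this comment only, not the actual ZFS on-disk format; the class and method names are hypothetical, blocks are represented solely by their birth transaction ID, and clones are left out entirely.

```python
from typing import Dict, List, Optional


class ToyDataset:
    """Toy model: snapshots are transaction IDs, blocks are birth transaction IDs."""

    def __init__(self) -> None:
        self.txg = 0
        self.snapshots: List[int] = []            # snapshot txgs, oldest first
        self.deadlists: Dict[int, List[int]] = {}  # snapshot txg -> dead block births
        self.disposed: List[int] = []              # blocks actually freed, in order

    def write_block(self) -> int:
        # Every write gets a fresh transaction ID as its birth time.
        self.txg += 1
        return self.txg

    def snapshot(self) -> int:
        # Taking a snapshot only records a transaction ID and an empty deadlist.
        self.txg += 1
        self.snapshots.append(self.txg)
        self.deadlists[self.txg] = []
        return self.txg

    def free_block(self, birth: int) -> None:
        # Born after the latest snapshot: no snapshot can reference it.
        if not self.snapshots or birth > self.snapshots[-1]:
            self.disposed.append(birth)
        else:
            self.deadlists[self.snapshots[-1]].append(birth)

    def delete_snapshot(self, snap: int) -> None:
        idx = self.snapshots.index(snap)
        prev: Optional[int] = self.snapshots[idx - 1] if idx > 0 else None
        for birth in self.deadlists.pop(snap):
            if prev is None or birth > prev:
                self.disposed.append(birth)        # nothing older references it
            else:
                self.deadlists[prev].append(birth)  # previous snapshot still does
        del self.snapshots[idx]


ds = ToyDataset()
a = ds.write_block()
s1 = ds.snapshot()
b = ds.write_block()
s2 = ds.snapshot()
c = ds.write_block()
for block in (c, a, b):
    ds.free_block(block)
ds.delete_snapshot(s2)
ds.delete_snapshot(s1)
print(ds.disposed)
```

Every step is a comparison of transaction IDs plus a list move; no per-block reference count is touched, which is the cost argument being made here.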

> Tell me more how I can mix disks of differing size in RAID on ZFS

You can have pools with vdevs of different sizes (I have one right here). It doesn't make sense to have different sizes within a vdev.

Cobbling together different-sized discs appears to be mainly something for tinkering and testing. No one is going to care about this for production systems. It's a neat feature which few people care about in practice. I'd rather they had spent the time on making the basic featureset reliable.

> > some operations can be performed like snapshotting, but not deletion

> See user_subvol_rm_allowed mount option, available since Kernel 3.0

Nice to see some option for this. It's better than nothing, but it's not really equivalent. ZFS has a fine-grained permissions delegation system which is inherited through dataset relationships, rather than coarse capabilities.

> And BTRFS doesn't require its own separate cache that is apart from the normal filesystem cache.

Not a particular concern for me; it's well integrated on FreeBSD, and it's not a problem in practice on Linux nowadays IME. Do you have a specific problem with the ARC?