The point is that Ceph & friends have a lot of overhead. Example in Ceph: by default, a file at the S3 (RGW) layer is split into 4MB chunks, and each of those chunks is replicated or erasure-coded. Using the same erasure coding as Wasabi and Backblaze B2, which is 16+4=20 (or 17+3=20), each 4MB chunk is split into 20 shards of roughly 250KB each (16 or 17 data shards plus parity shards of the same size), and each of those shards ends up carrying ~512B to 4KB of metadata.
So that's 10KB to 80KB of metadata for a single 4MB chunk.
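A rough back-of-the-envelope sketch of that arithmetic in Python (the 16+4 layout and the 512B-4KB per-shard metadata range are assumptions taken from the numbers above, not values read off a live cluster):

    # Back-of-the-envelope sketch: metadata overhead for one 4MB chunk
    # erasure-coded 16+4, with an assumed 512B-4KB of metadata per shard.
    CHUNK = 4 * 1024 * 1024        # 4MB RGW-style chunk
    K, M = 16, 4                   # 16 data shards + 4 parity shards
    SHARDS = K + M                 # 20 shards per chunk

    shard_size = CHUNK / K         # ~256KB per shard (parity shards are the same size)
    meta_low = SHARDS * 512        # ~10KB if each shard carries ~512B of metadata
    meta_high = SHARDS * 4096      # ~80KB if each shard carries ~4KB of metadata

    print(f"shard size: {shard_size / 1024:.0f} KB")
    print(f"metadata per 4MB chunk: {meta_low / 1024:.0f}-{meta_high / 1024:.0f} KB "
          f"({100 * meta_low / CHUNK:.2f}%-{100 * meta_high / CHUNK:.2f}%)")

That works out to 10-80KB of metadata per chunk, i.e. roughly 0.24%-1.95% overhead on top of the data itself.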
In the video they mention storing everything in regular files on the filesystem. A regular filesystem has inode overhead as well: XFS uses 512-byte inodes by default (more if you format it with larger inodes, as you would for Ceph's FileStore backend).
For a lot of workloads, Ceph's default erasure coding scheme (and BlueStore) would still be a lot more efficient than mirroring a file on top of a regular filesystem.
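To make that concrete, here's a minimal raw-space comparison (the replica counts are illustrative assumptions; it ignores per-shard metadata and filesystem inodes, both of which are small next to the replication factor):

    # Raw-space multiplier: erasure coding vs. plain replication/mirroring.
    # Replica counts are illustrative assumptions, not pool defaults.
    LOGICAL = 4 * 1024 * 1024      # one 4MB chunk of user data

    schemes = {
        "EC 16+4": 20 / 16,        # 1.25x raw space
        "EC 17+3": 20 / 17,        # ~1.18x raw space
        "2x mirroring": 2.0,
        "3x replication": 3.0,
    }

    for name, factor in schemes.items():
        raw = LOGICAL * factor
        print(f"{name:>15}: {raw / (1024 * 1024):.2f} MB raw per 4 MB of data ({factor:.2f}x)")

Even the worst case of the metadata range above (~80KB per 4MB chunk, ~2%) is small compared to the extra 100%-200% of raw space that mirroring or triple replication costs.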
> For a lot of workloads, Ceph's default erasure coding scheme (and BlueStore) would still be a lot more efficient than mirroring a file on top of a regular filesystem.
Yes, that's correct; it's why BlueStore was created in the first place.