Hacker News

Makes me wonder how tar+zstd would perform for the StackOverflow user. They were using xz, which is hardly ancient junk.


>xz

It may not be ancient, but it's certainly not the greatest compression format out there. I found this article to be very informative:

https://www.nongnu.org/lzip/xz_inadequate.html

(The author works on lzip, so the article is probably somewhat biased, but facts are facts.)


The article calls xz out for its inadequacy as a (long-term) container format, which has nothing to do with its compression method (shared by both xz and lzip). I guess in this thread we are specifically talking about the efficiency of compression algorithms...


> I guess in this thread we are specifically talking about the efficiency of compression algorithms

Yep, and on top of that xz is more efficient at compression than zstd despite being "older", which is the relevant consideration here. I'm not expecting zstd to compete with xz for the StackOverflow case in terms of efficiency, what I want to know is whether file ordering makes the same efficiency difference for zstd as it does for xz.


> what I want to know is whether file ordering makes the same efficiency difference for zstd as it does for xz

It should, because much like other LZ77-based formats, zstd encodes a distance symbol plus additional uncompressed bits for longer distances, and long distances should be frequent with an unorganized file order. These formats assume that the lower bits of long distances are essentially random, so when that assumption doesn't hold, those bits hurt efficiency. Zstd also recommends a minimum window size of 8 MB, which is an improvement but still not very large.
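To put a rough number on that cost (a sketch based on my reading of the zstd format spec, ignoring the three repeat-offset codes): the offset code is about floor(log2(offset)), and that many raw "extra" bits follow the entropy-coded offset code, so a match several megabytes back spends ~20 uncompressed bits on the distance alone:

```python
# Simplified model of zstd's offset coding (normal offsets only; the
# three repeat-offset codes are ignored for clarity). Per the zstd
# format spec: Offset_Value = offset + 3, Offset_Code =
# floor(log2(Offset_Value)), and Offset_Code raw extra bits follow
# the entropy-coded Offset_Code itself.

def offset_extra_bits(offset: int) -> int:
    """Raw, uncompressed bits spent encoding one match distance."""
    offset_value = offset + 3
    return offset_value.bit_length() - 1  # floor(log2(offset_value))

for d in (64, 4 * 1024, 256 * 1024, 8 * 1024 * 1024):
    print(f"distance {d:>10}: ~{offset_extra_bits(d)} raw bits")
```

Since those extra bits bypass entropy coding entirely, frequent long-distance matches inflate the output even when the matches themselves are found.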

I've done a quick experiment with a directory hanging around my temporary folder that weighs about 14 MB (the Android CellBroadcastReceiver source code, if you're asking). Compressing with a random file order produced an archive about 1.4% larger than compressing with files ordered by extension. So the effect definitely still exists, though I haven't characterized it beyond that.
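For anyone wanting to reproduce the effect without a source tree handy, here's a self-contained sketch using Python's stdlib lzma (same LZMA family as xz). I'm using synthetic data and a deliberately tiny 64 KiB dictionary so the window effect shows up at small sizes; real xz/zstd windows are far larger, which is why the real-world difference is only a few percent:

```python
import lzma
import random

# Two "kinds" of file content; files of the same kind are identical,
# so a copy compresses to almost nothing *if* a previous copy is
# still inside the compressor's window.
FILE_SIZE = 48 * 1024
kind_a = random.Random(1).randbytes(FILE_SIZE)  # incompressible on its own
kind_b = random.Random(2).randbytes(FILE_SIZE)

grouped = kind_a * 4 + kind_b * 4     # like sorting files by type
interleaved = (kind_a + kind_b) * 4   # like a random file order

# Tiny 64 KiB dictionary: adjacent same-kind copies fit in the
# window, interleaved copies (96 KiB apart) do not.
filters = [{"id": lzma.FILTER_LZMA2, "dict_size": 1 << 16}]

size_grouped = len(lzma.compress(grouped, filters=filters))
size_interleaved = len(lzma.compress(interleaved, filters=filters))
print(f"grouped:     {size_grouped} bytes")
print(f"interleaved: {size_interleaved} bytes")
```

With the window this small the grouped order compresses to roughly the two unique blocks, while the interleaved order stays close to the full uncompressed size.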


Most of the points that article makes are utterly irrelevant for most uses of xz, including this one.


xz is more than ten years old now :-).


In terms of standard Linux tools, that's hardly ancient.


Hell, it's barely gone from "bleeding edge" to mere "cutting edge".


Most tools do not become obsolete as quickly.

Unlinking a path is hardly something that sees cutting-edge improvements; cryptographic hashing, filesystems, and compression are.



