
The point GP is making is that compressed tarballs are "solid" archives: the whole tar stream is compressed as a single unit, so compression can span file boundaries. Putting files with lots of redundancy between them next to one another will therefore yield much better compression ratios than an essentially random ordering.
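A minimal sketch of the ordering effect, using Python's standard `tarfile` module with gzip. The file names, sizes and contents are made up for illustration: two "similar" files share 20,000 bytes, plus two unrelated 40,000-byte files. gzip's deflate can only refer back 32 KiB, so when 80,000 bytes of unrelated data sit between the similar files, the second one cannot reuse the first.

```python
import io
import random
import tarfile


def tgz_size(files: list[tuple[str, bytes]]) -> int:
    """Return the size of an in-memory .tar.gz holding `files` in the given order."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())


rng = random.Random(0)
shared = rng.randbytes(20_000)    # content common to the two "similar" files
similar_1 = shared + b"version 1"
similar_2 = shared + b"version 2"
noise_1 = rng.randbytes(40_000)   # unrelated, incompressible files
noise_2 = rng.randbytes(40_000)

# Similar files adjacent: the second copy is within gzip's 32 KiB window.
grouped = [("s1", similar_1), ("s2", similar_2), ("n1", noise_1), ("n2", noise_2)]
# Similar files 80,000+ bytes apart: too far back for deflate to reference.
spread = [("s1", similar_1), ("n1", noise_1), ("n2", noise_2), ("s2", similar_2)]

print("grouped:", tgz_size(grouped), "spread:", tgz_size(spread))
```

The grouped archive should come out smaller by roughly the size of the shared content, because the second similar file is encoded almost entirely as back-references. Compressors with a larger window (xz, for instance) narrow the gap in this toy case, but the same effect returns once related files are farther apart than their window.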

That is somewhat unintuitive to people used to zip archives, because zip contents are compressed independently, so the ordering really doesn't matter (at least to the compression ratio).
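A rough comparison, assuming 50 identical 2,000-byte files (a made-up stand-in for a folder of near-duplicate icons). `zipfile` with `ZIP_DEFLATED` compresses each member on its own, so every copy is stored again in full; the gzip stream in a `.tar.gz` sees all copies in one stream and encodes the repeats as back-references.

```python
import io
import random
import tarfile
import zipfile

rng = random.Random(1)
payload = rng.randbytes(2_000)    # one hypothetical "icon": incompressible bytes
files = [(f"icon{i:02d}.bin", payload) for i in range(50)]   # 50 identical copies

# zip: each member is deflated independently of the others.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for name, data in files:
        zf.writestr(name, data)

# tar.gz: the whole tar stream goes through a single gzip compressor.
tgz_buf = io.BytesIO()
with tarfile.open(fileobj=tgz_buf, mode="w:gz") as tar:
    for name, data in files:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

print("zip:", len(zip_buf.getvalue()), "tar.gz:", len(tgz_buf.getvalue()))
```

Reordering the members would not change the zip's size at all, which is the point of the comment above.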



> because zip contents are compressed independently

I "discovered" this when compressing a collection of icon and WMF files in my youth (long before GIF and SVG were ubiquitous; in fact, before SVG was even invented).

Because the files were small and each had little internal redundancy, the resulting archive was not much smaller than the input files. It did take far less space on disk, since each file no longer occupied at least 512 bytes (one allocation unit on a small FAT-formatted partition), and that was enough for my needs at the time. It would still have been inconvenient to transfer over a 14k4 modem link, though, and the result seemed wrong, which piqued my curiosity enough that I hunted down some information on Usenet and worked out what was going on. Recompressing the zip resulted in massive savings: the headers in the files were very similar, identical in many cases, so the inner .zip acted like the .tar format in this discussion.
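A sketch of the "recompress the zip" trick, under made-up assumptions: 100 small files that each start with the same 400-byte header (random bytes here, standing in for a common icon header) followed by 100 unique bytes. Random data does not compress, and zlib generally falls back to storing such data verbatim, so the shared header survives inside every zip member, where a second compression pass over the whole archive can find it.

```python
import io
import random
import zipfile
import zlib

rng = random.Random(2)
common_header = rng.randbytes(400)   # hypothetical header shared by every file
files = [(f"icon{i:03d}.ico", common_header + rng.randbytes(100)) for i in range(100)]

# First pass: a normal zip, each member deflated on its own.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for name, data in files:
        zf.writestr(name, data)
zip_bytes = buf.getvalue()

# Second pass: compress the finished zip as one stream, like a .tar.gz would.
recompressed = zlib.compress(zip_bytes, 9)

raw_total = sum(len(data) for _, data in files)
print("raw:", raw_total, "zip:", len(zip_bytes), "zip recompressed:", len(recompressed))
```

The zip alone ends up slightly larger than the raw files, while the recompressed zip should be a fraction of either, since every repeated header becomes a back-reference in the second pass.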

> so the ordering really doesn't matter

Unless you compress again, either into another compressed file or through compression in the transport layer, in which case there may be significant extra savings if the files are in an optimal order.


Aah ok, that makes sense, sorry for the confusion. I didn't know zip worked that way, although in hindsight that explains why it's so simple and fast to extract and/or modify only a part of a .zip archive, even on old computers.


> although in hindsight that explains why it's so simple and fast to extract and/or modify only a part of a .zip archive, even on old computers.

That is also because zips have a "central directory", so to modify or add one of the files you just need to add a new record and then update the central directory. If you don't care too much about size, you can literally just append the new record and a new central directory to the existing zip file, leaving both the old record (for updates) and the old central directory in place.
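The sketch below, assuming a plain (non-ZIP64) archive with ASCII names, lists a zip's members by parsing only the end-of-central-directory record and the central directory, using the fixed record layouts from PKWARE's APPNOTE; no member data is decompressed. It then "updates" a file by appending a second record with the same name. One difference from the comment: Python's `zipfile` in append mode writes over the old central directory instead of leaving it behind, so only the stale record survives. The helper `list_members` is hypothetical, not part of any library.

```python
import io
import struct
import warnings
import zipfile

EOCD_SIG = b"PK\x05\x06"          # end-of-central-directory record
CDH_SIG = b"PK\x01\x02"           # central directory file header
EOCD_FMT = "<4sHHHHIIH"           # 22-byte fixed part of the EOCD
CDH_FMT = "<4sHHHHHHIIIHHHHHII"   # 46-byte fixed part of each CD header


def list_members(data: bytes) -> list[tuple[str, int]]:
    """Return (name, local_header_offset) for each central directory entry.

    Assumes no ZIP64 records and ASCII/UTF-8 file names.
    """
    # The EOCD is the last record, optionally followed by a comment of at most 65535 bytes.
    eocd = data.rfind(EOCD_SIG, max(0, len(data) - 22 - 65535))
    if eocd < 0:
        raise ValueError("no end-of-central-directory record found")
    _, _, _, _, total, _, cd_offset, _ = struct.unpack_from(EOCD_FMT, data, eocd)

    members, pos = [], cd_offset
    for _ in range(total):
        fields = struct.unpack_from(CDH_FMT, data, pos)
        if fields[0] != CDH_SIG:
            raise ValueError("corrupt central directory")
        name_len, extra_len, comment_len = fields[10], fields[11], fields[12]
        local_offset = fields[16]
        name = data[pos + 46 : pos + 46 + name_len].decode("utf-8")
        members.append((name, local_offset))
        pos += 46 + name_len + extra_len + comment_len
    return members


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.txt", b"old contents")
    zf.writestr("b.txt", b"unchanged")

# "Update" a.txt by appending a second record under the same name;
# zipfile warns about the duplicate, and the old record stays in the file.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    with zipfile.ZipFile(buf, "a", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("a.txt", b"new contents")

data = buf.getvalue()
print(list_members(data))
with zipfile.ZipFile(io.BytesIO(data)) as zf:
    print(zf.read("a.txt"))   # when names collide, zipfile returns the later entry
```

Because a reader starts from the end of the file, finding a member costs one seek to the central directory and one to the member's local header, regardless of how large the rest of the archive is.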



