JSZip: Create, read and edit .zip files with JavaScript (stuk.github.io)
113 points by saltcookie on April 20, 2014 | hide | past | favorite | 54 comments


We use jszip for parsing xlsx/xlsm/xlsb files in the browser (Excel 2007+ files are zip files that contain XML or binary files in specific locations): https://github.com/SheetJS/js-xlsx

JSZip works well for small files, but unzipping XLSB files larger than 50MB seems to cause out-of-memory issues in Firefox.
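As an aside, you can confirm a file really is a ZIP container (as .xlsx/.xlsm/.xlsb files are) by checking the first four bytes for the ZIP local file header signature, "PK\x03\x04":

```javascript
// Check whether a byte array starts with the ZIP local file header
// signature ("PK\x03\x04") -- the first four bytes of every .zip/.xlsx file.
function looksLikeZip(bytes) {
  return bytes.length >= 4 &&
         bytes[0] === 0x50 &&  // 'P'
         bytes[1] === 0x4b &&  // 'K'
         bytes[2] === 0x03 &&
         bytes[3] === 0x04;
}
```

This is a cheap sanity check before handing the buffer to a zip library.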


This project is great, because it's something I never envisioned for the library!

You may want to take a look at the master branch, because we've recently updated our inflate/deflate implementation to use the much faster pako: https://github.com/nodeca/pako


Have you considered zip.js instead? It claims to work with files up to 4GB. It also can zip large files without any issues for me in Firefox.


When we first evaluated options, zip.js only supported IE9+ (and IE6-8 were still relevant browsers) while JSZip supports IE6+. In fact, the Base64 text box was introduced in http://oss.sheetjs.com/js-xlsx/ specifically to test IE6-9 (which do not support the HTML5 File API)


I see. Its support for older browsers inevitably makes it slower, however.


I tried JSZip not too long ago for my side project and it was quite slow. I've been using http://gildas-lormeau.github.io/zip.js/ and I'm very happy with it. I hope to release my web app some time this week.


zip.js seems to be the better of the two. It uses typed arrays, so it's performant and I can zip large (140MB) files quickly and without hanging the browser.


Actually, one of the best use cases for this: when a user wants to upload multiple images or files, we can use the HTML5 File API with JSZip to generate one zip file, upload just that one file, and extract it on the server. It will be much faster. And you can run this in a Web Worker so it doesn't block the UI thread.


Potential use case: instead of using CSS sprite maps (putting all of your images into one image to reduce the number of http requests generated by your page, then using css magic to select regions inside of that image), image files could be zipped into an image package that is delivered to the client, who unzips it and uses the images inside. This would cut down on the number of requests made, but allow the images to be used as normal images instead of as images within a sprite sheet.


The crazy thing: in this thread, you are all right. The state of sending multiple files sucks. ZIP is the closest thing we have to a universal standard, if only because Windows refuses to ship with support for anything that is meaningful.


I seem to recall them getting a big fat lawsuit when they tried to bundle things with their OS. Perhaps they don't bundle things anymore for a reason?


Well, is there anything better for Windows, aside from maybe cabinet files?


I'm surprised we don't have a cross-browser webarchive concept yet... especially with the emphasis on mobile.


We have a fantastic one based on MIME that is usable by everything except Firefox (however you need to enable some experimental stuff in Chrome).

http://en.wikipedia.org/wiki/MHTML


Until now I always thought the M stood for Microsoft. Now I know.


I wasn't aware of this, very cool


There is definitely work towards this. This spec was recently released and has some potential: https://github.com/w3ctag/packaging-on-the-web


Some image formats already come with compression. Also, zip compression doesn't gain much on binary data that's already compressed. Using CSS to "subscript" into the sprite is a better solution.


It's not about saving bandwidth or space by compressing the images - it's about removing the step of combining your images into one big image then using coordinates inside of that image to find your original images.

Sure, you can build the spriting into your build process, but I don't know how to do that (I'm sure there are tools) and you can't exactly use normal css since the coordinates of your images may change within the sprite sheet.

Using the zip method saves you some of that headache/overhead with the tradeoff of having to package your images into a zip and unzip them on the client side.


> Using the zip method saves you some of that headache/overhead with the tradeoff of having to package your images into a zip and unzip them on the client side.

I wrote an NPM module recently using this exact technique, https://github.com/deathcap/artpacks (for use in the browser with browserify). Had a bunch of small textures, originally I was requesting each individually in their own HTTP request, but zipping them up and extracting in the browser led to a fairly significant improvement in latency.

Packing all of the small textures into one big image and slicing it up at runtime was also an option I considered. In fact, since I'm using these textures with WebGL they actually are packed into one large texture atlas before uploading to the GPU, then indexed at runtime using UV coordinates. So one could stitch together all the textures beforehand, distributing a prebuilt texture atlas as a single file over HTTP — this may slightly improve performance, but it has another disadvantage.

Flexibility. For my purposes, it makes more sense to build up the atlas dynamically at runtime, since you might not know exactly what textures are needed and when (due to the nature of my application). Also, I wanted to support "cascading" textures, where multiple packs are loaded each providing possibly a subset of all textures, and the first pack with a matching texture takes priority. With unzipping at runtime in the browser, this technique was very easy to implement (without the latency cost of individual texture file requests).

And for compatibility with img src, and other HTML file references, I just convert the unzipped file to an HTML5 Blob then reference its 'blob:' URI. 'data:' URIs would also work, but blobs are widely supported by modern browsers (unlike 'filesystem:' URLs, for the Web Filesystem API supported by Chrome) and don't need to encode the full file contents in the URL. The complete process, including unzipping, matching, and blobbing, happens fairly fast; I haven't run into any noticeable performance issues.

(Note: technically I'm not using Stuart Knightley's JSZip, but Kris Kowal's zip module https://www.npmjs.org/package/zip - I can't recall why, as they are both available on NPM, but it's the same idea.)
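The Blob step described above can be sketched roughly like this (the helper name is mine, not from artpacks):

```javascript
// Sketch: wrap unzipped bytes in a Blob and return a blob: URL that
// <img src>, CSS url(), etc. can reference directly.
function bytesToObjectUrl(bytes, mimeType) {
  var blob = new Blob([bytes], { type: mimeType });
  // In a browser this looks like "blob:https://example.com/<uuid>".
  return URL.createObjectURL(blob);
}
```

Remember to call URL.revokeObjectURL() once the resource is no longer needed, or the browser keeps the backing data alive.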


You can embed the images in the CSS as data: URIs, preferably automatically as part of the build process, and get the best of both worlds.


Except that the primary utility of zip is usually not its compression. It's the combining of multiple files into one "file". Basically what tar was invented for, but with actual universal support on every OS.


Seems like multipart HTTP responses may be pretty close to what we want


a multipart http response saves what kind of file?


So is the point to simplify the process or something else?

CSS Sprites are very easy to make and use. You just need to set the right coordinates for the background-image.

On the other hand, running a complicated JS library and building an infrastructure to extract these images seems like overkill for something that CSS sprites can do so easily.
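For reference, the sprite technique described above looks like this (illustrative file and class names):

```css
/* Illustrative: one 300x100 sheet containing three 100x100 icons,
   selected by background-position offsets. */
.icon        { width: 100px; height: 100px; background-image: url(sprites.png); }
.icon-save   { background-position:      0 0; }
.icon-open   { background-position: -100px 0; }
.icon-delete { background-position: -200px 0; }
```

The offsets are the part that breaks when the sheet is regenerated, which is the maintenance cost the zip approach avoids.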


CSS Sprites have numerous disadvantages that normal image files lack.


I'm surprised that no one has brought up that this is also solved at the protocol level with SPDY and eventually HTTP 2 (and by HTTP pipelining, which is sadly never going to happen). Not a silver bullet, but a bit more sane than unzipping a set of images in JS.


We've used https://github.com/EvanOxfeld/node-unzip/issues in production and we've had a surprising number of corrupt zip files created by it. We've switched to using a barely wrapped zip command line tool.

Is JSZip better?


It's really fast at reading zip files created by Google takeout. I use it at http://theyhaveyour.info as it works perfectly with the FileReader API. For reading only though - haven't tried writing with it.


Holy carp on a stick, that was so easy. You've made my day today.

BTW, here is how I make files download with the filename that I want: https://github.com/capnmidnight/JWD/blob/master/html5/deskto...

The call to the "a" function on line 8 is just generating an HTML anchor tag. You should be able to figure it out from there.
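The technique in that linked file boils down to something like this (a generic sketch using the HTML5 download attribute, not the actual JWD code):

```javascript
// Sketch: trigger a client-side download of a Blob under a chosen filename
// by clicking a temporary anchor with the HTML5 "download" attribute set.
function downloadBlob(blob, filename) {
  var a = document.createElement("a");
  a.href = URL.createObjectURL(blob);
  a.download = filename;          // browsers save the file under this name
  document.body.appendChild(a);   // some browsers require the anchor in the DOM
  a.click();
  document.body.removeChild(a);
  URL.revokeObjectURL(a.href);    // release the blob's backing data
}
```

Pair this with a zip library's blob output and you get "generate and save a zip" entirely client-side.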


Why not unrar and untar using javascript as well?

https://github.com/varunmayya/bitjs


If you're purely interested in client-side compression in the browser (i.e. compatibility with server libraries/filesystem tools is not important) it's possible to get better compression in less time using http://pieroxy.net/blog/pages/lz-string/index.html


This is exactly a project I was about to embark upon to generate ePub files from client-side JS. Thanks!


Interesting - can I ask what you're working on? I'd be a bit worried about potential memory issues with image-heavy ePubs.


A minimalist writing tool to train beginner novelists on completing writing projects: https://www.justwritedammit.com/#main/about

I'm specifically targeting novelists (at least, right now), so pictures and complex layouts shouldn't be an issue. I want it to be something you can easily pick up, plod away through, maybe pay someone to do some editing for you, and then walk you through pushing it out on the Amazon Kindle Store.


Does this handle very large files on Node.js? npm for one fails with large files, by "large" I mean something like a few hundred megabytes. It's very problematic especially on Windows.


Am I misunderstanding this, or could this be used to send a client a large amount of data in a zip file? Let's say the client is using WebView...


>JavaScript today is capable of generating a lot of data. The easiest way to deliver multiple files to your users is in a zip file. Instead of wasting server resources and bandwidth you can get the client to do it for you.

... Am I not understanding what they're saying here or do the authors really not understand how the internet works?

It looks to me like they're saying "don't bother letting your users download zip files. Save your bandwidth! Just get them to send themselves a zip file, client-side!"


I'm guessing they mean some kind of situation where the resources you're zipping up have already been downloaded to the user's browser (like a "download as Zip" link for a set of images that are being displayed on the page).

In reality, I don't think there is a good use-case for this as any files that need to be zipped are likely too large for a user's browser to process with JS (it could bog down or crash the browser).

I have a feeling they already realize this though as the example zip file is 237 bytes. If an HTTP header is larger than your zipped file, it probably doesn't need to be zipped/gzipped in the first place.


Well, if you have a JS app, then it's likely the content the user wants to generate is already in their browser. So you could generate a zip locally and "download" it locally, without any network traffic at all.


It would be great for my current project. The project is entirely client-side right now, there is 0 server-side component (though certain features in the near future will require a server-side aspect). For what my project is doing, and for the various interpretations of what ZIP can be, having client-side ZIP generation is great.

Just think of all of the container formats that use ZIP with different file extensions - JAR and ePub, to name two off the top of my head; I am sure there are plenty more.

The importance is not in compression, it is in container formats.


Hi, I'm the author of JSZip. This sentence is kind of awkward. What I meant is that a lot of data is generated in the browser, and traditionally (at least at the time I wrote this library almost 5 years ago) this would involve sending the data back to your server to zip it up and send it back to the user for download. Instead this can all be done client side.

As an example, someone told me about a web app[1] which allows you to create an animated sprite offline. They use JSZip to let you download all the frames in one file.

There's actually an outstanding PR[2] to update the documentation that I still need to review, hopefully that makes things clearer.

[1] http://www.piskelapp.com/ [2] https://github.com/Stuk/jszip/pull/114


Consider something like Bootstrap, which allows you to customize your download before sending you a zip file.

The assets are already loaded (since you're viewing the bootstrap demo page), so instead of making a request to another server to generate some sort of compressed file for you, that labor is offloaded to the client.


In fact the bootstrap customizer does indeed use JSZip to create the zipped download! See http://getbootstrap.com/assets/js/customize.min.js


Except you're unnecessarily bogging down the user's browser to a far greater extent (base64 encoding/decoding everything on a possibly underpowered CPU/IO) than the amount of work it would take to do it on the server (pure binary processing on a high end CPU).

I'm guessing Bootstrap can do it because they know most Bootstrap users are developers with decent PCs but for a more mainstream audience, it would be problematic.


Have you used Mega.com? It does in-browser (JavaScript) encryption of uploads and downloads. And I can still hit 5 MB/s throughput. JavaScript can handle Zip compression easily.

For most users, bandwidth is in shorter supply than CPU. Especially on mobile (your constrained CPU/IO case), where people are using 3G or worse, which is often even billed by the MB.


Except that most clients are ridiculously overpowered compared to a heavily loaded server. Even the slowest 1.3GHz Core 2 Duo is better than a high-end Xeon if there are 10 users at one time.


Once again, a Hacker News commenter's idea of what in-browser JavaScript performance is like is about 15 years out of date.

It's 2014 now. Your iPhone is 10 times faster than your 1998 Pentium II, even with the JS VM penalty.

Unless you're zipping 70MB files, there's no chance you're overwhelming anyone's browser.

The use case for this is the same you'd have for any desktop application: a handy "binary file format" library. Data portability wins.


It seems that the data to be zipped is also on the client side (e.g. a color scheme designer tool that wants to bundle some css files and a background image). If the alternative is sending the uncompressed data back to the server and having the server bundle them up into a zip, this approach saves bandwidth.


Atwood's law in action


Is this really news? Almost every server-side language out there will have a zip library, and I doubt any of those got mentioned on HN.


Server-side languages, yes. There are lots of those, so each one doesn't impact many. But there's only one client-side language on the web, so it impacts everyone.


So we are reporting it just because it is JavaScript.


Yes. JavaScript is the lingua franca of the Web, so it may be important to have a particular functionality implemented in JavaScript.

This way, if you, say, want to take a part of the program and use it elsewhere, you have the advantage of the same language. If you want to ask a question about the program, you can count on other people knowing the language. If you want to illustrate something using a program, you have a common ground here.



