Remember that photo-enhancing scene in Blade Runner? Are you feeling motivated yet? For me the decompression speed and small file size were nice things when I first encountered the format back in ~1992, but JPEGs were already good in that area and it was obvious that the price of storage space and bandwidth would fall (though I continue to be surprised by the progress of technology). So to critics who say it's not competitive on size or speed - quite right, no argument from me.
However, nothing I have seen in the intervening 22 years comes close to fractal image compression for image enhancement. It's hard to find examples online, but trust me, it was this good on everything we threw at it.
There's something really fishy about the "balloon" example in the paper. Allegedly it's a 512x512 image, but if you look at it it's plainly 64x64. (If I view the article using Firefox's pdf.js, then it's upscaled in some kinda-smooth way; if I open it in a dedicated PDF viewer then the big pixels are clearly visible.) And then the fractally-compressed-and-decompressed images have miraculously created detail that wasn't there before, and have smoothed out the blocky edges of the balloon (for instance) into nice high-resolution smooth curves.
If it really did that, then it would be badly wrong, because for all the compressor knows it's destroying lots of useful information by rounding off all the corners. And it would be miraculous, because what it actually produced turns out to be physically plausible, when there are any number of ways of filling in the missing detail that fit the data in the 64x64 image equally well, as far as anything short of full AI could tell.
But I don't really believe it actually did that, because (1) from Barnsley's description of the algorithm it seems fantastically unlikely that it could, (2) the paper says explicitly that the original and reconstructed images are both 512x512 pixels, and (3) if fractal compression were really doing something so miraculous then Barnsley's article would have made a lot more noise about it (and in particular the discussion of "fractal zoom" that follows wouldn't make any sense if Barnsley thought he had just shown a spectacular 8x fractal zoom already).
I'm guessing that something very bad happened to Figure 2, and that the image that was actually compressed and decompressed to produce Figure 3 looked a lot more like Figure 3 than it did like Figure 2 as shown in the paper.
I, too, was at Iterated Systems in its early days.
See the correction after the comment below - good reason to be confused: Fig 2 is actually the first step in decoding.
You are missing the significance of the 64x64. The point is that the picture was split into squares bigger than single pixels before encoding. In this case the regions were 8x8, so there are 64x64 of them in the original 512x512 image. If the basic elements were 1x1 pixels, the linear-transformation encoding would have made the encoding bigger than the original! The images illustrate some of the steps in the iterative decoding process, which always starts with single-intensity blocks in the basic elements (so a 64x64 grid of flat blocks) and iteratively gets closer to the original, finer-grained image. It looks like magic, but the fractal theory is sound. It is a lossy process, so there is no guarantee that the encoded image is both small and close to the original.
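Roughly, the iterative decoding described above looks like the sketch below - a toy PIFS-style decoder under my own assumptions about the code layout (one stored domain position plus contrast/brightness per 8x8 range block), not the actual Iterated Systems format:

    import numpy as np

    # Toy fractal (PIFS) decoder: each 8x8 range block of a 512x512 image is
    # assumed to be encoded as (domain_y, domain_x, contrast, brightness).
    # Iteration 0 is a flat image - the "64x64 blocks" look - and each pass
    # applies the stored affine maps, pulling the image toward the fixed point.
    R, D, SIZE = 8, 16, 512   # range block, domain block, image size

    def decode(code, n_iter=8):
        img = np.full((SIZE, SIZE), 128.0)
        for _ in range(n_iter):
            new = img.copy()
            for (ry, rx), (dy, dx, s, o) in code.items():
                dom = img[dy:dy+D, dx:dx+D]
                shrunk = dom.reshape(R, 2, R, 2).mean(axis=(1, 3))  # 16x16 -> 8x8
                new[ry:ry+R, rx:rx+R] = s * shrunk + o              # affine map
            img = new
        return img

With a code covering every range block this converges in a handful of iterations, and it shows why the first iterate is exactly the flat-block image being argued about here.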
I was an algorithm developer, not in sales. For sales purposes they clearly could have chosen example images that were more naturally fractal at a scale below a pixel, so blowing them up looked good. I do not know how generally good the zooming in was.
The image is captioned as "Original 512 x 512 grayscale image, with 256 gray levels for each pixel, before fractal compression.".
So, if what you're saying is true, the caption is wrong? Are you saying it is the first stage of decompression? That this is the input data to the decompressor?
I'm assuming therefore that the compressor actually does use a full, un-downsampled (not blocky) image as its input, or uses it as part of its iterative compression process. Is that correct?
Good point. The caption for Figure 2 is wrong. The picture is of the first step in decompression of a 512 x 512 image. A pity you do not actually see the original that they refer to.
I think that actually the caption is "correct", and they must have simply included the wrong image, because the text says "Figure 2 shows the original digital image of Balloon, which is of dimensions 512 pixels by 512 pixels, with 256 gray levels at each pixel."
I'm in two minds about this. That was my own first interpretation, for the same reasons as you. And you could well be right; in fact, I'd say there's a 70% probability.
But the nature of iterated function systems is that they generate information, which is how we can get such elaborate pictures of ferns from such a small amount of starting information. When we squint at a pixelated image, we're deliberately trading off optical distortion to get a sense of the underlying pattern - it's not more accurate, because we're inventing stuff, but it does answer the question 'what function would generate these shapes in a low-resolution picture?', and then goes on to iterate that same function.
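To make that concrete: the classic Barnsley fern is nothing more than four affine maps and their selection probabilities, and iterating them (the "chaos game") produces arbitrarily fine detail from that tiny description. A minimal sketch using the standard published coefficients:

    import random

    # Barnsley fern: (x, y) -> (a*x + b*y + e, c*x + d*y + f), with one of
    # four maps picked at random with probability p at each step.
    MAPS = [  # (a, b, c, d, e, f, p)
        ( 0.00,  0.00,  0.00, 0.16, 0.0, 0.00, 0.01),  # stem
        ( 0.85,  0.04, -0.04, 0.85, 0.0, 1.60, 0.85),  # ever-smaller leaflets
        ( 0.20, -0.26,  0.23, 0.22, 0.0, 1.60, 0.07),  # large left leaflet
        (-0.15,  0.28,  0.26, 0.24, 0.0, 0.44, 0.07),  # large right leaflet
    ]

    def fern(n=100_000):
        x, y, pts = 0.0, 0.0, []
        weights = [m[6] for m in MAPS]
        for _ in range(n):
            a, b, c, d, e, f, _ = random.choices(MAPS, weights=weights)[0]
            x, y = a*x + b*y + e, c*x + d*y + f
            pts.append((x, y))
        return pts

Plot the points and you get the fern at whatever resolution you like, from four maps' worth of data.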
So while acknowledging the likelihood that you're right and I'm wrong, I'm not totally ready to come off this limb because I'm also thinking of what it was that made the resolution scaling of this technology so good - there was something more complex than mere interpolation going on. I'll have a look through the text again and think about it further.
I used the software when it first became available in the early 90s, but with a focus on resolution independence rather than reverse entropy (which is why it's more likely than not that I'm wrong).
It's certainly not a scam; Barnsley is a respected researcher in this field of mathematics: http://en.wikipedia.org/wiki/Barnsley_fern At worst this is a rendering problem in the PDF and some nostalgic over-enthusiasm on my part.
I'm very shortsighted (about -8). When I squint, my eyelashes act as a rudimentary lens and make the image clearer. The image is distorted differently when I squint, but less blurred. I suggest that those with better eyesight may also squint in order to see details on a retina display, and the squinter sees a higher res image (albeit a distorted one).
The use of fractals for image enhancement (e.g. http://www.ononesoftware.com/products/resize9/ ) seems to be orthogonal to the choice of storage or wire format. If a browser wants to retinafy a low-res image, sure, go ahead and fractalize it. Getting back to the BPG discussion, it's not clear that it's a good idea to compress images using fractal techniques.
I worked for Dr. Barnsley at Iterated Fractal Systems. It was acquired by Interwoven, which was acquired by Autonomy, and finally by HP, which led to a huge financial scandal.
There's a photoshop plugin, Genuine Fractals, that uses fractal image compression for image resizing.
Saw this mentioned in the BPG thread (https://news.ycombinator.com/item?id=8704629) and it looked too cool not to investigate. I'd love to see some images of the type that you'd see on modern websites (say, Medium's giant page-spanning banners) compressed with this algorithm and scaled up by steps to see what artifacts result as the resolution increases.
Yeah. Some patents have expired, although Barnsley's most recent one is from 2004; it appears to be a more abstract technology for the exploration of higher-order fractal manifolds (possible terminology fail there, but the pdf link below lays it out clearly):
I wondered what had happened to it. In the early 90s it seemed like it would be the future of image compression, and here we are in 2014 still using DCTs.
The real stumbling block is the difficulty in generating anything embeddable, and the need to iterate on the decoder side. You can get pretty good results for general images, but not "better enough" to justify this cost over, say, a wavelet threshold plus entropy coding. You can also get artifacts in the shape of the domain support, and getting rid of these makes the encoder quite expensive (less of a problem).
A couple of years back I saw a documentary on fractals ("Fractals: The Colors of Infinity", narrated by Arthur C. Clarke, with music by Dave Gilmour of Pink Floyd - that was one groovy documentary!) where they showed this mathematician who apparently had a research grant from some military-intelligence-ish government agency to develop fractal image compression for their spy satellites. I wondered why I never saw that technology in action, and I figured it had something to do with the amount of secrecy surrounding spy satellites.
It is kind of sad that the reason is so much more mundane.
I saw this back in '90, showing an aquarium, zooming into a goldfish, zooming into its eyeball, and then zooming into the Earth. I always wondered what happened to the tech. The company was IFS, IIRC.
The damning bits are this: "A fundamental weakness of fractal block coders is that the coders possess no control over the codebook. Codewords are too densely clustered around the very common all-zero subtree and too sparsely distributed elsewhere. This dense clustering of near-zerotrees increases codeword cost but contributes very little to image fidelity. ... At 0.25 b/pixel, over 80% of all coefficients in the 512 × 512 Lena image are assigned to zerotrees by our zerotree-augmented wavelet coder. Hence, only about 20% of the fractal coder’s codewords are significantly different from a zerotree. This redundancy is costly, since when using self-quantization we pay a substantial number of bits to differentiate between these essentially identical zero code words."
I haven't read that paper in ages, but Davis's points as I remember them aren't really fundamental. For example, you can easily do IFS codecs in the wavelet domain and capture detail that zerotrees miss, while still taking advantage of the energy concentration in that domain. This shows up in the focus on "fractal block coders", but the blocks here are not necessary.
Not that I'm arguing for IFS codecs, mind, but Davis did miss the point a bit.
I'm not sure I'm following your argument. The point is that many regions of an image simply do not have detail. This is made obvious by sparseness in a wavelet representation, but the representation is not key to the argument. A concrete example might help. If I have an image with some large, flat sky region, then for every piece of sky, an IFS needs to say which other piece of sky it resembles, even though they are all largely the same. That costs bits which do nothing to improve quality.
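A back-of-envelope version of that (my own numbers, not Davis's): a featureless 128x128 patch of sky still costs one codeword per 8x8 range block.

    # Rough cost of coding a featureless region with a fractal block coder.
    # bits_per_codeword is a hypothetical figure for domain index plus
    # contrast and brightness, just to show the order of magnitude.
    region = 128 * 128
    range_block = 8 * 8
    codewords = region // range_block          # 256 near-identical codewords
    bits_per_codeword = 20                     # hypothetical
    print(codewords * bits_per_codeword)       # 5120 bits spent on a region
                                               # that a zerotree coder covers
                                               # with a handful of symbols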
Davis is not the only one to make the observation (he cites two others in that paper), but he does quantify the magnitude of the problem. It's clear it requires some solution before IFS codecs have a chance of being competitive.
This is complicated by the fact that in an IFS, it's not clear which regions are smooth without doing several iterations, but the decoder needs to do all of its codebook pruning before even the first iteration. I'm not aware of a practical solution ever being demonstrated.
The point is that the concentration of energy in lower frequencies (i.e. the reason you see sparseness in the wavelet domain) is separable from the issue of using IFS. After all, using IFS means you encode an operator, not the signal.
One way to demonstrate this (which was shown around the time of Davis's paper, if I recall correctly) is to do the IFS codec in the wavelet domain hierarchically (on quadtrees, in the case of an orthonormal wavelet basis). This gives you a similar advantage: your encoder bits (which should still be handled by an entropy coder, regardless of method) are spent primarily where the image energy is, but it does not truncate information as hard as, say, a zerotree approach.
The problems Davis points out with IFS codecs done on blocks in the spatial domain are issues with block/patch operations and with operating in the spatial domain, not with IFS per se.
The fundamental idea behind IFS codecs is that you encode a transform whose fixed point approximates your signal; you don't encode the signal directly.
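A toy way to see that in 1D (a made-up contractive operator of my own, not anything from a real codec): the "code" is the operator itself, and decoding is just iterating it from an arbitrary starting signal until it converges.

    import numpy as np

    # The "code" is a contractive operator T: shrink the signal by 2, then
    # place a scaled-and-offset copy in each half. By the contraction mapping
    # theorem T has a unique fixed point - the decoded signal - and we reach
    # it by iterating T from *any* starting signal.
    def T(x, s=0.6, offsets=(0.0, 1.0)):
        half = (x[0::2] + x[1::2]) / 2         # spatial shrink by 2
        out = np.empty_like(x)
        n = len(x) // 2
        out[:n] = s * half + offsets[0]
        out[n:] = s * half + offsets[1]
        return out

    x = np.zeros(64)                           # start anywhere
    for _ in range(30):
        x = T(x)                               # converges to the fixed point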
The real problem is the difficulty in embedding them, and the need to iterate a few times to converge the result, which is what you allude to, I think.
[Note: I really should revisit Davis's paper before discussing this further, because I'm operating from memory here...]
I built a system around Iterated Systems Poem Colorbox SDK in the mid-90's. I was always impressed by what they were achieving but the horsepower wasn't there at the time to make the compression process very practical if you were processing large numbers of images. I still have the SDK sitting on my shelf, may have to pull that out sometime.
I remember dreaming up this idea years ago (but significantly after 1992), probably around 2000 sometime. We never tried to implement it, but we tossed around the idea of encoding images as chunks of fractal portions. There were some pretty serious deficiencies in our idea which these authors must have addressed. Neat to see that it can actually work.
Also from the original thread: http://www.verrando.com/pulcini/gp-ifs1.html - a Windows encoder/decoder (IFS-BMP) and some other resources, straight outta 1997.