Tuesday, May 15, 2001
Farhad Sadaghiani (g9sada@cdf.utoronto.ca)
Image compression can be loosely defined as the process of minimizing the size in bytes of an image file without degrading the quality of the image to an “unacceptable” level. The reduction in file size allows for more images to be stored in a given amount of disk or memory space. It also reduces the time required for images to be sent over the Internet or downloaded from World Wide Web pages.
There are several different ways in which image files can be compressed. For Internet use, JPEG is the most common compressed graphic image format. JPEG is "lossy," meaning that the compressed image is not quite the same as the original image. In fact, JPEG is designed to exploit known limitations of the human eye, notably the fact that small colour changes are perceived less accurately than small changes in brightness [5].
Recently, Michael Barnsley introduced an intriguing new type of image compression based on fractal techniques. Barnsley suggested that, by using repeated iterations of affine transformations of the plane, one could reproduce a fractal-like image by storing it as a collection of transformations rather than a collection of pixels. For instance, the Barnsley fern is complicated and intricate, yet it can be generated from only four affine transformations. Since representing each transformation requires only six numbers, the fern can be stored as 4 × 6 numbers and then generated (by iterating through the transformations) whenever needed, rather than stored in memory or on disk as a collection of pixels. While this technique is sound and practical, not all images are self-similar or fractal-like. The crux of fractal image compression is finding a means of obtaining an iterated function system (IFS) for an arbitrary image; this is known as the inverse problem, and in its general form it remains unsolved.
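To make this concrete, here is a minimal sketch (not part of the original text) that regenerates the fern from its 4 × 6 coefficients using the well-known random-iteration "chaos game"; the coefficient values are Barnsley's standard published ones, and the function name fern_points is my own:

```python
import random

# Each transformation is (a, b, c, d, e, f), applied as
# (x, y) -> (a*x + b*y + e, c*x + d*y + f). The probabilities weight how
# often each map is chosen, roughly in proportion to the area it covers.
FERN_MAPS = [
    (( 0.00,  0.00,  0.00, 0.16, 0.0, 0.00), 0.01),  # stem
    (( 0.85,  0.04, -0.04, 0.85, 0.0, 1.60), 0.85),  # main body
    (( 0.20, -0.26,  0.23, 0.22, 0.0, 1.60), 0.07),  # left leaflet
    ((-0.15,  0.28,  0.26, 0.24, 0.0, 0.44), 0.07),  # right leaflet
]
COEFFS  = [m for m, _ in FERN_MAPS]
WEIGHTS = [p for _, p in FERN_MAPS]

def fern_points(n=50000):
    """Generate n points of the fern by iterating randomly chosen maps."""
    x, y = 0.0, 0.0
    points = []
    for _ in range(n):
        a, b, c, d, e, f = random.choices(COEFFS, weights=WEIGHTS)[0]
        x, y = a * x + b * y + e, c * x + d * y + f
        points.append((x, y))
    return points
```

Plotting the accumulated points reproduces the familiar fern, even though the four maps amount to only twenty-four stored numbers.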
This paper discusses the theory behind fractal image compression, with specific attention to partitioned iterated function systems, the root mean square metric, and techniques for partitioning an image into self-similar subsets.
An image of a baby’s face (such as the one included with the accompanying programme) does not contain the self-similarity that Barnsley described; that is, the image does not seem to contain affine transformations of itself. This is because the fractals that one can easily generate with an IFS are all of a particular type: images that can be seen as collages of deformed and intensity-transformed copies of themselves. Thus, in an IFS encoding of the baby’s face, one should see tiny distorted copies of the face everywhere. This is not only unnatural but also technically infeasible.
One of Barnsley’s graduate students, Arnaud Jacquin, realised a fractal encoding system that left behind the rigid thinking in terms of global IFS mappings [2]. Jacquin’s new approach was based on finding local self-similarities: an image should not be thought of as a collage of copies of the entire image, but of copies of smaller parts of it. A part of a cloud, for instance, certainly does not look like an entire landscape with clouds, but it is not so unlikely that another section of some cloud, or some other structure in the image, looks like the given cloud section. Finding such self-similarity is accomplished by partitioning the original image at different scales. Since images usually take the form of a rectangular array of pixels, partitioning the original image into blocks is a natural choice. Using Jacquin’s terminology, the large partitions are called domain blocks and the small partitions are called range blocks [3]. Both domain and range blocks are drawn from the same image. Range blocks evenly partition the image so that every pixel is included; the larger domain blocks may overlap and need not contain every pixel. The goal of the compression process is to find a closely matching domain block for every range block, and the set of domain blocks considered in this search is called the domain pool.
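As a rough illustration of this block partitioning, the following sketch enumerates range and domain blocks for a greyscale image stored as a NumPy array. It assumes square blocks and an image whose sides are divisible by the range size r; the names range_corners and domain_pool are my own:

```python
import numpy as np

def range_corners(img, r):
    """Top-left corners of the non-overlapping r x r range blocks
    (every pixel of the image belongs to exactly one range block)."""
    h, w = img.shape
    return [(y, x) for y in range(0, h, r) for x in range(0, w, r)]

def domain_pool(img, r, step=None):
    """Top-left corners of the 2r x 2r domain blocks. Domains may overlap;
    a smaller step gives a richer pool at the cost of a slower search."""
    h, w = img.shape
    d = 2 * r
    if step is None:
        step = d
    return [(y, x) for y in range(0, h - d + 1, step)
                   for x in range(0, w - d + 1, step)]
```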
Jacquin’s new view of encoding an image is known as a partitioned iterated function system (PIFS). With this new approach came many new questions: How should the image be partitioned? Where should one search for matching image portions? How should the intensity transformation be designed? And, most importantly, how can all of this be done efficiently and quickly?
In order to discuss image compression, we must represent an image as a mathematical function, z = f(x, y). The graph of this function is generated by taking an image and plotting the grey level of each pixel at position (x, y) as a height (with white being high and black low). Thus, when referring to an image, we refer to a function f(x, y) that gives the grey level at each point (x, y). Also, we assume that we are dealing with square images of size 1; that is, (x, y) ∈ {(u, v) | 0 ≤ u, v ≤ 1} ≡ I² and f(x, y) ∈ I ≡ [0, 1] [3]. When evaluating the encoding of an image, two closely related quantities must be observed. The first is the compression ratio. It is defined as the ratio of the file size of the original image representation to that of the encoded version (the gallery of before and after images lists some of the compression ratios obtained using the quadtree encoder). The other quantity measures quality (or accuracy). The root mean square (RMS) metric is defined as

    drms(f, g) = ( ∫∫_{I²} [f(x, y) − g(x, y)]² dx dy )^(1/2)

and is most favourable for this purpose since it is convenient to implement in applications [3].
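In the discrete setting of a pixel grid, the integral becomes a mean over pixels. A minimal sketch, assuming both images are equally sized arrays of grey values:

```python
import numpy as np

def d_rms(f, g):
    """Discrete version of the RMS metric: the integral over the unit
    square becomes a mean over the pixel grid."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    return float(np.sqrt(np.mean((f - g) ** 2)))
```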
We now extend the traditional form of affine transformations of the plane to include an extra dimension, z, that represents the grey levels of an image:

    wi(x, y, z) = (ai·x + bi·y + ei,  ci·x + di·y + fi,  si·z + oi)
Here, si controls the contrast and oi controls the brightness of the transformation.
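The grey-level part of this map acts on a block’s pixel values alone. A small sketch (the clipping to [0, 1] is my addition, to keep the output a valid grey level):

```python
import numpy as np

def apply_grey_map(block, s, o):
    """Grey-level part of the extended affine map: z -> s*z + o.
    s scales the contrast, o shifts the brightness; clipping keeps the
    result inside the valid grey range I = [0, 1]."""
    return np.clip(s * np.asarray(block, dtype=float) + o, 0.0, 1.0)
```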
The process of encoding images requires us to find a collection of contractive maps w1, w2, …, wn with W = w1 ∪ w2 ∪ … ∪ wn and f as the fixed point (or attractor) of the map W. The fixed-point equation f = W(f) suggests that we partition f into pieces to which we apply the transforms wi to get back the original image f [3]. Of course, finding an exact set of transforms is implausible since arbitrary images are not self-similar; instead, we find another image g with drms(f, g) less than or equal to some specified tolerance value. That is, we find domains Di and maps wi so that when we apply a map wi to the part of the image over Di we get something that is very close to the part of the image over the corresponding range Ri. The crux, then, of an encoding algorithm is finding contractive maps wi that minimise the RMS error drms(f, g).
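The search this implies can be sketched as follows: for every candidate domain, the optimal si and oi have a closed-form least-squares solution, and the domain with the smallest residual RMS error wins. This is an illustrative exhaustive search only; a full encoder would also try the eight square isometries (rotations and flips), which are omitted here for brevity:

```python
import numpy as np

def shrink(dom):
    """Average non-overlapping 2 x 2 pixel groups so a 2r x 2r domain
    can be compared with an r x r range."""
    return (dom[0::2, 0::2] + dom[1::2, 0::2] +
            dom[0::2, 1::2] + dom[1::2, 1::2]) / 4.0

def fit_s_o(d, r):
    """Least-squares contrast s and brightness o minimising the RMS
    error between s*d + o and r (closed-form linear regression)."""
    d, r = d.ravel(), r.ravel()
    n = d.size
    denom = n * (d @ d) - d.sum() ** 2
    s = 0.0 if denom == 0 else (n * (d @ r) - d.sum() * r.sum()) / denom
    s = float(np.clip(s, -1.0, 1.0))   # keep |s| bounded for contractivity
    o = (r.sum() - s * d.sum()) / n
    err = float(np.sqrt(np.mean((s * d + o - r) ** 2)))
    return s, o, err

def best_cover(img, rng_corner, r, pool):
    """Exhaustively search the domain pool for the best cover of one range."""
    ry, rx = rng_corner
    rng = img[ry:ry + r, rx:rx + r]
    best = None
    for dy, dx in pool:
        dom = shrink(img[dy:dy + 2 * r, dx:dx + 2 * r])
        s, o, err = fit_s_o(dom, rng)
        if best is None or err < best[0]:
            best = (err, dy, dx, s, o)
    return best
```

A real encoder prunes this exhaustive comparison by classifying blocks first; that is the search optimisation referred to later in this paper.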
The decoding process is much simpler and (starting with an initial image f0, usually a uniform grey or white image) can be achieved by iterating through the collection of maps. On the first iteration, f1 = W(f0); on the second iteration, f2 = W(f1) = W(W(f0)); and so on. This process can be repeated until the attractor resembles the original image (in practice, when using the quadtree encoder on greyscale images, eight to ten iterations usually provide an acceptable result).
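A minimal decoder along these lines, assuming the encoder emitted one (range corner, domain corner, si, oi) record per range, that the ranges tile the whole image, and again omitting orientation handling:

```python
import numpy as np

def decode(transforms, size, r, n_iter=10):
    """Iterate W from a uniform grey start image f0. Each transform is a
    (range_y, range_x, dom_y, dom_x, s, o) tuple produced by the encoder;
    the ranges are assumed to tile the image exactly."""
    f = np.full((size, size), 0.5)              # f0: uniform grey
    for _ in range(n_iter):                     # f1 = W(f0), f2 = W(f1), ...
        g = np.empty_like(f)
        for ry, rx, dy, dx, s, o in transforms:
            dom = f[dy:dy + 2 * r, dx:dx + 2 * r]
            small = (dom[0::2, 0::2] + dom[1::2, 0::2] +
                     dom[0::2, 1::2] + dom[1::2, 1::2]) / 4.0
            g[ry:ry + r, rx:rx + r] = np.clip(s * small + o, 0.0, 1.0)
        f = g
    return f
```

Provided the maps are contractive, the iteration converges to the same attractor from any starting image, which is why a flat grey f0 suffices.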
The heart of a fractal compression algorithm lies in its partitioning scheme. Once an image is adequately partitioned, we can build the transformations that encode an approximation to the original image.
A quadtree is a representation of an image as a tree in which each node, corresponding to a square portion of the image, has four sub-nodes corresponding to the four quadrants of that square. The root of the tree is the entire image.
Given a specified RMS tolerance value, the quadtree partitioning algorithm works by setting the range R1 = I² (the entire image) and marking it uncovered. Then, while there are ranges Ri that remain uncovered, the domain Di (out of the pool of possible domains D) and the corresponding transformation wi that best cover Ri are found. If the specified tolerance is met, or the size of Ri is less than or equal to the minimum allowable range size, then Ri is marked as covered and the transformation wi is written out. Otherwise, Ri is partitioned into four smaller ranges that are marked as uncovered, and Ri is removed from the list of uncovered ranges. The collection of all such maps represents the quadtree encoding of the image. Recall that each transformation wi consists of multiplication by a scaling factor si and the addition of an offset value oi, as described above [3].
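In recursive form, the algorithm might look like the following sketch, where best_cover_fn stands for a domain-pool search such as the one outlined earlier (adapted to variable block sizes):

```python
def encode_quadtree(img, y, x, size, tol, min_size, best_cover_fn, out):
    """Recursive quadtree covering. best_cover_fn searches the domain pool
    and returns (err, transform) for the square range at (y, x)."""
    err, w = best_cover_fn(img, y, x, size)
    if err <= tol or size <= min_size:
        out.append(w)                        # range covered: write out its map
    else:
        half = size // 2                     # otherwise quarter the range
        for dy in (0, half):
            for dx in (0, half):
                encode_quadtree(img, y + dy, x + dx, half,
                                tol, min_size, best_cover_fn, out)
```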
When decoding an image, the quadtree partition is used to determine all the ranges in the image. For each range Ri, the domain Di that maps to it is shrunk by a factor of two in each dimension by averaging non-overlapping groups of 2 × 2 pixels. The shrunken domain pixel values are then scaled by si, offset by oi, and placed in the location in the range determined by the orientation information. This represents one decoding iteration [3].
A visual artifact left by the quadtree algorithm after an image is decoded is the appearance of non-smooth transitions from one shade to another at block boundaries. These artifacts can be minimised by post-processing the image with an anti-aliasing-like scheme; that is, by modifying the pixels at the boundaries of the range blocks using a weighted average of their values.
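One simple form such a post-process could take is sketched below; the weights here are illustrative only, not the specific values any particular implementation uses:

```python
import numpy as np

def smooth_block_boundaries(img, r, w=0.75):
    """Soften range-block seams with a weighted average: each boundary
    pixel keeps weight w of its own value and takes (1 - w) from its
    neighbour across the seam."""
    out = np.asarray(img, dtype=float).copy()
    h, wd = out.shape
    for x in range(r, wd, r):                    # vertical seams
        a, b = out[:, x - 1].copy(), out[:, x].copy()
        out[:, x - 1] = w * a + (1 - w) * b
        out[:, x]     = w * b + (1 - w) * a
    for y in range(r, h, r):                     # horizontal seams
        a, b = out[y - 1, :].copy(), out[y, :].copy()
        out[y - 1, :] = w * a + (1 - w) * b
        out[y, :]     = w * b + (1 - w) * a
    return out
```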
Many other partitioning schemes exist and new ones continue to be developed. In fact, most of the research in the field of fractal image compression lies in optimising the encoding scheme, specifically the partition and search (and classification) algorithms [6]. One other well-known scheme is HV partitioning, whereby a rectangular image is recursively partitioned either horizontally or vertically to form two new rectangles. The partitioning repeats recursively until a covering tolerance is satisfied (as in the quadtree scheme). The HV partition scheme remedies a weakness of quadtree encoding: the quadtree algorithm makes no attempt to select the domain pool D in a content-dependent way, while HV partitioning does [3].
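A simplified sketch of the content-dependent cut selection: biasing cuts toward the middle of the rectangle and toward positions where adjacent row or column means differ most follows the spirit of the scheme in [3], but the exact scoring here is my own simplification:

```python
import numpy as np

def hv_cut(rect):
    """Choose a horizontal or vertical cut for an HV partition, scoring
    each candidate position by a centre bias times the difference of
    adjacent row/column means."""
    h, w = rect.shape
    assert h > 1 and w > 1, "rectangle must be splittable"
    rows, cols = rect.mean(axis=1), rect.mean(axis=0)
    rscore = [min(i, h - 2 - i) * abs(rows[i] - rows[i + 1])
              for i in range(h - 1)]
    cscore = [min(j, w - 2 - j) * abs(cols[j] - cols[j + 1])
              for j in range(w - 1)]
    ri, ci = int(np.argmax(rscore)), int(np.argmax(cscore))
    if rscore[ri] >= cscore[ci]:
        return "horizontal", ri + 1   # split rows: rect[:ri+1], rect[ri+1:]
    return "vertical", ci + 1
```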
Fractal image compression is a promising and exciting technology. While early claims of very high compression ratios, on the order of hundreds or thousands to one, are now known to be impractical or impossible, the technology can be used to provide very efficient storage of large images. In practice, however, two major factors hold back its progression: lengthy encoding times and the existence of several widely accepted image compression standards such as JPEG. Arguably one of the biggest and most widespread applications of image compression is the display of images on the Internet. High-resolution images (so-called rich media) require high-bandwidth connections in order to be received effectively by users. With the advent of broadband technology, however, the need for smaller image files will soon be overshadowed by the need for compatible, free and standard image formats.
One of the benefits of standards is that people can build further research and applications on them, thereby accelerating scientific progress. Patents have been granted on fractal image compression technology and more are expected [4]. This, unfortunately, will likely make adoption of the technology much harder.
The increasing interest in fractal image compression has led to the creation of many web sites dedicated to this field. Two of the most popular and important web sites on the subject are:
[1] Vrscay, Edward R. “A Hitchhiker’s Guide to ‘Fractal-Based’ Function Approximation and Image Compression”. Available at: http://links.uwaterloo.ca/hitchiker.html.
[2] Kominek, John. “Introduction to Fractal Compression”. Available at: http://www.landfield.com/faqs/compression-faq/part2/section-8.html.
[3] Fisher, Yuval. Fractal Image Compression: Theory and Application. Springer-Verlag. ISBN 0-387-94211-4.
[4] Kominek, John. “Advances in Fractal Compression for Multimedia Applications”.
[5] “JPEG image compression FAQ”. Available at: http://www.faqs.org/faqs/jpeg-faq/part1/preamble.html.
[6] Saupe, Dietmar, et al. “Fractal Image Compression: An Introductory Overview”.