15 April 2005

Entropy and compression

A computer scientist might observe that entropy is information content. To describe the insides of a hot air balloon exhaustively (setting aside the Uncertainty Principle for a moment) would require recording the position and velocity of every air molecule. The crystalline structure of ice would save us a lot of writing, since the individual molecules vary far less in position and velocity. Pound for pound, steam contains more information than ice. And so it is with digital information: a GIF or PNG file of a highly chaotic image is much larger than one of an image that consists of just a few colors and shapes.
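You can see this with a couple of lines of Python and the standard zlib module (a lossless compressor, like the one inside PNG). Random bytes stand in for steam, a constant block for ice; the names are just mine for the example.

```python
import os
import zlib

# High-entropy "steam": 100 kB of random bytes.
noisy = os.urandom(100_000)

# Low-entropy "ice": 100 kB of a single repeated byte.
ordered = bytes(100_000)

print(len(zlib.compress(noisy)))    # ~100,000: randomness is incompressible
print(len(zlib.compress(ordered)))  # ~100: order compresses to almost nothing
```

The random block comes out essentially the same size it went in; the ordered block shrinks to a few hundred bytes.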

Yet somehow it doesn't seem that way to us humans. We have one-syllable words for both ice and steam. Steam doesn't seem so much more complex.

This is because we generalize. We throw away low-level variations in favor of high-level order.

In a high-entropy situation, many of the details are, almost by definition, unobservable at the macro level. That means we probably don't care much about them. Lossy compression is the computer science equivalent: to shrink a large amount of chaotic data to a manageable size, we discard information about the details we consider irrelevant. This is what lossy schemes like JPEG attempt to do.
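Here is a toy illustration (my own sketch, not how JPEG actually works internally): zeroing the low bits of a noisy signal plays the role of JPEG's coarse quantization of fine detail, after which a lossless compressor has far less left to encode.

```python
import random
import zlib

random.seed(0)

# A noisy 8-bit "image row": a slow ramp plus small random fluctuations.
signal = bytes(i // 500 + random.randrange(8) for i in range(100_000))

# The lossy step: zero the low four bits, discarding the fine detail,
# much as JPEG coarsely quantizes the coefficients it deems least visible.
quantized = bytes(b & 0xF0 for b in signal)

print(len(zlib.compress(signal)))     # the noise resists compression
print(len(zlib.compress(quantized)))  # far smaller once the detail is gone
```

The quantized version is no longer a faithful copy, but if the low bits were imperceptible anyway, we've traded nothing we care about for a much smaller file.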

(I'm not very sure of the technical details here. Still reading about it.)
