
This also makes sense from data compression theory. If the digits of an irrational / transcendental number share some of the properties of a random string, then you shouldn't be able to compress it. And finding a fractional representation with fewer total digits is a form of data compression.


Nitpick: data compression theory says you can't have a general-purpose algorithm that on average compresses random strings. The best any algorithm can do is make some strings shorter and some strings longer, which is why compression is only useful on strings with known properties. But given a particular random finite string (such as N digits of pi) you can very likely (certainly?) find an algorithm that compresses it -- which is why people are able to offer a number of compression algorithms for approximations of pi in the post.
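A quick way to see the "known properties" point is to compare a general-purpose compressor on genuinely random bytes versus a string drawn from a restricted alphabet. A minimal sketch using Python's zlib (the seeded digit string here just stands in for digits of pi stored as text):

```python
import random
import zlib

# Random bytes: zlib cannot shrink these on average, and its
# framing overhead actually makes the output slightly larger.
random.seed(0)
noise = bytes(random.randrange(256) for _ in range(100_000))
assert len(zlib.compress(noise, 9)) >= len(noise)

# Decimal digits stored as 8-bit characters have a known property:
# only 10 of 256 byte values occur (~3.32 bits of entropy per byte),
# so an entropy coder squeezes them to well under half the size.
digits = "".join(random.choice("0123456789") for _ in range(100_000)).encode()
assert len(zlib.compress(digits, 9)) < 0.6 * len(digits)
```

The digit file compresses not because the digits are predictable, but because the encoding (10 symbols in 8-bit slots) is a property the algorithm can exploit.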


Here's a mildly amusing challenge and response to that challenge:

(http://www.patrickcraig.co.uk/other/compression.htm)

This page has some nice compression gimmicks (a file that compresses well with one algorithm but hardly at all with another; a file that uncompresses to itself)

(http://www.maximumcompression.com/compression_fun.php)

The large text compression benchmark has some nice finely tuned compression software and statistics.

(http://mattmahoney.net/dc/text.html)

And the Hutter prize is interesting. (Get the 100 MB enwik8 file plus the decompressor to under 16 MB.)

(http://prize.hutter1.net/)


A simple way to look at it is the pigeonhole principle. Imagine a binary string of length N: there are 2^N possible strings. To be losslessly compressed, a string must be mapped uniquely to a string of length at most 2^N - 1. So trivially, there are not enough shorter strings to losslessly compress every binary string of length N.

But it is acceptable to talk about compression in terms of Kolmogorov Complexity - roughly, if the shortest program which outputs a particular string is shorter than the length of the string then we have compression. Of course one can also show that KC does not compress most strings by much.
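In that spirit, a highly structured string can have a tiny description, which is the sense in which Kolmogorov complexity "compresses" structured data. A minimal sketch (the Python expression plays the role of the short program):

```python
# The string itself is two million characters long...
s = "ab" * 1_000_000

# ...but this short program text, when evaluated, reproduces it
# exactly, so the string's Kolmogorov complexity (relative to this
# description language) is bounded by a couple dozen characters.
program = '"ab" * 1_000_000'
assert eval(program) == s
assert len(program) < len(s)
```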

But KC is more interesting than, say, an entropy coding algorithm for infinite (or large finite) strings that possess a lot of structure, pi say (which, by the way, is by definition not a random string). The corresponding program will be far smaller than the sequence, yielding an impressive compression.


> mapped uniquely to a string of length of at most 2^N - 1.

I think you mixed up lengths and numbers of values here; 2^N - 1 is the latter.
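For concreteness, the corrected counting (a sketch): the binary strings of length strictly less than N number 2^N - 1 in total, one fewer than the 2^N strings of length N, so no injective map into shorter strings exists:

```python
N = 12

# Strings of each length k < N number 2**k; summing the geometric
# series 1 + 2 + 4 + ... + 2**(N-1) gives 2**N - 1 candidate outputs
# for 2**N inputs.
shorter_strings = sum(2**k for k in range(N))
assert shorter_strings == 2**N - 1
assert shorter_strings < 2**N  # pigeonhole: some string can't shrink
```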

> Of course one can also show that KC does not compress most strings by much.

The problem of finding the Kolmogorov complexity of a string is undecidable, so I wonder if this statement is true.


I dunno, I think sum_{i=0}^{\infty} (-1)^i 4/(2i+1) is a pretty good compression for an infinite string.
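That is the Leibniz series, pi = sum_{i=0}^inf (-1)^i * 4/(2i+1), and a few lines make the "short program, infinite output" point concrete. A sketch (convergence is notoriously slow, with error on the order of 1/terms):

```python
import math

def leibniz_pi(terms: int) -> float:
    """Partial sum of the Leibniz series 4 - 4/3 + 4/5 - 4/7 + ..."""
    return sum((-1) ** i * 4.0 / (2 * i + 1) for i in range(terms))

# The remainder after n terms is bounded by 4/(2n+1), so a million
# terms yields roughly six correct digits.
assert abs(leibniz_pi(1_000_000) - math.pi) < 1e-5
```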


Is the symbol for pi considered "compression"? Why or why not?


You have to include an algorithm for decompressing it into enough information to answer a relevant question, which in this post's case is "what is the nearest multiple of 10^-n, for some given n?"

Otherwise, it is a (very useful) abstraction. It isn't compression, it is a technique for avoiding expansion in cases where an expanded form is never needed.
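To illustrate, here is what such a "decompressor" could look like: a hypothetical short program (one choice among many; this one uses Machin's formula pi/4 = 4*arctan(1/5) - arctan(1/239) in integer arithmetic) that expands the symbol pi just far enough to answer the question for a given n:

```python
def scaled_arctan(x: int, one: int) -> int:
    """arctan(1/x) scaled by `one`, via the Taylor series in integers."""
    power = one // x
    total = power
    x2 = x * x
    divisor = 1
    while power > 0:
        power //= x2
        divisor += 2
        if divisor % 4 == 3:
            total -= power // divisor
        else:
            total += power // divisor
    return total

def pi_truncated(n: int) -> int:
    """pi * 10**n, truncated: the n-digit expansion the question needs."""
    guard = 10  # extra digits absorb floor-division error in the series
    one = 10 ** (n + guard)
    pi = 4 * (4 * scaled_arctan(5, one) - scaled_arctan(239, one))
    return pi // 10 ** guard

assert pi_truncated(10) == 31415926535
```

This sketch truncates rather than rounds; the point is only that the compressed form, symbol plus program, can regenerate as much of the expansion as any particular question requires.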



