Python Book


The Stick of Jan Sloot

There is a story (I can't remember who told this story first, but it was probably Martin Gardner) about an alien that lands on Earth. The alien wishes to return home with the knowledge of all books on Earth. Unfortunately, his spaceship is so small that he can only take with him a small stick. So what he does is the following. First, he digitizes each book. This translates each book in one huge, but finite, number. Then he attaches all these numbers to each other, which results in an enormous, but still finite, number. In front of this number he places "0.", turning it into a fraction between zero and one. Setting the length of the stick to one, he makes a small cut in the stick at precisely the fraction. When he returns home, he only needs to measure the place of the cut, to get back the fraction, and thus the number, and thus the codes for all the books.

While it seems genius at first glance, of course this kind of information compression is impossible. However, the only reason that it is impossible, is that the universe consists of a finite number of indivisible particles. Were that not the case, in theory the alien's system could work. Of course, it would be unlikely that it could be implemented in practice, since it needs measuring on a scale so fine that even God would not be able to tell the difference between the stored information and the rubbish next to it.

I was reminded of this story when I encountered the affair of Jan Sloot and his Sloot Digital Coding System (SDCS). Since this affair is mainly known in the Netherlands, I will discuss it here.

Translated from UT News:

In 1999, Jan Sloot, a common television repair man, discovered a revolutionary coding system that would make hard disks, CD-ROMs, and other data stores superfluous. All movies ever made would fit on one CD-ROM. Sloot attempted to sell his invention to Philips, but he found that Philips' engineers were not interested.

Roel Pieper, at that time a member of the board of Philips, thought the invention was valuable, and decided to join the inventor and his companions. Pieper and Sloot looked for investors all around the world. Pieper himself invested millions in the project. All investors of The Fifth Force, Sloot's company, hoped to become billionaires.

But the paranoid inventor died on September 11, 1999, one day before the invention would be specified in detail in a contract. All his notes, his prototype, and his source code were lost forever.

Translated from GIDS:

The Sloot Digital Coding System (SDCS) would shake the world: a new alphabet for digital storage that didn't use binary code, but a much more efficient method. The principle behind SDCS seems simple. As a text consists of a limited number of characters, a movie consists of a limited amount of colours and sounds. All those basic data were stored in five algorithms in five memory stores. For movies, each algorithm would have a maximum length of 74 Mb. That's 370 Mb in total: the invention's engine. To start the engine, only a proper key was needed. For every page of a book, for every image in a movie, Sloot calculated a unique code. The concatenation of these codes would again result in a unique code. The final code, the key, would be one kilobyte in length, regardless of the length of the movie or the size of the book.

Roel Pieper, professor in Computer Science, commented on people questioning the possibility of such a high compression (translated from UT News):

"It is not about compression. Everyone is mistaken about that. The principle can be compared with a concept as Adobe-postscript, where sender and receiver know what kind of data recipes can be transferred, without the data itself actually being sent."

So we're back to the alien and his stick. Basically, Roel Pieper explains Sloot's 'invention' as the movies already residing on the computers of both parties, but in a deconstructed form, and a key being transferred that specifies exactly how reconstruction must take place.

In principle, this is possible. But the crux is, how long the key needs to be, and how big the data store of deconstructed forms must be. Sloot claims that a key of one kilobyte, and a data store of less than 400 Mb, would suffice. This sounds unbelievable, and it is, in fact, mathematically impossible. There are many different arguments. I prefer the simplest one:

A key of a limited length (whether it is a kilobyte or a terabyte) can only store a limited number of codes, and therefore can only distinguish a finite number of movies. However, the actual number of possible movies is infinite. For, suppose it were finite; in that case there would be a movie that is the longest. By just adding one extra image to the movie, I would have created a longer movie, which I didn't have before. Ergo, the number of possible movies is infinite. Ergo, any key of limited length cannot distinguish every possible movie.

The SDCS is only possible if keys are allowed to become infinite, or the data store is allowed to become infinite (if the data store already contains all movies ever made, a key consisting of a number can be used to select the movie you want to see -- however, in that case it is impossible to have keys for movies that have not been made yet at the time the data store was constructed). This would, of course, make the idea useless.

The whole problem of Pieper's way of thinking (and of every other person who believes that Sloot actually could compress movies with a factor of two million) is that he believes that the key does not need to contain every detail of a movie. Unfortunately, it does.

In the SDCS, it is claimed that no movies are stored, only basic building blocks of movies, such as colours and sounds. So, when a number is presented to the SDCS, it uses the number to fetch colours and sounds, and constructs a movie out of them. Any movie. No two different movies can have the same number, otherwise they would be the same movie. Every possible movie gets its own unique number. Therefore, I should be able to generate any possible movie by loading some unique number in the SDCS.

Think of it: by placing the right number in the SDCS, I can not only get Orson Welles' Citizen Kane. I can get Citizen Kane in colour! Or Citizen Kane backwards! Or Citizen Kane where the credits misspell the name of Everett Sloane, who plays mr. Bernstein, as Everett Sloot! Or Citizen Kane in Swahili! Or Citizen Kane where 'Rosebud' is a teddy bear! Or Citizen Kane with a cameo of myself! Or the cartoon version of Citizen Kane! Or Citizen Kane where Charles Foster Kane wins the election! Or the gay version of Citizen Kane! Or Citizen Kane where Charles Foster Kane is replaced by Jar Jar Binks!

How many movies are possible that are variations on Citizen Kane? More than fit in a number of one kilobyte, I can tell you. More than there are atoms in the universe. An infinite number, actually.

In the SDCS, the details of every possible movie have to be somewhere. They are either in the number, or in the data store. Since the data store is the same for each movie, the details can't be in the data store. Therefore, everything that differs between movies, must be represented in some way in the numbers. Which, for infinite variations of movies, is impossible with finite numbers.

Would it be possible to store, say, all possible movies up to three hours in length, with a limited resolution and a limited sound quality, in a series of finite numbers? Yes, of course that is possible, since in the digital world, there is no difference between data and numbers, thus data of a limited length is equal to a finite number.

Even though Sloot claimed that one kilobyte was enough to store any movie regardless of its length, the question is warranted whether a system as defined by the SDCS is possible for movies of a limited (but realistic) length. The answer is no. One kilobyte is 1024 bytes, and a byte can take 256 different values. With a code of one kilobyte, to represent any movie up to 90 minutes in length, on average a byte must be able to distinguish between all possible sequences of about 5 seconds. Can a sequence of 5 seconds of any movie at all be identified with a number between 0 and 255? Of course not.

The concept of the SDCS, as described, is equivalent to a universal Turing machine. A universal Turing machine (which is actually quite easy to simulate in a computer), can emulate any possible computer program (including a program that shows Citizen Kane which ends with a pie fight) just by feeding it a number. The problem is that even for trivial programs the numbers quickly get enormous, so that it is impossible to filter the rare useful programs from the universe of chaos consisting of all numbers. Therefore, it isn't viable business to sell people a universal Turing machine as a solution to all their business problems.

Evidently, Jan Sloot was a crank. He showed all the signs attributed to cranks. He was paranoid. He felt himself an expert in a field he had no higher education in. He worked alone. He spent decades on one problem any mathematician could have told him was impossible to solve in the first place. He got angry at people who questioned him. He felt misunderstood. He had delusions of grandeur.

I think Jan Sloot really believed he had found a way to compress movies to one kilobyte. Probably he had imagined something as the alien's stick, something that would be theoretically possible in an alternate world, with infinitely divisible particles, but impossible in our reality. He probably lacked the mathematical insight to see that his 'invention' was a farce. In his prototype, he faked his invention, which is why he refused to let anyone near it, and answered only in mystical vagueness to questions. He probably imagined that, with enough money, he could get his invention to work for real. He probably did not feel himself a charlatan or a crook, but an honest businessman, who needed a starting capital to create the world's greatest invention, make some people incredibly rich, and win a Nobel prize.

What truly amazes me is that a man as Roel Pieper, who is a professor of Computer Science no less, could fall for his story, to the point where he invested a huge amount of capital. If his role in this story is really as reported in the media, his credibility as a computer scientist has been seriously tarnished. In my opinion, the University of Twente, with which Pieper is associated, should at least perform an internal investigation, to assess whether Pieper's position can be maintained.

Incidentally, if anyone is interested, I have devised an invention where I can store any movie ever created on a small stick. I need some money to develop this concept further. If you have a couple of million to spare, give them to me. I'll be glad to build a prototype. We're gonna be rich and famous!

September 28, 2004

© 2004 by Pieter Spronck