In Pursuit of the Unknown
Ian Stewart
It is the equation that ushered in the information age. It established limits on the efficiency of communications, allowing engineers to stop looking for codes that were too effective to exist. It is basic to today's digital communications: phones, CDs, DVDs, the internet.
Efficient error-detecting and error-correcting codes, used in everything from CDs to space probes. Applications include statistics, artificial intelligence, cryptography, and extracting meaning from DNA sequences.
In 1977 NASA launched two space probes, Voyager 1 and 2. The planets of the Solar System had arranged themselves in unusually favourable positions, making it possible to find reasonably efficient orbits that would let the probes visit several planets. The initial aim was to examine Jupiter and Saturn, but if the probes held out, their trajectories would take them on past Uranus and Neptune. Voyager 1 could have gone to Pluto (at that time considered a planet, and equally interesting, indeed totally unchanged, now that it's not) but an alternative, Saturn's intriguing moon Titan, took precedence. Both probes were spectacularly successful, and Voyager 1 is now the most distant human-made object from Earth, more than 10 billion miles away and still sending back data.
Signal strength falls off with the square of the distance, so the signal received on Earth is 10⁻²⁰ times the strength that it would be if received from a distance of one mile: the probe is about 10 billion, that is 10¹⁰, miles away, and squaring that distance gives a factor of 10²⁰. That is, one hundred quintillion times weaker.
Voyager 1 must have a really powerful transmitter . . . No, it's a tiny space probe. It is powered by a radioactive isotope, plutonium-238, but even so the total power available is now about one eighth that of a typical electric kettle. There are two reasons why we can still obtain useful information from the probe: powerful receivers on Earth, and special codes used to protect the data from errors caused by extraneous factors such as interference.
Voyager 1 can send data using two different systems. One, the low-rate channel, can send 40 binary digits, 0s or 1s, every second, but it does not allow coding to deal with potential errors. The other, the high-rate channel, can transmit up to 120,000 binary digits every second, and these are encoded so that errors can be spotted and put right provided they're not too frequent. The price paid for this ability is that the messages are twice as long as they would otherwise be, so they carry only half as much data as they could. Since errors could ruin the data, this is a price worth paying.
Codes of this kind are widely used in all modern communications: space missions, landline phones, mobile phones, the Internet, CDs and DVDs, Blu-ray, and so on. Without them, all communications would be
liable to errors; this would not be acceptable if, for instance, you were using the Internet to pay a bill. If your instruction to pay £20 was received as £200, you wouldn't be pleased. A CD player uses a tiny lens, which focuses a laser beam on to very thin tracks impressed in the material of the disc. The lens hovers a very tiny distance above the spinning disc. Yet you can listen to a CD while driving along a bumpy road, because the signal is encoded in a way that allows errors to be found and put right by the electronics while the disc is being played. There are other tricks, too, but this one is fundamental.
Our information age relies on digital signals: long strings of 0s and 1s, pulses and non-pulses of electricity or radio. The equipment that sends, receives, and stores the signals relies on very small, very precise electronic circuits on tiny slivers of silicon, 'chips'. But for all the cleverness of the circuit design and manufacture, none of it would work without error-detecting and error-correcting codes. And it was in this context that the term 'information' ceased to be an informal word for 'know-how' and became a measurable numerical quantity. That measure revealed fundamental limitations on the efficiency with which codes can modify messages to protect them against errors. Knowing these limitations saved engineers a lot of wasted time trying to invent codes so efficient that they would be impossible. It provided the basis for today's information culture.
I'm old enough to remember when the only way to telephone someone in another country (shock horror) was to make a booking ahead of time with the phone company (in the UK there was only one, Post Office Telephones) for a specific time and length. Say ten minutes at 3.45 p.m. on 11 January. And it cost a fortune. A few weeks ago a friend and I did an hour-long interview for a science fiction convention in Australia, from the United Kingdom, using Skype™. It was free, and it sent video as well as sound. A lot has changed in fifty years. Nowadays, we exchange information online with friends, both real ones and the phonies that large numbers of people collect like butterflies using social networking sites. We no longer buy music CDs or movie DVDs: we buy the information that they contain, downloaded over the Internet. Books are heading the same way. Market research companies amass huge quantities of information about our purchasing habits and try to use it to influence what we buy. Even in medicine, there is a growing emphasis on the information that is contained in our DNA. Often the attitude seems to be that if you have the information required to do something, then that alone suffices; you don't need actually to do it, or even know how to do it.
There is little doubt that the information revolution has transformed
our lives, and a good case can be made that in broad terms the benefits outweigh the disadvantages, even though the latter include loss of privacy, potential fraudulent access to our bank accounts from anywhere in the world at the click of a mouse, and computer viruses that can disable a bank or a nuclear power station.
What is information? Why does it have such power? And is it really what it is claimed to be?
The concept of information as a measurable quantity emerged from the research laboratories of the Bell Telephone Company, the main provider of telephone services in the United States from 1877 to its break-up in 1984 on anti-trust (monopoly) grounds. Among its engineers was Claude Shannon, a distant cousin of the famous inventor Edison. Shannon's best subject at school was mathematics, and he had an aptitude for building mechanical devices. By the time he was working for Bell Labs he was a mathematician and cryptographer, as well as an electronic engineer. He was one of the first to apply mathematical logic, so-called Boolean algebra, to computer circuits. He used this technique to simplify the design of switching circuits used by the telephone system, and then extended it to other problems in circuit design.
During World War II he worked on secret codes and communications, and developed some fundamental ideas that were reported in a classified memorandum for Bell in 1945 under the title 'A mathematical theory of cryptography'. In 1948 he published some of his work in the open literature, and the 1945 article, declassified, was published soon after. With additional material by Warren Weaver, it appeared in 1949 as The Mathematical Theory of Communication.
Shannon wanted to know how to transmit messages effectively when the transmission channel was subject to random errors, 'noise' in the engineering jargon. All practical communications suffer from noise, be it from faulty equipment, cosmic rays, or unavoidable variability in circuit components. One solution is to reduce the noise by building better equipment, if possible. An alternative is to encode the signals using mathematical procedures that can detect errors, and even put them right.
The simplest error-detecting code is to send the same message twice. If you receive
the same massage twice
the same message twice
then there is clearly an error in the third word, but without understanding English it is not obvious which version is correct. A third repetition would decide the matter by majority vote and become an error-correcting code. How effective or accurate such codes are depends on the likelihood, and nature, of the errors. If the communication channel is very noisy, for instance, then all three versions of the message might be so badly garbled that it would be impossible to reconstruct it.
In practice mere repetition is too simple-minded: there are more efficient ways to encode messages to reveal or correct errors. Shannon's starting point was to pinpoint the meaning of efficiency. All such codes replace the original message by a longer one. The two codes above double or treble the length. Longer messages take more time to send, cost more, occupy more memory, and clog the communication channel. So the efficiency, for a given rate of error detection or correction, can be quantified as the ratio of the length of the coded message to that of the original.
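To make the triple-repetition idea concrete, here is a minimal sketch in Python (my illustration, not anything from Shannon or the Voyager engineers): each bit of the message is sent three times, a simulated noisy channel flips bits at random, and the receiver decodes each block of three by majority vote. The function names and the 10 per cent flip probability are invented for the example.

import random

def encode_repetition(bits, copies=3):
    # Repeat each bit a fixed number of times (three, for the code described above).
    return [b for b in bits for _ in range(copies)]

def noisy_channel(bits, flip_prob=0.1):
    # Simulate noise: flip each transmitted bit independently with probability flip_prob.
    return [b ^ 1 if random.random() < flip_prob else b for b in bits]

def decode_majority(received, copies=3):
    # Take a majority vote within each block of repeated bits.
    decoded = []
    for i in range(0, len(received), copies):
        block = received[i:i + copies]
        decoded.append(1 if sum(block) > copies // 2 else 0)
    return decoded

message = [1, 0, 0, 1, 1, 1, 0, 1]           # the 8-bit sequence 10011101
sent = encode_repetition(message)            # 24 bits are transmitted
received = noisy_channel(sent)               # some of them may arrive flipped
print(decode_majority(received) == message)  # usually True; fails only if two of the three copies of some bit are flipped

The efficiency ratio of this code is 3: three coded bits are transmitted for every bit of the original message.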
The main issue, for Shannon, was to determine the inherent limitations of such codes. Suppose an engineer had devised a new code. Was there some way to decide whether it was about as good as they get, or might some improvement be possible? Shannon began by quantifying how much information a message contains. By so doing, he turned 'information' from a vague metaphor into a scientific concept.
There are two distinct ways to represent a number. It can be defined by a sequence of symbols, for example its decimal digits, or it can correspond to some physical quantity, such as the length of a stick or the voltage in a wire. Representations of the first kind are digital, the second are analogue. In the 1930s, scientific and engineering calculations were often performed using analogue computers, because at the time these were easier to design and build. Simple electronic circuits can add or multiply two voltages, for example. However, machines of this type lacked precision, and digital computers began to appear. It very quickly became clear that the most convenient representation of numbers was not decimal, base 10, but binary, base 2. In decimal notation there are ten symbols for digits, 0-9, and every digit multiplies ten times in value for every step it moves to the left. So 157 represents
1 × 10² + 5 × 10¹ + 7 × 10⁰
Binary notation employs the same basic principle, but now there are only two digits, 0 and 1. A binary number such as 10011101 encodes, in symbolic form, the number
1 × 2⁷ + 0 × 2⁶ + 0 × 2⁵ + 1 × 2⁴ + 1 × 2³ + 1 × 2² + 0 × 2¹ + 1 × 2⁰
so that each digit doubles in value for every step it moves to the left. In decimal, this number equals 157, so we have written the same number in two different forms, using two different types of notation.
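The same place-value arithmetic is easy to express in code. The short Python sketch below is my own illustration; the helper names are invented for the example.

def binary_to_decimal(bits):
    # Evaluate place values: each digit doubles in weight for every step to the left.
    value = 0
    for bit in bits:
        value = value * 2 + int(bit)
    return value

def decimal_to_binary(n):
    # Repeatedly divide by 2, collecting remainders, to recover the binary digits.
    digits = ""
    while n > 0:
        digits = str(n % 2) + digits
        n //= 2
    return digits or "0"

print(binary_to_decimal("10011101"))  # 157
print(decimal_to_binary(157))         # 10011101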
Binary notation is ideal for electronic systems because it is much easier to distinguish between two possible values of a current, or a voltage, or a magnetic field, than it is to distinguish between more than two. In crude terms, 0 can mean 'no electric current' and 1 can mean 'some electric current', 0 can mean 'no magnetic field' and 1 can mean 'some magnetic field', and so on. In practice engineers set a threshold value, and then 0 means 'below threshold' and 1 means 'above threshold'. By keeping the actual values used for 0 and 1 far enough apart, and setting the threshold in between, there is very little danger of confusing 0 with 1. So devices based on binary notation are robust. That's what makes them digital.
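As a toy illustration of that thresholding (a sketch of mine, with made-up voltage levels rather than any real hardware specification):

# Hypothetical logic levels: about 0 volts for a 0, about 5 volts for a 1, threshold halfway between.
THRESHOLD = 2.5

def read_bit(voltage):
    # Interpret a measured voltage as a binary digit: above the threshold counts as 1, below as 0.
    return 1 if voltage > THRESHOLD else 0

# Noisy measurements still decode correctly as long as they stay on the right side of the threshold.
samples = [0.3, 4.6, 0.9, 5.2, 4.8]
print([read_bit(v) for v in samples])  # [0, 1, 0, 1, 1]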
With early computers, the engineers had to struggle to keep the circuit variables within reasonable bounds, and binary made their lives much easier. Modern circuits on silicon chips are precise enough to permit other choices, such as base 3, but the design of digital computers has been based on binary notation for so long now that it generally makes sense to stick to binary, even if alternatives would work. Modern circuits are also very small and very quick. Without some such technological breakthrough in circuit manufacture, the world would have a few thousand computers, rather than billions. Thomas J. Watson, who built IBM into a giant of the industry, once said that he didn't think there would be a market for more than about five computers worldwide. At the time he seemed to be talking sense, because in those days the most powerful computers were about the size of a house, consumed as much electricity as a small village, and cost tens of millions of dollars. Only big government organisations, such as the United States Army, could afford one, or make enough use of it. Today a basic, out-of-date mobile phone contains more computing power than anything that was available when Watson made his remark.
The choice of binary representation for digital computers, hence also for digital messages transmitted between computers (and later between almost any two electronic gadgets on the planet), led to the basic unit of information: the bit. The name is short for 'binary digit', and one bit of information is one 0 or one 1. It is reasonable to define the information 'contained in' a sequence of binary digits to be the total number of digits in the sequence. So the 8-digit sequence 10011101 contains 8 bits of information.
Shannon realised that simple-minded bit-counting makes sense as a measure of information only if 0s and 1s are like heads and tails with a fair coin, that is, are equally likely to occur. Suppose we know that in some specific circumstances 0 occurs nine times out of ten, and 1 only once. As we read along the string of digits, we expect most digits to be 0. If that expectation is confirmed, we haven't received much information, because this is what we expected anyway. However, if we see 1, that conveys a lot more information, because we didn't expect that at all.
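To put a rough number on that intuition, here is a small sketch of my own, using the standard measure of surprise, minus the logarithm to base 2 of the probability, which is the quantity Shannon's theory goes on to formalise; the probabilities 0.9 and 0.1 come from the example above.

import math

def surprise(p):
    # Information conveyed by an event of probability p, measured in bits: -log2(p).
    return -math.log2(p)

# In the example above, 0 appears nine times out of ten and 1 once in ten.
print(round(surprise(0.9), 3))  # about 0.152 bits: seeing a 0 tells us little
print(round(surprise(0.1), 3))  # about 3.322 bits: seeing a 1 is far more informative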