Devlin's Angle

March 2001

Claude Shannon

Mathematician Claude Shannon died on Saturday February 22, aged 84, after a long struggle with Alzheimer's disease. But his intellectual legacy will live on as long as people communicate using phone and fax, log on to the Internet, or simply talk about "information" as a commodity that can be measured in "bits" and shipped from place to place. The approach to information and communication Shannon laid out in his groundbreaking paper "A Mathematical Theory of Communication," published in the Bell System Technical Journal in 1948, and republished virtually unchanged in the pamphlet The Mathematical Theory of Communication he wrote with Warren Weaver the following year (published by the University of Illinois Press) remain current to this day. (Note how the "a" of his paper became "the" in the Shannon-Weaver version.)

Shannon was born in Michigan in 1916. After obtaining degrees in both mathematics and engineering at the University of Michigan, he went to MIT to pursue graduate studies in mathematics. There he came into contact with some of the men who were laying much of the groundwork for the information revolution that would take off after the Second World War, notably the mathematician Norbert Wiener (who later coined the term cybernetics for some of the work in information theory that he, Shannon, and others did at MIT and elsewhere) and Vannevar Bush, the dean of engineering at MIT (whose conceptual "Memex" machine foretold the modern World Wide Web and whose subsequent achievements included the establishment of the National Science Foundation).

In the early 1930s, Bush had built a mechanical, analog computer at MIT called the Differential Analyzer, designed to solve equations that were too complex for the (mechanical) calculating machines of the time. This massive assemblage of cog wheels, shafts, gears, and axles took up several hundred feet of floor space, and was powered by electric motors. Preparing the device to work on a particular problem required physically configuring the machine, and could take two or three days. After the machine had completed the cycle that constituted "solving" the equation, the answer was read off by measuring the changes in position of various components.

Always a "tinkerer," Shannon took to working with the Analyzer with great enthusiasm. At Bush's suggestion, for his master's thesis, he carried out a mathematical analysis of the operation of the machine's relay circuits. In 1938, he published the results of this study in the Transactions of the American Institute of Electrical Engineers under the title "A Symbolic Analysis of Relay and Switching Circuits."

Bush's seemingly mundane motivation for having Shannon do the work was the telephone industry's need for a mathematical framework in which to describe the behavior of the increasingly complex automatic switching circuits that were starting to replace human telephone operators. What Shannon produced far transcended that aim. The ten page article that he published in the Transactions of the AIEE has been described as one of the most important engineering papers ever written. And with good reason: quite simply, it set the stage for digital electronics.

Shannon began by noting that, although the Analyzer computed in an analog fashion, its behavior at any time was governed by the positions of the relay switches, and they were always in one of just two states: open or closed (or on or off). This led him to recall the work of the nineteenth century logician George Boole, whose mathematical analysis of the "laws of thought" was carried out using an algebra in which the variables have just the two "truth values" T and F (or 1 and 0). From there it was a single -- but major -- step to thinking of using relay circuits to build a digital "logic machine" that could carry out not just numerical computations but also other kinds of "information processing."

In 1940, Shannon obtained his doctorate in mathematics, and went to the Institute for Advanced Study at Princeton as a National Research Fellow, where he worked with Hermann Weyl. The following year, he took a position at the Bell Telephone Laboratories in New Jersey, joining a research group who were trying to develop more efficient ways of transmitting information and improving the reliability of long-distance telephone and telegraph lines.

In the 1950s, Shannon became interested in the idea of machine intelligence, and was one of the conveners -- together with his soon to be famous mentees John McCarthy and Marvin Minsky -- of the now legendary 1956 conference at Dartmouth College in New Hampshire that is generally acknowledged as the birth of artificial intelligence (or AI), as it later became known. But while others (McCarthy and Minsky among them) would become identified with AI, Shannon's name will be forever associated with the theory of information and communication that the world learned of from the Shannon-Weaver pamphlet.

Prior to Shannon's work, mathematicians and engineers working on communications technology saw their job as finding ways to maintain the integrity of an analog signal traveling along a wire as a fluctuating electric current or through the air as a modulated radio wave. Shannon took a very different approach. He viewed "information" as being completely encoded in digital form, as a sequence of 0s and 1s -- which he referred to as "bits" (for "binary digits"), following a suggestion of his Princeton colleague John Tukey. In addition to providing the communications engineers with a very different way of designing transmission circuits, this shift in focus also led to a concept of "information" as an objective commodity, disembodied from a human "sender" or "receiver." After Shannon, the name of the game became: how can you best send a sequence of discrete electrical or electromagnetic pulses from one point to another?

A particular consequence of this new approach, which Shannon himself was not slow to observe, was that whereas even a small variation in an analog signal distorts -- and can conceivably corrupt -- the information being carried by that signal, the discrete yes-or-no/on-or-off nature of a digital signal means that information conveyed digitally is far less prone to corruption; indeed, by adding extra bits to the signal, automatic error detection and correction can be built into the system. (A feature of digital coding that, decades later, would enable Napster users to download music files over the phone lines and play the latest pop music on their desktop PC with a fidelity limited only by the quality of the computer's sound system, and which is further exemplified by the oft-repeated claim of CD manufacturers that you can drill a centimeter hole in your favorite music CD and it will still play perfectly.)

From a mathematical point of view, arguably the most significant aspect of Shannon's new, digital conception of information is that it provides a way to measure information -- to say exactly how much information a particular signal carries. The measure is simple: you simply count the minimum number of bits it takes to encode the information. To do this, you have to show how a given item of information can be arrived at by giving the answers to a sequence of yes/no questions.

For example, suppose that eight work colleagues apply for a promotion: Alberto, Bob, Carlo, David, Enid, Fannie, Georgina, and Hilary. After the boss has chosen which person will get the position, what is the minimum number of yes/no question you have to ask to discover his or her identity? A few moments thought will indicate that the answer is 3. Thus, the information content of the message announcing who got the job is 3 bits. Here is one way to arrive at this figure:

First question: Is the person male?

That cuts down the number of possibilities from 8 to 4.

Second question: Does the person's name end in a vowel?

That reduces the field to a single pair.

Third question: Is the person the taller of the two?

Now you have your answer. Of course, this particular sequence of questions assumes that no final pair of applicants are the same height. Moreover, I rigged it to have four males and four females, with carefully chosen names. But the principle will work for any example. All you need is a framework within which a series of yes/no questions (or other binary decisions) will repeatedly halve the number of possibilities until just one remains. (If the number of possibilities at the outset is not a power of 2, there will be a little redundancy in the decision sequence, but you'll still get a measure of the information content. For example, if there were just 7 candidates, the information content of the final decision will still be 3 bits.)

Building on this simple idea, Shannon was able to develop an entire quantitative theory of information content that has proved to be of enormous importance to the engineers who have to decide how much "channel capacity" a particular communications network requires at each point. So complete was his initial analysis that, although you can find the theory described in many contemporary textbooks, you might just as well go back to his original 1949 pamphlet with Weaver. Except for one thing: the name "information theory" is misleading.

As has been pointed out by a number of workers (including myself in my 1991 book Logic and Information), Shannon's theory does not deal with "information" as that word is generally understood, rather with data -- the raw material out of which information is obtained. (See my book InfoSense for a discussion of the distinction.) In Shannon's theory, what is measured is the size of the (binary) signal. It does not matter what that signal denotes. According to Shannon's measure, any two books of 100,000 words have exactly the same information content. That's a useful (if misleading) thing to say if your goal is simply to transmit both books digitally over the Internet. But if one is an instruction manual for building a nuclear-powered submarine and the other a trashy novel, no one would claim that the two contain the same amount of "information."

By the same token, anyone who thinks that the information content of Shannon's 1948 paper can be captured by the statement that it is "100 pages worth" must surely have been in a trance for the past fifty years in which Shannon's ideas have transformed the world.

Devlin's Angle is updated at the beginning of each month.
Keith Devlin ( is Dean of Science at Saint Mary's College of California, in Moraga, California, and a Senior Researcher at Stanford University. His latest book is The Math Gene: How Mathematical Thinking Evolved and Why Numbers Are Like Gossip, published by Basic Books.