Authors: John MacCormick,Chris Bishop
ACKNOWLEDGMENTS
You road I enter upon and look around! I believe you are not all that is here;
I believe that much unseen is also here.
—W
ALT
W
HITMAN
,
Song of the Open Road
Many friends, colleagues, and family members read some or all of the manuscript. Among them are Alex Bates, Wilson Bell, Mike Burrows, Walt Chromiak, Michael Isard, Alastair MacCormick, Raewyn MacCormick, Nico-letta Marini-Maio, Frank McSherry, Kristine Mitchell, Ilya Mironov, Wendy Pollack, Judith Potter, Cotten Seiler, Helen Takacs, Kunal Talwar, Tim Wahls, Jonathan Waller, Udi Wieder, and Ollie Williams. Suggestions from these readers resulted in a large number of substantial improvements to the manuscript. The comments of two anonymous reviewers also resulted in significant improvements.
Chris Bishop provided encouragement and advice. Tom Mitchell gave permission to use his pictures and source code in
chapter 6
.
Vickie Kearn (the book's editor) and her colleagues at Princeton University Press did a wonderful job of incubating the project and bringing it to fruition.
My colleagues in the Department of Mathematics and Computer Science at Dickinson College were a constant source of support and camaraderie.
Michael Isard and Mike Burrows showed me the joy and beauty of computing. Andrew Blake taught me how to be a better scientist.
My wife Kristine was always there and is here still;
much unseen is also here.
To all these people I express my deepest gratitude. The book is dedicated, with love, to Kristine.
SOURCES AND FURTHER READING
As explained on page 8, this book does not use in-text citations. Instead, all sources are listed below, together with suggestions of further reading for those interested in finding out more about the great algorithms of computer science.
The epigraph is from Vannevar Bush's essay “As We May Think,” originally published in the July 1945 issue of
The Atlantic
magazine.
Introduction (
chapter 1
).
For some accessible, enlightening explanations of algorithms and other computer technology, I recommend Chris Bishop's 2008 Royal Institution Christmas lectures, videos of which are freely available online. The lectures assume no prior knowledge of computer science. A. K. Dewdney's
New Turing Omnibus
usefully amplifies several of the topics covered in the present volume and introduces many more interesting computer science concepts—but some knowledge of computer programming is probably required to fully appreciate this book. Juraj Hromkovi
's
Algorithmic Adventures
is an excellent option for readers with a little mathematical background, but no knowledge of computer science. Among the many college-level computer science texts on algorithms, three particularly readable options are
Algorithms
, by Dasgupta, Papadimitriou, and Vazirani;
Algorithmics: The Spirit of Computing
, by Harel and Feldman; and
Introduction to Algorithms
, by Cormen, Leiserson, Rivest, and Stein.
Search engine indexing (
chapter 2
).
The original AltaVista patent covering the metaword trick is U.S. patent 6105019, “Constrained Searching of an Index,” by Mike Burrows (2000). For readers with a computer science background,
Search Engines: Information Retrieval in Practice
, by Croft, Metzler, and Strohman, is a good option for learning more about indexing and many other aspects of search engines.
PageRank (
chapter 3
).
The opening quotation by Larry Page is taken from an interview by Ben Elgin, published in
Businessweek
, May 3, 2004. Vannevar Bush's “As We May Think” was, as mentioned above, originally published in
The Atlantic
magazine (July 1945). Bishop's lectures (see above) contain an elegant demonstration of PageRank using a system of water pipes to emulate hyperlinks. The original paper describing Google's architecture is “The Anatomy of a Large-Scale Hypertextual Web Search Engine,” written by Google's co-founders, Sergey Brin and Larry Page, and presented at the 1998 World Wide Web conference. The paper includes a brief description and analysis of PageRank. A much more technical, wide-ranging analysis appears in Langville and Meyer's
Google's PageRank and Beyond
—but this book requires college-level linear algebra. John Battelle's
The Search
begins with an accessible and interesting history of the web search industry, including the rise of Google. The web spam mentioned on page 36 is discussed in “Spam, Damn Spam, and Statistics: Using Statistical Analysis to Locate Spam Web Pages,” by Fetterly, Manasse, and Najork, and published in the 2004 WebDB conference.
Public key cryptography (
chapter 4
).
Simon Singh's
The Code Book
contains superb, accessible descriptions of many aspects of cryptography, including public key. It also recounts in detail the story of the secret discovery of public key cryptography at GCHQin Britain. Bishop's lectures (see above) contain a clever practical demonstration of the paint-mixing analogy for public key crypto.
Error correcting codes (
chapter 5
).
The anecdotes about Hamming are documented in Thomas M. Thompson's
From Error-Correcting Codes through Sphere Packings to Simple Groups.
The quotation from Hamming on page 60 is also given in this book and derives from a 1977 interview of Hamming by Thompson. Mathematicians will greatly enjoy Thompson's delightful book, but it definitely assumes the reader has a healthy dose of college math. Dewdney's book (see above) has two interesting chapters on coding theory. The two quotations about Shannon on pages 77-78 are taken from a brief biography by N. J. A. Sloane and A. D. Wyner, appearing in
Claude Shannon: Collected Papers
edited by Sloane and Wyner (1993).
Pattern recognition (
chapter 6
).
Bishop's lectures (see above) have some interesting material that nicely complements this chapter. The geographical data about political donations is taken from the Fundrace project of the Huffington Post. All the handwritten digit data is taken from a dataset provided by Yann LeCun, of New York University's Courant Institute, and his collaborators. Details of the dataset, which is known as the MNIST data, are discussed in the 1998 paper by LeCun et al., “Gradient-Based Learning Applied to Document Recognition.” The web spam results come from Ntoulas et al., “Detecting Spam Web Pages through Content Analysis,” published in the Proceedings of the World Wide Web Conference, 2006. The face database was created in the 1990s by a leading pattern recognition researcher, Tom Mitchell of Carnegie Mellon University. Mitchell has used this database in his classes at Carnegie Mellon and describes it in his influential book,
Machine Learning.
On the website accompanying his book, Mitchell provides a computer program to perform training and classification of neural networks on the face database. All the results for the sunglasses problem were generated using slightly modified versions of this program. Daniel Crevier gives an interesting account of the Dartmouth AI conference in
AI: The Tumultuous History of the Search for Artificial Intelligence.
The excerpt from the conference's funding proposal (on page 103) is quoted in Pamela McCorduck's 1979 book,
Machines Who Think.
Compression (
chapter 7
).
The story about Fano, Shannon, and the discovery of Huffman coding is taken from a 1989 interview of Fano by Arthur Norberg. The interview is available from the oral history archive of the Charles Babbage Institute. My favorite treatment of data compression is in
Information Theory, Inference, and Learning Algorithms
, by David MacKay, but this book requires college-level math. Dewdney's book (see above) contains a much briefer and more accessible discussion.
Databases (
chapter 8
).
There is an over-abundance of books providing an introduction to databases for beginners, but they typically explain how to
use
databases, rather than explaining how databases
work
—which was the objective of this chapter. Even college-level textbooks tend to focus on the use of databases. One exception is the second half of
Database Systems
, by Garcia-Molina, Ullman, and Widom, which gives plenty of details on the topics covered in this chapter.
Digital signatures (
chapter 9
).
Gail Grant's
Understanding Digital Signatures
provides a great deal of information about digital signatures and is reasonably accessible to those without a computer science background.
Computability (
chapter 10
).
The chapter's opening quotation is from a talk given by Richard Feynman at Caltech on December 29,1959. The title of the talk is “There's Plenty of Room at the Bottom,” and it was later published in Caltech's
Engineering & Science
magazine (February 1960). One unconventional, but very interesting, presentation of the concepts surrounding computability and undecidability is in the form of a (fictional) novel:
Turing (A Novel about Computation)
, by Christos Papadimitriou.
Conclusion (
chapter 11
).
The Stephen Hawking lecture, “The Future of the Universe,” was the 1991 Darwin lecture given at the University of Cambridge, also reprinted in Hawking's book
Black Holes and Baby Universes.
The televised A. J. P. Taylor lecture series was entitled
How Wars Begin
, and was also published as a book in 1977.
INDEX
The index that appeared in the print version of this title does not match the pages in your eBook. Please use the search function on your eReading device to search for terms of interest. For your reference, the terms that appear in the print index are listed below.
AAC
abort.
See
transaction addition
algorithm
addition trick
Adleman, Leonard
Advanced Encryption Standard
advertisement
AES.
See
Advanced Encryption Standard
AI.
See
artificial intelligence
algorithm: books on; criteria for greatness; definition of; future of; lack of; relationship to programming; significance of.
See also
addition algorithm; checksum; compression; digital signature; error-correcting code; Dijkstra's shortest-path algorithm; Euclid's algorithm; factorization; JPEG; key exchange; LZ77; matching; nine algorithms; PageRank; public key; ranking; RSA; web search
AltaVista
AlwaysYes.exe
Amazon
Analytical
Engine
AntiCrashOnSelf.exe
AntiYesOnSelf.exe
Apple
artifact.
See
compression
artificial intelligence.
See also
pattern recognition
artificial neural network.
See
neural network
As We May Think
astronomy
Atlantic
magazine
atomic.
See
transaction audio.
See also
compression Austen, Jane
authentication
authority: score; of a web page.
See also
certification authority
authority trick
B-tree
Babylonia
backup
bank; account number; balance; for keys; online banking; for signatures; transfer; as trusted third party
base, in exponentiation
Battelle, John
Bell Telephone Company
binary
Bing
biology
biometric sensor
Bishop, Christopher
bit
block cipher
body, of a web page
brain
Brin, Sergey
British government
browser
brute force
bug
Burrows, Mike
Bush, Vannevar
Businessweek
Byzantine fault tolerance
C++ programming language
CA.
See
certification authority calculus
Caltech
Cambridge
CanCrash.exe
CanCrashWeird.exe
Carnegie Mellon University
CD
cell phone.
See
phone certificate
certification authority
Charles Babbage Institute
chat-bot
checkbook
checksum; in practice; simple; staircase.
See also
cryptographic hash function
checksum trick
chemistry
chess
Church, Alonzo
Church-Turing thesis
citations
class
classification
classifier
clock arithmetic
clock size; conditions on; factorization of; need for large; primary; as a public number; in RSA; secondary
Codd, E. F.
code word
commit phase
compact disk.
See
CD compression; via AAC (
see
AAC); artifact; of audio or music; history of; of images; via JPEG (see JPEG); lossless; lossy; via MP3 (see MP3); relation to error-correcting code; uses of; of video
computable: number; problem.
See also
uncomputable
computer: 1980s and ‘90s; accuracy requirement of; appreciation of; classical; compared to humans; early; error (
see also
error-correcting code; error detection); first electronic; fundamental jobs of; human calculator; intelligent (
see
artificial intelligence); laptop (see laptop); limitations on; mechanical; modern; quantum (
see
quantum computing); router (
see
router); server (
see
server); users.
See also
hardware
computer program; analyzing another program; executable; impossible; input and output; intelligent; programmers; programming; programming languages; verification; world's first programmer; yes-no
computer programming.
See
computer program
computer science; beauty in; certainty in; curriculum; founding of; in high school; introductory teaching; popularity of; predictions about; public perception of; research; in society; theory of; undecidable problems in
Computing Machinery and
Intelligence
concurrency
consciousness
consistency.
See also
inconsistency
contradiction.
See
proof by contradiction
Cormen, Thomas
cosmology
Covenant Woman
CPU
crash; intentional
Crashing Problem
CrashOnSelf.exe, CRC32
credit card
Crevier, Daniel
Croft, Bruce
cryptographic hash function
cryptography; public key (see public key cryptography) cuneiform
cycle
Dartmouth AI conference
Dasgupta, Sanjoy
data center
database; column; definition of; geographically replicated; of faces; relational; replicated; row; table.
See also
virtual table
deadlock
decision tree
decrypt
Deep Blue
Democrat
Dewdney, A. K.
Dickens, Charles
Diffie, Whitfield
Diffie-Hellman.
See
key exchange
digital signature; applications of; connection to cryptography; detect forgery of; of long messages; in practice; security of.
See also
RSA; signature
Dijkstra's shortest-path algorithm
discrete exponentiation
discrete logarithm
disk.
See
hard disk distributed hash table
double-click
Doyle, Arthur Conan
drive.
See
hard disk DVD
Dylan, Bob
eBay
e-commerce
Elgin, Ben
e-mail
Emma
encrypt; 128-bit encryption
engineering
Entscheidungsproblem
error detection
error-correcting code; relation to compression
Essay Concerning Human
Understanding
Ethernet
Euclid
Euclid's algorithm
excitatory
exponent
exponentiation.
See also
discrete exponentiation; power notation
extension.
See
file name extension
face database.
See
database
face recognition
Facebook
factorization
Fano, Robert
Faraday, Michael fault-tolerance
fax
Feldman, Yishai
Fetterly, Dennis
Feynman, Richard
file name extension;
unhide
financial information
finite field algebra
flash memory.
See
memory
forgery.
See also
digital signature
freeze
Freeze.exe
Fundrace project
garage
Garcia-Molina, Hector
GCHQ
genetics
GeoTrust
GlobalSign
Google
Grant, Gail
Gray, Jim
Great American Music Hall
hacker
halt.
See
terminate
halting problem
Hamming, Richard
Hamming code
handwriting recognition
hard disk; failure; operation; space
hardware; failures of
Hardy, G. H.
Harel, David
hash tables
Hawking, Stephen
haystack
Hellman, Martin
Hewlett, Dave
Hewlett-Packard
hidden files
high-definition
hit, for web search query
Holmes, Sherlock
Hromkovc, Juraj
HTML
http
https
Huffington Post
Huffman, David
Huffman coding
hyperlink; cycle of (
see
cycle); incoming
hyperlink trick
IBM
ICT
idempotent
incoming link.
See
hyperlink
inconsistency; after a crash; of replicas.
See also
consistency
index.
See also
indexing
indexing; AltaVista patent on; history of; using metawords; with word locations
information retrieval
information theory
Infoseek
inhibitory
insurance
integer factorization.
See
factorization
internet; addresses; communication via; companies; protocols; standards; surfing
intitle, web search keyword
Japanese
Java programming language
Jobs, Steve
join operation
JPEG
Kasparov, Garry
key: in cryptography (
see also
public key; shared secret); in a database; in digital signature; physical
key exchange; Diffie-Hellman
keyboard
kilobyte
K-nearest-neighbors
labeled
Langville, Amy N.
laptop
learning.
See also
training
leave-it-out trick
LeCun, Yann
Leiserson, Charles
Lempel, Abraham
license plate
Lincoln, Abraham
linear algebra
link.
See
hyperlink
link-based ranking.
See
ranking
Live Search
lock: in cryptography; in a database
lock up.
See
freeze lockbox
Locke, John
logarithm.
See also
discrete logarithm
Los Altos
lossless compression.
See
compression
lossy compression.
See
compression
Lovelace, Ada
Love's Labour's Lost
low-density parity-check code
Lycos
LZ77
Machine Learning
(book)
machine learning.
See
pattern recognition
MacKay, David
Manasse, Mark
master.
See
replica matching
mathematician
Mathematician's Apology
, A
mathematics; ancient problems in; beauty in; certainty in; history of; pretend
McCorduck, Pamela
MD5
medicine
megapixel
memex
memory: computer; flash
Menlo Park
metaword; in HTML
metaword trick; definition of.
See also
indexing
Metzler, Donald
Meyer, Carl D.
Microsoft
Microsoft Excel
Microsoft Office
Microsoft Research
Microsoft Word
mind
MIT
Mitchell, Tom
MNIST
mobile phone.
See
phone monitor
MP3
MSN
multiplicative padlock trick
MySpace
Najork, Marc
NameSize.exe
NEAR keyword in search query; for ranking
nearest-neighbor classifier
nearest-neighbor trick
Netix
network: computer; equipment; neural (
see
neural network); protocol; social (
see
social network)
neural network; artificial; biological; convolutional; for sunglasses problem; for umbrella problem; training
neuron
neuroscience
New York
New York University
nine algorithms
Nobel Prize
Norberg, Arthur
Ntoulas, Alexandras
number-mixing trick
object recognition
one-way action
online banking.
See
bank
online bill payment
operating system
overhead
Oxford
packet
padlock.
See
physical padlock trick
page size
Page, Larry
PageRank
paint-mixing trick
Palo Alto
Papadimitriou, Christos
paradox
parity
password
patent
pattern recognition; applications of; connection to artificial intelligence; failures in; history of; manual effort in; preprocessing in; use of judgment in
PC Magazine
peer-to-peer system
philosophy
phone; bill; number.
See also
Bell Telephone Company
photograph
phrase query
physical padlock trick
physics
pinpoint trick
pixel
postcard
postcode
power: electrical; failure; raising to a
power notation.
See also
exponentiation
PPN.
See
public-private number
prepare phase
prepare-then-commit trick
preprocessing
prime number
primitive root
private color
private number
probability.
See also
restart
probability program.
See
computer program
ProgramA.exe
ProgramB.exe
programming.
See
computer program
projection operation
proof by contradiction
public color
public key
public key cryptography; connection to digital signatures.
See also
cryptography
public number
public-private mixture
public-private number
pulse rate
pure
Python programming language
quantum computing
quantum mechanics
quicksort
random surfer trick
ranking; link-based; and nearness.
See also
PageRank
reboot
redundancy
redundancy trick
Reed, Irving
Reed-Solomon code
relational algebra
relational database.
See
database