Nine Algorithms That Changed the Future: The Ingenious Ideas That Drive Today's Computers (31 page)

BOOK: Nine Algorithms That Changed the Future: The Ingenious Ideas That Drive Today's Computers
11.75Mb size Format: txt, pdf, ePub

ACKNOWLEDGMENTS

You road I enter upon and look around! I believe you are not all that is here;
I believe that much unseen is also here.

—W
ALT
W
HITMAN
,
Song of the Open Road

Many friends, colleagues, and family members read some or all of the manuscript. Among them are Alex Bates, Wilson Bell, Mike Burrows, Walt Chromiak, Michael Isard, Alastair MacCormick, Raewyn MacCormick, Nico-letta Marini-Maio, Frank McSherry, Kristine Mitchell, Ilya Mironov, Wendy Pollack, Judith Potter, Cotten Seiler, Helen Takacs, Kunal Talwar, Tim Wahls, Jonathan Waller, Udi Wieder, and Ollie Williams. Suggestions from these readers resulted in a large number of substantial improvements to the manuscript. The comments of two anonymous reviewers also resulted in significant improvements.

Chris Bishop provided encouragement and advice. Tom Mitchell gave permission to use his pictures and source code in
chapter 6
.

Vickie Kearn (the book's editor) and her colleagues at Princeton University Press did a wonderful job of incubating the project and bringing it to fruition.

My colleagues in the Department of Mathematics and Computer Science at Dickinson College were a constant source of support and camaraderie.

Michael Isard and Mike Burrows showed me the joy and beauty of computing. Andrew Blake taught me how to be a better scientist.

My wife Kristine was always there and is here still;
much unseen is also here.

To all these people I express my deepest gratitude. The book is dedicated, with love, to Kristine.

SOURCES AND FURTHER READING

As explained on page 8, this book does not use in-text citations. Instead, all sources are listed below, together with suggestions of further reading for those interested in finding out more about the great algorithms of computer science.

The epigraph is from Vannevar Bush's essay “As We May Think,” originally published in the July 1945 issue of
The Atlantic
magazine.

Introduction (
chapter 1
).
For some accessible, enlightening explanations of algorithms and other computer technology, I recommend Chris Bishop's 2008 Royal Institution Christmas lectures, videos of which are freely available online. The lectures assume no prior knowledge of computer science. A. K. Dewdney's
New Turing Omnibus
usefully amplifies several of the topics covered in the present volume and introduces many more interesting computer science concepts—but some knowledge of computer programming is probably required to fully appreciate this book. Juraj Hromkovi
's
Algorithmic Adventures
is an excellent option for readers with a little mathematical background, but no knowledge of computer science. Among the many college-level computer science texts on algorithms, three particularly readable options are
Algorithms
, by Dasgupta, Papadimitriou, and Vazirani;
Algorithmics: The Spirit of Computing
, by Harel and Feldman; and
Introduction to Algorithms
, by Cormen, Leiserson, Rivest, and Stein.

Search engine indexing (
chapter 2
).
The original AltaVista patent covering the metaword trick is U.S. patent 6105019, “Constrained Searching of an Index,” by Mike Burrows (2000). For readers with a computer science background,
Search Engines: Information Retrieval in Practice
, by Croft, Metzler, and Strohman, is a good option for learning more about indexing and many other aspects of search engines.

PageRank (
chapter 3
).
The opening quotation by Larry Page is taken from an interview by Ben Elgin, published in
Businessweek
, May 3, 2004. Vannevar Bush's “As We May Think” was, as mentioned above, originally published in
The Atlantic
magazine (July 1945). Bishop's lectures (see above) contain an elegant demonstration of PageRank using a system of water pipes to emulate hyperlinks. The original paper describing Google's architecture is “The Anatomy of a Large-Scale Hypertextual Web Search Engine,” written by Google's co-founders, Sergey Brin and Larry Page, and presented at the 1998 World Wide Web conference. The paper includes a brief description and analysis of PageRank. A much more technical, wide-ranging analysis appears in Langville and Meyer's
Google's PageRank and Beyond
—but this book requires college-level linear algebra. John Battelle's
The Search
begins with an accessible and interesting history of the web search industry, including the rise of Google. The web spam mentioned on page 36 is discussed in “Spam, Damn Spam, and Statistics: Using Statistical Analysis to Locate Spam Web Pages,” by Fetterly, Manasse, and Najork, and published in the 2004 WebDB conference.

Public key cryptography (
chapter 4
).
Simon Singh's
The Code Book
contains superb, accessible descriptions of many aspects of cryptography, including public key. It also recounts in detail the story of the secret discovery of public key cryptography at GCHQin Britain. Bishop's lectures (see above) contain a clever practical demonstration of the paint-mixing analogy for public key crypto.

Error correcting codes (
chapter 5
).
The anecdotes about Hamming are documented in Thomas M. Thompson's
From Error-Correcting Codes through Sphere Packings to Simple Groups.
The quotation from Hamming on page 60 is also given in this book and derives from a 1977 interview of Hamming by Thompson. Mathematicians will greatly enjoy Thompson's delightful book, but it definitely assumes the reader has a healthy dose of college math. Dewdney's book (see above) has two interesting chapters on coding theory. The two quotations about Shannon on pages 77-78 are taken from a brief biography by N. J. A. Sloane and A. D. Wyner, appearing in
Claude Shannon: Collected Papers
edited by Sloane and Wyner (1993).

Pattern recognition (
chapter 6
).
Bishop's lectures (see above) have some interesting material that nicely complements this chapter. The geographical data about political donations is taken from the Fundrace project of the Huffington Post. All the handwritten digit data is taken from a dataset provided by Yann LeCun, of New York University's Courant Institute, and his collaborators. Details of the dataset, which is known as the MNIST data, are discussed in the 1998 paper by LeCun et al., “Gradient-Based Learning Applied to Document Recognition.” The web spam results come from Ntoulas et al., “Detecting Spam Web Pages through Content Analysis,” published in the Proceedings of the World Wide Web Conference, 2006. The face database was created in the 1990s by a leading pattern recognition researcher, Tom Mitchell of Carnegie Mellon University. Mitchell has used this database in his classes at Carnegie Mellon and describes it in his influential book,
Machine Learning.
On the website accompanying his book, Mitchell provides a computer program to perform training and classification of neural networks on the face database. All the results for the sunglasses problem were generated using slightly modified versions of this program. Daniel Crevier gives an interesting account of the Dartmouth AI conference in
AI: The Tumultuous History of the Search for Artificial Intelligence.
The excerpt from the conference's funding proposal (on page 103) is quoted in Pamela McCorduck's 1979 book,
Machines Who Think.

Compression (
chapter 7
).
The story about Fano, Shannon, and the discovery of Huffman coding is taken from a 1989 interview of Fano by Arthur Norberg. The interview is available from the oral history archive of the Charles Babbage Institute. My favorite treatment of data compression is in
Information Theory, Inference, and Learning Algorithms
, by David MacKay, but this book requires college-level math. Dewdney's book (see above) contains a much briefer and more accessible discussion.

Databases (
chapter 8
).
There is an over-abundance of books providing an introduction to databases for beginners, but they typically explain how to
use
databases, rather than explaining how databases
work
—which was the objective of this chapter. Even college-level textbooks tend to focus on the use of databases. One exception is the second half of
Database Systems
, by Garcia-Molina, Ullman, and Widom, which gives plenty of details on the topics covered in this chapter.

Digital signatures (
chapter 9
).
Gail Grant's
Understanding Digital Signatures
provides a great deal of information about digital signatures and is reasonably accessible to those without a computer science background.

Computability (
chapter 10
).
The chapter's opening quotation is from a talk given by Richard Feynman at Caltech on December 29,1959. The title of the talk is “There's Plenty of Room at the Bottom,” and it was later published in Caltech's
Engineering & Science
magazine (February 1960). One unconventional, but very interesting, presentation of the concepts surrounding computability and undecidability is in the form of a (fictional) novel:
Turing (A Novel about Computation)
, by Christos Papadimitriou.

Conclusion (
chapter 11
).
The Stephen Hawking lecture, “The Future of the Universe,” was the 1991 Darwin lecture given at the University of Cambridge, also reprinted in Hawking's book
Black Holes and Baby Universes.
The televised A. J. P. Taylor lecture series was entitled
How Wars Begin
, and was also published as a book in 1977.

INDEX

The index that appeared in the print version of this title does not match the pages in your eBook. Please use the search function on your eReading device to search for terms of interest. For your reference, the terms that appear in the print index are listed below.

 

AAC

abort.
See
transaction addition

algorithm

addition trick

Adleman, Leonard

Advanced Encryption Standard

advertisement

AES.
See
Advanced Encryption Standard

AI.
See
artificial intelligence

algorithm: books on; criteria for greatness; definition of; future of; lack of; relationship to programming; significance of.
See also
addition algorithm; checksum; compression; digital signature; error-correcting code; Dijkstra's shortest-path algorithm; Euclid's algorithm; factorization; JPEG; key exchange; LZ77; matching; nine algorithms; PageRank; public key; ranking; RSA; web search

AltaVista

AlwaysYes.exe

Amazon

Analytical

Engine

AntiCrashOnSelf.exe

AntiYesOnSelf.exe

Apple

artifact.
See
compression

artificial intelligence.
See also
pattern recognition

artificial neural network.
See
neural network
As We May Think

astronomy

Atlantic
magazine

atomic.
See
transaction audio.
See also

compression Austen, Jane

authentication

authority: score; of a web page.
See also
certification authority

authority trick

 

B-tree

Babylonia

backup

bank; account number; balance; for keys; online banking; for signatures; transfer; as trusted third party

base, in exponentiation

Battelle, John

Bell Telephone Company

binary

Bing

biology

biometric sensor

Bishop, Christopher

bit

block cipher

body, of a web page

brain

Brin, Sergey

British government

browser

brute force

bug

Burrows, Mike

Bush, Vannevar

Businessweek

Byzantine fault tolerance

 

C++ programming language

CA.
See
certification authority calculus

Caltech

Cambridge

CanCrash.exe

CanCrashWeird.exe

Carnegie Mellon University

CD

cell phone.
See
phone certificate

certification authority

Charles Babbage Institute

chat-bot

checkbook

checksum; in practice; simple; staircase.
See also
cryptographic hash function

checksum trick

chemistry

chess

Church, Alonzo

Church-Turing thesis

citations

class

classification

classifier

clock arithmetic

clock size; conditions on; factorization of; need for large; primary; as a public number; in RSA; secondary

Codd, E. F.

code word

commit phase

compact disk.
See
CD compression; via AAC (
see
AAC); artifact; of audio or music; history of; of images; via JPEG (see JPEG); lossless; lossy; via MP3 (see MP3); relation to error-correcting code; uses of; of video

computable: number; problem.
See also
uncomputable

computer: 1980s and ‘90s; accuracy requirement of; appreciation of; classical; compared to humans; early; error (
see also
error-correcting code; error detection); first electronic; fundamental jobs of; human calculator; intelligent (
see
artificial intelligence); laptop (see laptop); limitations on; mechanical; modern; quantum (
see
quantum computing); router (
see
router); server (
see
server); users.
See also
hardware

computer program; analyzing another program; executable; impossible; input and output; intelligent; programmers; programming; programming languages; verification; world's first programmer; yes-no

computer programming.
See
computer program

computer science; beauty in; certainty in; curriculum; founding of; in high school; introductory teaching; popularity of; predictions about; public perception of; research; in society; theory of; undecidable problems in

Computing Machinery and

Intelligence

concurrency

consciousness

consistency.
See also
inconsistency

contradiction.
See
proof by contradiction

Cormen, Thomas

cosmology

Covenant Woman

CPU

crash; intentional

Crashing Problem

CrashOnSelf.exe, CRC32

credit card

Crevier, Daniel

Croft, Bruce

cryptographic hash function

cryptography; public key (see public key cryptography) cuneiform

cycle

 

Dartmouth AI conference

Dasgupta, Sanjoy

data center

database; column; definition of; geographically replicated; of faces; relational; replicated; row; table.
See also
virtual table

deadlock

decision tree

decrypt

Deep Blue

Democrat

Dewdney, A. K.

Dickens, Charles

Diffie, Whitfield

Diffie-Hellman.
See
key exchange

digital signature; applications of; connection to cryptography; detect forgery of; of long messages; in practice; security of.
See also
RSA; signature

Dijkstra's shortest-path algorithm

discrete exponentiation

discrete logarithm

disk.
See
hard disk distributed hash table

double-click

Doyle, Arthur Conan

drive.
See
hard disk DVD

Dylan, Bob

 

eBay

e-commerce

Elgin, Ben

e-mail

Emma

encrypt; 128-bit encryption

engineering

Entscheidungsproblem

error detection

error-correcting code; relation to compression

Essay Concerning Human

Understanding

Ethernet

Euclid

Euclid's algorithm

excitatory

exponent

exponentiation.
See also
discrete exponentiation; power notation

extension.
See
file name extension

 

face database.
See
database

face recognition

Facebook

factorization

Fano, Robert

Faraday, Michael fault-tolerance

fax

Feldman, Yishai

Fetterly, Dennis

Feynman, Richard

file name extension;

unhide

financial information

finite field algebra

flash memory.
See
memory

forgery.
See also
digital signature

freeze

Freeze.exe

Fundrace project

 

garage

Garcia-Molina, Hector

GCHQ

genetics

GeoTrust

GlobalSign

Google

Grant, Gail

Gray, Jim

Great American Music Hall

 

hacker

halt.
See
terminate

halting problem

Hamming, Richard

Hamming code

handwriting recognition

hard disk; failure; operation; space

hardware; failures of

Hardy, G. H.

Harel, David

hash tables

Hawking, Stephen

haystack

Hellman, Martin

Hewlett, Dave

Hewlett-Packard

hidden files

high-definition

hit, for web search query

Holmes, Sherlock

Hromkovc, Juraj

HTML

http

https

Huffington Post

Huffman, David

Huffman coding

hyperlink; cycle of (
see
cycle); incoming

hyperlink trick

 

IBM

ICT

idempotent

incoming link.
See
hyperlink

inconsistency; after a crash; of replicas.
See also
consistency

index.
See also
indexing

indexing; AltaVista patent on; history of; using metawords; with word locations

information retrieval

information theory

Infoseek

inhibitory

insurance

integer factorization.
See
factorization

internet; addresses; communication via; companies; protocols; standards; surfing

intitle, web search keyword

 

Japanese

Java programming language

Jobs, Steve

join operation

JPEG

 

Kasparov, Garry

key: in cryptography (
see also
public key; shared secret); in a database; in digital signature; physical

key exchange; Diffie-Hellman

keyboard

kilobyte

K-nearest-neighbors

 

labeled

Langville, Amy N.

laptop

learning.
See also
training

leave-it-out trick

LeCun, Yann

Leiserson, Charles

Lempel, Abraham

license plate

Lincoln, Abraham

linear algebra

link.
See
hyperlink

link-based ranking.
See
ranking

Live Search

lock: in cryptography; in a database

lock up.
See
freeze lockbox

Locke, John

logarithm.
See also
discrete logarithm

Los Altos

lossless compression.
See
compression

lossy compression.
See
compression

Lovelace, Ada

Love's Labour's Lost

low-density parity-check code

Lycos

LZ77

 

Machine Learning
(book)

machine learning.
See
pattern recognition

MacKay, David

Manasse, Mark

master.
See
replica matching

mathematician

Mathematician's Apology
, A

mathematics; ancient problems in; beauty in; certainty in; history of; pretend

McCorduck, Pamela

MD5

medicine

megapixel

memex

memory: computer; flash

Menlo Park

metaword; in HTML

metaword trick; definition of.
See also
indexing

Metzler, Donald

Meyer, Carl D.

Microsoft

Microsoft Excel

Microsoft Office

Microsoft Research

Microsoft Word

mind

MIT

Mitchell, Tom

MNIST

mobile phone.
See
phone monitor

MP3

MSN

multiplicative padlock trick

MySpace

 

Najork, Marc

NameSize.exe

NEAR keyword in search query; for ranking

nearest-neighbor classifier

nearest-neighbor trick

Netix

network: computer; equipment; neural (
see
neural network); protocol; social (
see
social network)

neural network; artificial; biological; convolutional; for sunglasses problem; for umbrella problem; training

neuron

neuroscience

New York

New York University

nine algorithms

Nobel Prize

Norberg, Arthur

Ntoulas, Alexandras

number-mixing trick

 

object recognition

one-way action

online banking.
See
bank

online bill payment

operating system

overhead

Oxford

 

packet

padlock.
See
physical padlock trick

page size

Page, Larry

PageRank

paint-mixing trick

Palo Alto

Papadimitriou, Christos

paradox

parity

password

patent

pattern recognition; applications of; connection to artificial intelligence; failures in; history of; manual effort in; preprocessing in; use of judgment in

PC Magazine

peer-to-peer system

philosophy

phone; bill; number.
See also
Bell Telephone Company

photograph

phrase query

physical padlock trick

physics

pinpoint trick

pixel

postcard

postcode

power: electrical; failure; raising to a

power notation.
See also
exponentiation

PPN.
See
public-private number

prepare phase

prepare-then-commit trick

preprocessing

prime number

primitive root

private color

private number

probability.
See also
restart

probability program.
See
computer program

ProgramA.exe

ProgramB.exe

programming.
See
computer program

projection operation

proof by contradiction

public color

public key

public key cryptography; connection to digital signatures.
See also
cryptography

public number

public-private mixture

public-private number

pulse rate

pure

Python programming language

 

quantum computing

quantum mechanics

quicksort

 

random surfer trick

ranking; link-based; and nearness.
See also
PageRank

reboot

redundancy

redundancy trick

Reed, Irving

Reed-Solomon code

relational algebra

relational database.
See
database

Other books

A Scrying Shame by Donna White Glaser
The Last Martin by Jonathan Friesen
Hope and Red by Jon Skovron
Foreign Influence by Brad Thor