It Began with Babbage
Author: Subrata Dasgupta
The place of Chomsky in the realm of linguistic studies is, however, not our concern here, but rather how some of his basic ideas were remarkably relevant to the development of a theory of programming (and, more generally, artificial) languages, including Algol 60. His work led to a fundamental clarification of the nature of programming languages in a number of ways.
To begin with, programming language designers before Chomsky spoke of “language,” but never quite clarified what they meant by this term. For example, in the case of Algol 60, an “algorithmic language,” it was never stated exactly which part of the Revised Report constituted the language.

Chomsky helped clarify this matter when he proposed that a language is a (possibly infinite) set of sentences, each sentence being composed of a finite set of elements.[107] The “set of elements” out of which a sentence is constructed forms an “alphabet” of symbols. Thus, a sentence is a string of symbols taken from this alphabet.[108]
But the sentences or symbol strings that can be formed from an alphabet of symbols are not all legitimate sentences in the language. There are rules that determine which sentences are grammatical in the language and which are not. Such rules form the grammar of that language. In fact, according to Chomsky, the grammar of a language is a device that produces all and only the sentences of that language.[109]
A grammar, then, is a “device of some sort” that will generate all the grammatical sentences of a language and only those grammatical sentences. Suppose the “alphabet” of symbols includes the words found in a dictionary of that language, say, English. Then all the grammatical sentences in English are word sequences that can be generated by the grammar for English.

What would a grammar look like, and how does it generate sentences? Chomsky presented three different “models” of grammar,[110] but the one that is most illuminating for us is exemplified by the following simple grammar:[111]
(i)   Sentence → NP + VP
(ii)  NP → T + Noun
(iii) VP → Verb + NP
(iv)  T → the
(v)   Noun → man | ball | …
(vi)  Verb → hit | took | …
In fact, these are all “rewriting” or production rules. Each is of the form X → Y, where X is a single element and Y is a string of one or more elements. The rule X → Y is interpreted as “replace X with Y” or “rewrite X as Y.” The + in X + Y signifies the concatenation of X and Y; the | signifies “or.”
The rules then apply as follows: starting with the symbol “Sentence,” apply rule (i). This results in the string NP + VP. If any of the elements of this string can be rewritten by one of the rules (ii) to (vi), then select one such rule and apply it. Repeat this procedure until a sequence of symbols comprising only elements of the alphabet, that is, a sentence of the language, is produced.
Thus, for example, the sentence “the man hit the ball” can be generated by this grammar by the sequence of rule applications presented in Table 13.4.
This is not the only way the rules could be applied to generate the sentence “the man hit the ball.” Nor, of course, is it the only sentence that could be generated from these rules. Such sentences as “the man hit the man,” “the man took the ball,” “the ball took the man,” “the ball took the ball,” and so on, could also be generated from these rules. They would all be syntactically correct even though some are meaningless (that is, semantically incorrect), such as “the ball took the man.”
TABLE 13.4 Parsing the Sentence “the man hit the ball”

Sentence
NP+VP                   (by applying i)
NP+Verb+NP              (iii)
T+Noun+Verb+NP          (ii)
the+Noun+Verb+NP        (iv)
the+Noun+Verb+T+Noun    (ii)
the+man+Verb+T+Noun     (v)
the+man+Verb+T+ball     (v)
the+man+hit+T+ball      (vi)
the+man+hit+the+ball    (iv)
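The rewriting procedure just described can be sketched in code. The following Python fragment (my illustration, not from the book; all names are my own) encodes the six rules as a table and enumerates every sentence the toy grammar generates.

```python
from itertools import product

# Chomsky's toy grammar: each nonterminal maps to its alternative
# right-hand sides, each a sequence of symbols (rules i-vi above).
RULES = {
    "Sentence": [["NP", "VP"]],
    "NP":       [["T", "Noun"]],
    "VP":       [["Verb", "NP"]],
    "T":        [["the"]],
    "Noun":     [["man"], ["ball"]],
    "Verb":     [["hit"], ["took"]],
}

def sentences(symbol="Sentence"):
    """Yield every terminal string derivable from `symbol`."""
    if symbol not in RULES:          # terminal symbol: yield it as-is
        yield [symbol]
        return
    for alternative in RULES[symbol]:
        # Expand each symbol of the alternative, then take the
        # cross-product of the expansions (concatenation via +).
        expansions = [list(sentences(s)) for s in alternative]
        for combo in product(*expansions):
            yield [word for part in combo for word in part]

# The grammar yields 2 nouns x 2 verbs x 2 nouns = 8 sentences,
# among them "the man hit the ball".
generated = {" ".join(s) for s in sentences()}
```

Note that grammatical-but-meaningless sentences such as “the ball took the man” are also in the generated set, exactly as the text observes.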
We now have a sense of what Chomsky meant by a grammar. As he tells us, a grammar comprises a finite set of “initial strings” and a finite set of production or rewrite rules.[112] In the previous example there is only one initial string, “Sentence,” although in more elaborate grammars there would be additional initial strings such as “declarative sentence,” “interrogative sentence,” and so on.
We can now see how Chomskyan theory (or a minuscule part of it) relates to programming languages. His “instruction formulas” (or productions or rewrite rules) correspond to the meta-linguistic formulas of the Algol 60 Revised Report. They are the syntactic rules of Algol 60. The alphabet of symbols out of which sentences are formed in Chomsky's scheme corresponds to the basic symbols out of which Algol 60 programs are written. Chomsky's “grammatical sentences” are the set of syntactically correct programs that can be written using the syntactic rules defined for Algol 60. The set of basic Algol 60 symbols, together with an “initial string” (corresponding to “Sentence” in Chomsky's example grammar), which in Algol 60 is the “procedure,” along with the entire corpus of meta-linguistic formulas presented in the Revised Report, constitutes the Algol 60 grammar.
There was more to the Chomskyan connection, however. Each of the rules of his simple illustrative grammar (shown previously) is of the form

A → α

where A is a single symbol denoting a syntactic category (“Sentence,” “NP,” and so forth), also called a nonterminal symbol (because such symbols can never appear in sentences of the language), and α is a string that can contain nonterminal symbols, connectives (+), and the symbols of the language's alphabet (“man,” “hit,” and so on), which are called terminal symbols. Languages whose sentences are generated by grammars having rules of just this type came to be called context-free languages, and the corresponding grammars, context-free grammars. This was one of several language classes that Chomsky studied; he called it a “Type 2” language.[113]
The rules (productions) for Algol 60, as given in the Revised Report, were all of this form. The left-hand side of each production is a meta-linguistic variable (in other words, a single nonterminal symbol); the right-hand side is a string of nonterminal and terminal symbols. To take an example, here are a few of the rules for arithmetic expressions in Algol 60:[114]

…
<if clause> ::= if <Boolean expression> then
<arithmetic expression> ::= <simple arithmetic expression> |
    <if clause> <simple arithmetic expression> else <arithmetic expression>
…
The irony was that Chomsky's Type 2 grammars were utterly inadequate for the description of natural languages; but, under the name of context-free grammars, they were preeminently suited to describing programming languages. Context-free grammars and languages became extremely consequential in computer science, although of only passing interest in linguistics. A theory of context-free languages and the automatic recognition of sentences in such languages would emerge during the 1960s.[115]
One of the most fascinating topics in this realm was the problem of determining what kind of computational power was necessary for automata (including Turing machines) to recognize different classes of languages, including context-free languages. However, the interest in context-free languages was not just theoretical. There were enormous practical consequences. The “front end” of a compiler first had to ascertain the syntactic correctness (the “grammaticalness”) of the program it was compiling, a technique called syntactic analysis or parsing, and a spate of parsing algorithms for context-free programming languages were invented during the 1960s.[116] Writing compilers became “scientific”!
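One of the simplest of those parsing techniques, recursive descent, mirrors a context-free grammar directly: one procedure per nonterminal. The Python sketch below is my illustration, not from the book; the grammar is a toy expression fragment in the Algol spirit, not Algol 60's actual rules, and all names are invented.

```python
import re

# Toy context-free grammar, one function per nonterminal:
#   expr   ::= term   { ('+' | '-') term }
#   term   ::= factor { ('*' | '/') factor }
#   factor ::= number | '(' expr ')'

def tokenize(s):
    """Split the input into numbers, operators, and parentheses."""
    return re.findall(r"\d+|[()+\-*/]", s)

def parse_expr(tokens, i=0):
    i = parse_term(tokens, i)
    while i < len(tokens) and tokens[i] in "+-":
        i = parse_term(tokens, i + 1)
    return i

def parse_term(tokens, i):
    i = parse_factor(tokens, i)
    while i < len(tokens) and tokens[i] in "*/":
        i = parse_factor(tokens, i + 1)
    return i

def parse_factor(tokens, i):
    if i < len(tokens) and tokens[i].isdigit():
        return i + 1
    if i < len(tokens) and tokens[i] == "(":
        i = parse_expr(tokens, i + 1)
        if i < len(tokens) and tokens[i] == ")":
            return i + 1
    raise SyntaxError(f"unexpected input at token {i}")

def is_grammatical(s):
    """True iff the whole string is a sentence of the toy grammar."""
    tokens = tokenize(s)
    try:
        return parse_expr(tokens) == len(tokens)
    except SyntaxError:
        return False
```

This is precisely the compiler front end's question, “grammaticalness,” answered mechanically: each parse function consumes the tokens its nonterminal can derive and rejects everything else.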
But what of semantics? What of the meaning of such things as identifiers, expressions, statements, and procedures? In the case of FORTRAN, the Programmer's Guide used ordinary English to present the semantics of the language. So, too, with the Algol 60 Revised Report. Semantics was not (yet) amenable to the kind of precise, formal descriptions that could be used for syntax. The meta-language of semantics remained English or other natural languages. Thus, although the syntax of arithmetic expressions was written in BNF, its semantics was in English:

    An arithmetic expression is a rule for computing a numerical value. In case of simple arithmetic expressions this value is obtained by executing the indicated arithmetic operations on the actual numerical values of the primaries of the expression. …[117]
There will come a time (as this story will tell) when the “problem of meaning” becomes a serious subject for investigation among language designers, programmers, and computing theorists. But not yet.
One of the ironies of this story of the birth of programming languages is that although, like the crocodiles and tortoises of old, FORTRAN and COBOL would survive the vicissitudes of computing technology as the lingua franca of their respective target problem environments (science/engineering in the former case, business/commerce in the latter), Algol 60 as a universal language for actually programming computers would be virtually extinct by the 1980s. The Esperanto of the computing world it was not to be (but, then, Esperanto itself did not catch on as the universal natural language).
Algol 60, as it turned out, was the first species of the genus Algol. The language gave rise to some subspecies in the 1960s: “dialects” of Algol 60 such as Algol W, EULER, and PL/360. It also spawned other distinct species within the genus, including the influential simulation language SIMULA[118] and Algol 68, the official IFIP-sponsored “successor” to Algol 60 that, like its parent, went through first a Report (1969)[119] and then a Revised Report (1975).[120]
Algol 68 was an entirely non-American affair, its designers and implementers drawn from Europe, Britain, and Canada,[121] and it had no influence in the United States. The sheer complexity of the language (the Revised Report ran to all of 230 pages) drew sharp reactions.
In the biological world, it is widely believed, evolution generally tends toward greater complexity of organisms.[122] So also, historically, in the made world: cultural evolution, especially in the realm of practical artifacts, tends toward greater complexity.[123]
This was evident in the evolution from Algol 60 to Algol 68, and in the development of PL/I, a programming language created by IBM in 1964 for both scientific and business computing and intended as a putative successor to FORTRAN; it was to be IBM's own universal language. Like Algol 68, PL/I was a massive, complex language; unlike Algol 68, however, it has had a long life, although it could never usurp either FORTRAN's or COBOL's respective reigns.
Yet there have been exceptions to this evolutionary trend toward increased complexity, even in the frantic, rambunctious world of computing. In response to what was perceived by some as the unmanageable complexity of Algol 68, Swiss computer scientist Niklaus Wirth (1934–) created the “Algol-like” language Pascal in 1971,[124] perhaps the most influential programming language, at least in the academic computing world, during the 1970s.
Algol 60 may have become extinct as a practical language for programming computers, but it had massive consequences for the development of (appropriating the great Galileo's terms) “two new sciences”: the science of programming language design and (as we will see) the science of programming. It brought to the forefront of people's consciousness the very idea of designing languages that were genuinely machine independent, genuinely abstract. It demonstrated the efficacy of a precise language for describing, communicating, publishing, and, indeed, thinking about computer algorithms. It heralded a concern for meta-languages for describing the syntax of programming languages. It introduced entirely new concepts in language design and programming style that, much as Greek culture permeated the minds and consciousness of Roman and other cultures that came after, found their way into the languages that came after it. It demonstrated the importance of the Popperian schema: proposing a design (TT) in response to a set of goals or requirements (P1), examining the design critically, identifying and eliminating flaws or errors in the design (EE), and reformulating the goals (P2). Last (although this is not a topic we will pursue further in this story), the implementation of Algol 60 compilers brought into existence an entirely new class of computer architectures that came to be called stack machines.[125] The legacy of Algol 60 is far-flung.
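Why stack machines suited Algol-like languages can be glimpsed in miniature: an arithmetic expression compiles into postfix code whose evaluation needs nothing but a pushdown stack. The sketch below is my illustration, not from the book; the instruction names are invented.

```python
# A minimal stack machine: each instruction either pushes an operand
# or pops two operands, combines them, and pushes the result.
def run(code):
    """Execute a list of (opcode, argument) pairs on an operand stack."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()

# The expression (1 + 2) * 3 compiles to the postfix sequence
# 1 2 ADD 3 MUL; no explicit registers or temporaries are needed.
program = [("PUSH", 1), ("PUSH", 2), ("ADD", None),
           ("PUSH", 3), ("MUL", None)]
```

The compiler's job reduces to emitting operands and operators in postfix order, which is one reason such architectures paired naturally with Algol-style expression-oriented languages.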
As we have noted, the creators of the first programming languages were, almost without exception, trained formally in mathematics. Mathematical notation served as a source of inspiration and ideas for them, especially for those who aspired to create a “universal” language of computation (see Section XIV, this chapter). But the designers of languages such as Algol 60 were also mindful that their artifacts were not so much ends in themselves as means to other ends: the description, expression, and communication of computations between humans and from humans to machines. Toward this end, the austerity of strictly mathematical notation had to be tempered; programming languages had, on the one hand, to be given a more “human face” and, on the other, to be endowed with a sense of “implementability.”