CSE 230: Principles of Programming Languages
Some Notes on Readings

These notes are intended to supplement or correct material in the texts. They assume familiarity with the readings and are intentionally brief. Stansifer was published in 1995, so presumably written in 1994, and hence is a bit out of date. However, it takes a broad view of the role of programming languages and their study, focussing on principles behind design, and including historical and cultural information, as well as some underlying mathematics, all of which I feel every well educated computer scientist should understand to some extent.

First, a general remark about the class: because we need to talk about programming languages in general, rather than just one particular language, we will need to develop some rather sophisticated notation that can describe the syntax and the semantics of programming (and other) languages.


A. Notes on Stansifer

1.1 Much recent research suggests that natural language is not formal and cannot be formalized, due to its dependence on enormous amounts of background information and social context, each of which is highly variable, as well as difficult or impossible to formalize.

Findings from 1998 at the first pyramid show that Egyptian writing is at least as old as Sumerian, and moreover, used a phonetic alphabet. Hence Stansifer's remarks on the evolution of the alphabet are out of date. The shards found there were records of tributes to the pharaoh.

Recent archaeological research also shows that forms of representation and calculation arose about 5,000 years ago, in ancient Egypt and Sumeria, which needed ways to represent and calculate with numbers, largely for what we would now call accounting data. We would now also speak of data types and algorithms, rather than representation and calculation.

1.2 Work on the sociology of science shows that all mathematics has its origin in practical work; moreover, even today, mathematics research is largely driven by practical considerations; the notion of "pure mathematics" is somewhat of a myth, though a powerful one.

Perhaps Gottfried Leibniz was the first to dream of a language for computation, though his dream went well beyond the programming languages that we have today, since he envisioned using them for all forms of human reasoning, including legal reasoning. Leibniz also invented binary numbers, inspired by patterns found in the ancient Chinese book of divination, the I Ching. Leibniz is also famous for having invented the calculus, at about the same time as, and independently of, Isaac Newton.

The novel The Difference Engine by William Gibson and Bruce Sterling contains an amusing fictional account of Babbage and Ada Augusta Lovelace, featuring giant steam driven computers (and much more).

1.4 Both Turing and von Neumann used assertions and invariants (which are the two main ideas) to prove correctness of flow chart programs, which they each introduced separately in the early 1940s. Turing also designed the first stored program electronic computer, although it was later redesigned and built by others in the UK. This information was secret until rather recently, because Turing's work was an important part of allied World War II efforts to break German and Japanese secret codes for messages. (There are several good books, and even a good play on Alan Turing's very interesting but rather sad life; the author of one of those books, Andrew Hodges, maintains a website devoted to the life of Turing, which includes a page on the play.)

1.5 The most important thing about FORTRAN was its excellent optimizing compiler; without this, assembly language programmers would never have changed their habits. ALGOL was designed by a (largely European) committee, and was the first language to have an international standard; it also introduced many important new features. PL/I failed mostly because it was too large. C was designed for systems programming, and cleverly combined high and low level features, based on ideas from BCPL, which was based on Christopher Strachey's very innovative CPL language. Simula, Smalltalk, and of course C++ all grew out of the imperative tradition, and the object oriented paradigm can be considered a variant of the imperative paradigm, as is rather clear from the quotations from Alan Kay given in Stansifer. Simula was developed by Kristen Nygaard and Ole-Johan Dahl at the Norwegian Computing Center, and was originally designed for applications to simulation.

It is interesting to observe that the three major programming paradigms grew out of three major approaches to computable functions. Imperative (and object oriented) languages grew out of Turing machines, which also inspired the von Neumann architecture on which they usually run. The functional programming paradigm grew out of the lambda calculus approach to computable functions due to Alonzo Church, which was soon proved equivalent to the Turing machine approach. LISP was the first functional language, and it directly includes lambda abstraction and higher order functions. LISP was also the first interactive language, as well as the first to have garbage collection and to support symbolic computation; the latter was important for its intended applications to artificial intelligence, as was the fact that its programs were also written in its S-expression data structure. More recent functional programming languages are ML and Haskell; the latter takes its name from Haskell Curry, who introduced a variant of the lambda calculus which he called the combinatory calculus.

The so-called logic programming paradigm is related to a different notion of computability, having to do with the manipulation of algebraic terms, arising from work of Jacques Herbrand. (I say "so-called" because I think it is a misleading name, since its syntax is based on Horn clauses rather than full first order logic, and in any case, there are many many other logics than first order and Horn clause logic.) ML and Prolog both originated at Edinburgh from research on theorem proving and logic, in groups led by Robin Milner and Robert Kowalski, respectively, though these languages were much more fully developed elsewhere.

Advocates of so called structured programming railed against the infamous GOTO statement, claiming that it produced spaghetti code. Many today feel that unrestricted inheritance in object oriented programming is just as bad, especially in large systems that use dynamic binding and multiple inheritance.

The kind of modules found in Modula-3, Ada, C++ and ML have their origin in theoretical work by Goguen on abstract data types and general systems, and in the language designs for Clear (with Burstall) and OBJ (or so he says).

The most important ideas for personal computing came from Doug Engelbart at SRI (then Stanford Research Institute): the mouse, windows, menus, and networking (recall that interaction came from LISP). Alan Kay at Xerox PARC popularized these ideas in Smalltalk, also adding icons; others at PARC developed the Ethernet and the laser printer. Apple added very little to this mixture, and Microsoft has added nothing of intellectual significance (companies spend a lot on advertising their alleged creativity, often at the expense of the scientists who actually did the work).

The "Griswold" mentioned in connection with SNOBOL is the father of our own Prof. Bill Griswold.

An overview of the history of programming languages reveals a progressive increase in the level of abstraction: machine language; assembly language; (so called) "high level" languages like FORTRAN; and then ever more powerful features to support abstraction, including blocks, procedures, types, recursion, classes with inheritance, modules, specification, ... In general, this correlates with improvements in the underlying hardware. Perhaps machine language makes sense if your hardware is (relatively) small systems of gears and shafts, and your programs only compute repetitive tables, as was the case for Babbage and Lovelace; assembly language perhaps makes sense if your hardware consists of (relatively) few vacuum tubes and your programs are for (relatively) small numerical tasks; but powerful mainframes and PCs running large programs require much more abstraction.

Perhaps the most glaring omission in Stansifer's book is Java, which did not come out until after this book was written. Because it is a language intended to be used over the internet, it has a very different character from the other languages discussed above; in particular, Java applets have the form of byte code, downloaded from a server to a client machine and run there on a Java abstract machine; security is therefore an extremely important issue. Here, the hardware is the internet, not just a CPU with associated memory and peripherals.

To summarize a bit, we can see that there is a really close relationship between mathematics and programming. In particular, mathematicians were first inspired to build computers by the need to solve mathematical problems, and the architectures that they chose grew out of the mathematics that they knew. Moreover, the major programming paradigms correspond to different ways of defining the notion of computable function, and the historical trend of rising levels of abstraction also follows trends found in mathematics.

My Preliminary Essay on Comparative Programming Linguistics gives some further information related to topics discussed above.

2.1 Stansifer's ideas on natural language seem to have come mainly from formalists like Chomsky, rather than from linguists who study what real people actually write and say. For example, it is easy to write a short play about painters redoing a trading room in a bank, where desks are named "one desk", "two desk", "FX desk", etc., and where one of the painters has the line, "Painted two desk" in response to his boss's asking what he did. A number of disgruntled empirical linguists have written little poems that end with the line "colorless green ideas sleep furiously", meaning something like "Chomsky's uninteresting untried theories do nothing much after a lot of effort". (It is an interesting exercise to try this yourself.) Similarly, it is easy to imagine a Star Trek episode in which some creatures called "time flies" have affection for a certain arrow. We may conclude that almost anything can be made meaningful, given the right context.

The important point here is that in natural language, context determines whether or not something makes sense; formal syntax and semantics are very far from adequate, and indeed, the distinction among syntax, semantics and pragmatics does not hold up under close examination. On the other hand, the formal linguists' way of looking at syntax and semantics works rather well for programming languages, because we can define things to work that way, and because traditionally, programming language designers want to achieve as much independence from context as we can (though this might change in the future).

2.1.1 These principles are really important; please think about them, and the examples that are given. Also, notice that the situation for natural language is very different.

2.2 If we denote the empty set by "{}" and the empty string by the empty character, then there will not be any way to tell the difference between the empty set and the set containing the empty string. So this is a bad idea. Instead, we can use the Greek letter epsilon for the empty string, and the Danish letter "O-with-slash" for the empty set, as is usual in mathematics. Sometimes I like to write "[]" for the empty string, while Stansifer sometimes writes "" for it, and some other people use the Greek letter lambda! I am afraid that you will have to get used to there being a lot of notational variation in mathematics, just as there is a lot of notational variation for programming languages; in fact, I am afraid that notational variation will be an ongoing part of the life of every computer science professional for the foreseeable future, so that developing a tolerance for it should be part of your professional training. But to help out a bit, I will write the epsilon for the empty string and for set membership in different ways; the set epsilon will be much bigger, while the string epsilon will be (insofar as I can do it) slanted a bit.

Notice that the definition of "language" in section 2.2.1 is not suitable for natural language, where context determines whether something is acceptable or not, and where even then, there may be degrees of acceptability, and these may vary among different subcommunities, and even individuals, who speak a given language; moreover, all of this changes (usually slowly) over time. For example, "yeah" is listed as a word in some dictionaries but not in others; perhaps it will gradually become more and more accepted, as have many other words over the course of time. English as spoken in black Harlem (a neighborhood of New York City) differs in some significant ways from "standard English" (if there is such a thing), in its lexicon, its syntax, and its pronunciation; this dialect has been studied under the name "Black English" by William Labov and others, who have shown that in some ways it is more coherent than so called "standard English".

It may be interesting to know that the first (known) grammar was given by Panini for Sanskrit, a sacred language of ancient India, more than 2,500 years ago. This work included very sophisticated components for phonetics, lexicon, and syntax, and was in many ways similar to modern context free grammars. The motivation for this work was to preserve the ancient sacred texts of Hinduism.

If we let R be the set of regular expressions over a finite alphabet A, then the "meaning" or denotation for R is given by a function

   [[_]] : R  ->  2 ** A* , 
where ** indicates exponentiation (sorry about that - HTML is lousy for formulae), so that 2 ** X indicates the set of all subsets of X, and A* is the set of all finite strings over A. Notice that the semantics given by Stansifer for regular expressions is compositional, in the sense that the denotation of each part is computed from the denotations of its parts.
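
Compositionality is easy to see in code. Here is a small sketch in Standard ML (mine, not from Stansifer; it uses the standard trick of derivatives rather than the set-theoretic definition above), where the possibly infinite denotation of a regular expression is represented by its membership test, computed by structural recursion on the expression:

    (* regular expressions over characters *)
    datatype regex = Empty                    (* denotes the empty set *)
                   | Eps                      (* denotes { empty string } *)
                   | Chr of char              (* denotes a one-letter string *)
                   | Alt of regex * regex     (* union *)
                   | Seq of regex * regex     (* concatenation *)
                   | Star of regex            (* Kleene star *)

    (* does the denotation contain the empty string? *)
    fun nullable Empty        = false
      | nullable Eps          = true
      | nullable (Chr _)      = false
      | nullable (Alt (r, s)) = nullable r orelse nullable s
      | nullable (Seq (r, s)) = nullable r andalso nullable s
      | nullable (Star _)     = true

    (* Brzozowski derivative: a regex denoting { w | cw is in [[r]] } *)
    fun deriv c Empty        = Empty
      | deriv c Eps          = Empty
      | deriv c (Chr d)      = if c = d then Eps else Empty
      | deriv c (Alt (r, s)) = Alt (deriv c r, deriv c s)
      | deriv c (Seq (r, s)) =
          if nullable r then Alt (Seq (deriv c r, s), deriv c s)
          else Seq (deriv c r, s)
      | deriv c (Star r)     = Seq (deriv c r, Star r)

    (* membership test: is the string s in [[r]]? *)
    fun matches r s =
      nullable (List.foldl (fn (c, r') => deriv c r') r (String.explode s))

    (* example: (a + b)* contains "abba" *)
    val ex = matches (Star (Alt (Chr #"a", Chr #"b"))) "abba"   (* = true *)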

Although there are clever ways that can give a compositional semantics the property that the meaning of a part depends on the context within which it occurs, in the sense of the other parts of which it is a part, a mathematical semantics for a programming language is not going to give us what we usually intuitively think of as the meaning of programs written in it. For example, the fact that a certain FORTRAN program computes a certain formula does not tell us that it gives the estimated yield of a certain variety of corn under various weather conditions, let alone that this figure is only accurate under certain assumptions about the soil, and even then it is only accurate to within about 5 percent. Yet all this is an important part of the meaning of the program.

2.3 Stansifer's exposition of attribute grammars can seem pretty difficult to understand. However, the special case of grammars with just synthesized attributes is much easier; it can be explained without too much notation, and it can also be illustrated rather simply. For instance, here is a context free grammar for a simple class of expressions:

   S -> E
   E -> 0
   E -> 1
   E -> (E + E)
   E -> (E * E) 
where N = {S, E} and T = {0, 1, (, ), +, *}. We can find the value of any such expression by using an attribute grammar with just one (synthesized) attribute, val, by associating the following equations with the above rules:
   S.val = E.val
   E.val = 0
   E.val = 1
   E1.val = E2.val + E3.val
   E1.val = E2.val * E3.val 
Then a parse tree for the expression (1 + 1) * (1 + 0) looks as follows
              S
              |
              E
           /  |  \
          /   *   \
         /         \
        E           E
      / | \       / | \
     E  +  E     E  +  E
     |     |     |     |
     1     1     1     0  
where for simplicity the parentheses are left out. Then the synthesized attribute will percolate up the tree, by applying the equations from the bottom up, producing the values that are shown here
              2
              |
              2
           /  |  \
          /   *   \
         /         \
        2           1
      / | \       / | \
     1  +  1     1  +  0  
where we see that the final value is 2.
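
For readers who know a little ML, here is a sketch (mine, not from Stansifer) of the same computation in Standard ML: the parse trees become an abstract syntax datatype, and the synthesized attribute val is computed by exactly the bottom-up recursion shown above.

    (* abstract syntax for the expression grammar above *)
    datatype exp = Zero | One | Plus of exp * exp | Times of exp * exp

    (* the synthesized attribute val, computed bottom up *)
    fun value Zero           = 0
      | value One            = 1
      | value (Plus (e, f))  = value e + value f
      | value (Times (e, f)) = value e * value f

    (* the tree for (1 + 1) * (1 + 0) *)
    val ex = value (Times (Plus (One, One), Plus (One, Zero)))   (* = 2 *)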

For another illustration, we can do the binary digits example in section 2.3.1 of Stansifer in a simpler way. The grammar here is

   B -> D
   B -> DB
   D -> 0
   D -> 1 
and the equations associated with these four rules are
   B.pos = 0              B.val  = D.val
   B1.pos = B2.pos + 1    B1.val = D.val *(2 ** B1.pos) + B2.val
   D.val = 0
   D.val = 1 
It is now a good exercise to compute the value of the binary number 1010 by first writing its parse tree and then computing how the values of the two (synthesized) attributes percolate up the tree.
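
One way to check your answer is with the following sketch in Standard ML (mine, not from Stansifer), which computes both synthesized attributes at once, returning the pair (pos, val) at each node.

    datatype digit = D0 | D1
    datatype bits  = Last of digit            (* the rule B -> D  *)
                   | More of digit * bits     (* the rule B -> DB *)

    fun dval D0 = 0
      | dval D1 = 1

    fun pow2 0 = 1
      | pow2 n = 2 * pow2 (n - 1)

    (* returns the pair (B.pos, B.val) *)
    fun attrs (Last d)      = (0, dval d)
      | attrs (More (d, b)) =
          let val (pos, v) = attrs b
          in (pos + 1, dval d * pow2 (pos + 1) + v) end

    (* the binary numeral 1010 *)
    val ex = attrs (More (D1, More (D0, More (D1, Last D0))))   (* = (3, 10) *)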

We will see later that the parse trees of a context free grammar form a very nice little algebra, such that the values of synthesized attributes are given by a (unique) homomorphism into another algebra of all possible values.

There is a slight inconsistency in Stansifer about whether In(S) should be empty; my preference is that it need not be, because S can occur on the right side of rules, as in S -> ASA on page 55. Stansifer is also not very forthright about the evaluation of attributes; the fact is that it is possible for a definition to be inconsistent, so that some (or even all) attributes do not have values; however, it is rather complex to give a precise definition for consistency. I also note that the diagrams for attribute evaluation are much more effective if they are drawn in real time using several colors; there are many other cases where live presentation works much better than reading a book; this should be good motivation for your coming to class!

2.4 The first two occurrences of the word "list" in section 2.4.1 (page 62) should be replaced by "set". Stansifer uses the phrase "tally notation" for the notation that represents 0, 1, 2, 3, ... by the empty string, |, ||, |||, ..., but it is a variant of what we will later call Peano notation. There is a typo on page 64, where it says that || + || = |||, i.e., 2 + 2 = 3! For future reference, there is an OBJ version of the Post system for the propositional calculus. Also, the production

     xax
     ---
     xxx  
on page 65 is called an "axiom" but it isn't.

The "proof" on page 68 does not really deserve to be called a proof, because it only sketches one direction and it completely omits the other direction, which turns out to be much harder than what is sketched. It is remarkable that the term "theorem" appears at different three levels: (1) a theorem of the predicate calculus, i.e., some x for which Th x is provable; (2) a theorem of the Post system for the predicate calculus, which means a derivable term of that system, which includes some terms of the form Th x, others of the form P x, etc.; and (3) a theorem of mathematics, the proof of which is discussed in the previous sentence.

2.5 I don't know why Stansifer is so dismissive about issues of concrete syntax in this section; he should be pleased that he did such a good job discussing them, and motivating various formalisms earlier in the chapter.

In algebraic terms, destructors are left inverses of constructors. For example,

  FirstOfBlock(Block(W1, W2))  =  W1
  SecondOfBlock(Block(W1, W2)) =  W2
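
A sketch of the same thing in Standard ML (names hypothetical; statements are represented by strings just to keep the example self-contained):

    type stmt = string
    datatype block = Block of stmt * stmt     (* the constructor *)

    (* the destructors, which are left inverses of Block *)
    fun firstOfBlock  (Block (w1, _)) = w1
    fun secondOfBlock (Block (_, w2)) = w2

    val ex = firstOfBlock (Block ("x := 1", "y := 2"))   (* = "x := 1" *)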

4.3 The punch line on page 119, about motivation for abstract types is especially important, and could be taken to refer to much of the other material in this section.

4.5.1 The classification of the different kinds of polymorphism is due to Christopher Strachey of Oxford University, in lecture notes from 1967. This is the same Strachey who introduced the language CPL which was implemented as BCPL and which inspired C (the "C" is for Christopher); he also introduced the very useful notions of "l-value" and "r-value" in these same lecture notes; and he is the co-founder with Dana Scott of denotational semantics.

4.5.2 Ada generics were designed by Bernd Krieg-Bruckner (one of the leading European researchers on algebraic specification, now at the University of Bremen), based on OBJ parameterized modules, though only part of the OBJ functionality is included in Ada; in fact, the extra power of OBJ modules avoids the difficulty with Ada described on page 126, through the use of its interface theories.

4.5.3 There is a small bug on page 129, since the identifier l should be associated with the type `a, not bool. More importantly there is a bug in Theorem 3 on page 130, since the type variable tau is "floating" in the way the result is stated; the easiest way to fix it is to insert "some" before the first instance of tau.

5.2 The material on pages 152-153 is a link to the compiler class; the use of offsets, a symbol table, etc. is good engineering, which makes a very efficient runtime environment possible.

5.3 Hope (not HOPE) was a clever functional programming language designed by Rod Burstall, named after Hope Park Square, where his office was located in Edinburgh (Hope Park Square was named after a civil engineer who had a large impact on that area of Edinburgh). Its pattern matching feature, which led to that of ML, was in part inspired by discussions between Burstall and Goguen about OBJ, which was also under development during the same period. I hope you noticed the similarity between pattern matching in ML and in OBJ. Moreover, the "pattern matching" capability (if we want to call it that) of OBJ3 is much more powerful than that of ML, in particular, allowing matching modulo associativity and/or commutativity. For example, if we define

   op _+_ : Nat Nat -> Nat [comm] .
   vars M N : Nat .
   eq M + 0   = M .
   eq M + s N = s(M + N). 
then the first equation can match 0 + s s 0 and the second can match (s x) + y , returning s s 0 and s(x + y) , respectively. Matching modulo associativity and commutativity can yield even more dramatic differences from ML pattern matching.
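
For comparison, here is the analogous definition in Standard ML (a sketch, not from the text). ML pattern matching is purely syntactic, with no matching modulo commutativity, so the first clause below cannot match 0 + s s 0 the way the first OBJ equation can; the commutativity has to be supplied by hand.

    datatype nat = Z | S of nat

    (* addition by pattern matching on the second argument only *)
    fun add (m, Z)   = m
      | add (m, S n) = S (add (m, n))

    val two = add (Z, S (S Z))   (* = S (S Z); only the second clause applies at first *)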

8.1 Please look at Stansifer's list on page 270 of reasons to study semantics. Of course, algebraic denotational semantics adds to this list the possibilities of testing programs and verifying programs; you should compare this list with the corresponding list on page 1 of Algebraic Semantics. Stansifer's characterization of algebraic semantics on page 271 was written before the book Algebraic Semantics came out, and hence only refers to basic initial algebra semantics, not the more powerful combination of loose and initial semantics that we are using here.

8.2 In a certain sense, denotational semantics is a special case of basic initial algebra semantics. The term algebra T over the signature S of a programming language P has all the well formed syntactic units of P in its carriers, and is an initial S-algebra. If A is an appropriate S-algebra of denotations, then the unique S-homomorphism [[_]] : T -> A is the denotation function, and the various equations that express the homomorphism property are exactly the equations that are usually written down in denotational semantic definitions.

8.3 For example, the signature S of the decimal numeral example on pages 272-273 looks as follows in OBJ3:

  subsort D < N .
  ops 0 1 2 3 4 5 6 7 8 9 : -> D .
  op _ _ : N D -> N . 
If we now let W denote the S-algebra with its D carrier containing the numbers from 0 to 9, with its N carrier containing all the natural numbers, and with _ _ on W defined to send n,d to 10*n+d, then the unique S-homomorphism from T to W indeed gives the natural number [[n]] denoted by the numeral n. All of the other examples in Stansifer can be seen in a similar light, though of course the semantic algebra gets more and more complex. The homomorphism equations express what is often called compositionality, which says that the meaning of the whole can be computed from the meanings of its parts. For the above example, the equation is
     [[n d]] = 10*[[n]] + d.
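Here is a sketch in Standard ML (mine, not from Stansifer) of the term algebra of decimal numerals together with the denotation function, which is just this homomorphism written as a structural recursion:

    datatype digit   = Dig of int                         (* assume 0 <= i <= 9 *)
    datatype numeral = Single of digit | Append of numeral * digit

    fun digitVal (Dig i) = i

    (* the unique homomorphism [[_]] into the natural numbers *)
    fun denote (Single d)      = digitVal d
      | denote (Append (n, d)) = 10 * denote n + digitVal d

    val ex = denote (Append (Append (Single (Dig 1), Dig 2), Dig 3))   (* = 123 *)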

8.8 Classical denotational semantics made the decision that all denotations must be functions, usually higher order, but of course zeroth order functions are included, which are just constants. For example, the denotation of a program in a simple language might be a function from States to States. So this seems like a natural choice; however, some very difficult mathematical problems arise when trying to give denotations to recursive functions. Stansifer notes on page 282 that set theory is no longer adequate; in fact, so called domain theory must be used, which can get very complex indeed, since it solves the problem of denotations for fixpoint functions in the lambda calculus.

Although this was a great advance for logic, it seems to me to have been a step backwards for computer science, where texts are traditionally used to denote procedures, e.g., in compilers and in runtime environments. In fact, this is (in effect) just what algebraic denotational semantics does, and the result is a vastly simpler treatment of recursive procedures, which also has the additional merit that it directly supports executing and verifying procedures, both of which are extremely difficult in a purely denotational setting.

9.2 On page 307, Stansifer gives a simple example of how the treatment of variables in Hoare logic can lead to unfortunate results (namely, an obviously wrong program can be proved correct!). Actually, Stansifer does not seem aware that the problem is with the treatment of variables in this version of Hoare logic, and instead says it is due to the definition of partial correctness; but he is wrong. Algebraic denotational semantics, by treating different kinds of variables differently, allows a notion of partial correctness where such silly trivially wrong programs cannot be proved correct.

9.3 As noted in Algebraic Semantics, weakest preconditions do not work correctly for specifications written in first order logic; you must use infinitary logic (which is the logic of infinitely long expressions!) or second order logic, and as a result things get much more complicated (see p. 309). Also, Theorem 20 (p. 311) is not stated correctly: only relative completeness holds, i.e., completeness assuming an oracle for theorems of arithmetic. (Roughly speaking, the problem is that arithmetic is undecidable (by a famous theorem of Goedel), and arbitrarily difficult theorems of arithmetic may be needed in proving programs correct, but Hoare logic does not provide any way to get theorems about arithmetic.)


B. Notes on Algebraic Semantics

0. One of the most important distinctions in programming languages is that between syntax and semantics. While BNF does a pretty good job with syntax, it remains very difficult to understand programs and programming languages. Nevertheless, some progress has been made, and today denotational semantics is widely considered to be the best approach for giving meaning to programming languages, and hence to programs. The book Algebraic Semantics of Imperative Languages develops a variant of denotational semantics that is based on algebra, and in particular, on a kind of equational logic, which is actually implemented in the OBJ3 language. The book demonstrates that its approach is adequate for a wide range of programming features, including arrays, various kinds of "calls" for variables, and various kinds of procedures; although this is impressive, it would still be difficult to define a real programming language like Ada.

The main idea of a denotational semantics for a language is to provide denotations for each kind of phrase in the language (such as variable, expression, statement, and procedure), and also to provide a systematic way to combine the denotations of the constituents of a larger phrase to get its denotation.

Formal semantics is an important branch of formal methods, an area of computer science that is difficult, but of growing importance, concerned with the semantic correctness of systems; it is currently considered to be cost-effective for safety critical systems. All major chip manufacturers now have formal methods groups, motivated in part by the huge cost of recalls if errors are found (as with the notorious Intel Pentium arithmetic error). NASA has a formal methods group, motivated by the many software failures that have plagued aerospace efforts (such as the Ariane 5 rocket failure), and the difficulty of communication with distant unmanned spacecraft. Manufacturers of medical equipment are also considering, and in some cases using, formal methods, motivated by the cost of lawsuits if faulty equipment causes death or injury. Similar concerns arise in the nuclear power industry, the military, and many other areas.

As argued in some detail in the Preliminary Essay on Comparative Programming Linguistics, the best way to appreciate a language is to understand how it is intended to be used. OBJ was not designed as a language for writing programs, but rather as a language for writing specifications, and in particular, for writing semantics for programming languages. Once this is clear, many of its unusual design choices can be appreciated, including its mixfix syntax and subsort polymorphism, its use of algebra, and its term rewriting capability. In particular, signatures provide a meta-syntax that can be used to define the syntax of a programming language, and we will see that equations and algebras can be used to define the semantics of a programming language.

1.1 To be a bit more precise, an OBJ3 signature can be considered a variant notation for a context free grammar, with some additions like precedence and subsort polymorphism, and more importantly, with an implementation, which is the very general parsing mechanism in OBJ3; this provides a general and powerful way to handle syntax (we will later see that OBJ also provides powerful ways to handle semantics). The sorts of a signature correspond to the non-terminals of a grammar. OBJ3's subsort polymorphism provides a highly consistent treatment of the most common kinds of coercion, in a way that also supports the common kind of overloaded operations. This is in contrast to the coercion mess found in many programming languages, and it is very convenient for defining denotations.

The prefix notation defined in NAT is Peano notation, of which Stansifer's "tally notation" is a postfix variant; some example terms appear on pages 18 and 19 (though strictly speaking these use a larger signature). There is a bug in Figure 1.2 on page 14: the sort should be Exp instead of Nat.

1.2 A semantics for a signature is given by an algebra A for that signature. Such an algebra gives a denotation for each sort s, which is the set As of elements of A of sort s, called the carrier of A, and similarly, the denotations of the operation symbols in the signature are functions among appropriate carriers of A.

1.3 The operations in the term algebra of a signature are exactly the constructors (in the sense of Stansifer in Section 2.5) for the abstract syntax of the context free language defined by that signature. Moreover, the carrier of sort s of the term algebra consists of exactly the abstract syntax trees (expressed as terms) for the grammar of that sort. Neat!

To be more precise now, suppose G = (N,T,P,S) is a context free grammar. Then the signature for G, denoted SigmaG, has as its sort set N, the non-terminals of G, with operations derived from the productions of G as follows: if

          p: N -> w1 N1 w2 N2 ... wn Nn wn+1

is a production in P with each Ni a non-terminal, and each wi a string of terminals, then the corresponding operation is

          p: N1 N2 ... Nn -> N

and the SigmaG-term algebra is exactly the algebra of abstract syntax terms (or trees) for G.
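
To make this concrete, here is a loose analogy in Standard ML (mine, not from the book): a signature is a bit like an ML signature, an algebra for it is like a structure implementing that signature (a carrier type for each sort, a function for each operation symbol), and the term algebra is the structure whose operations are just the constructors of the abstract syntax.

    (* the signature derived from the little expression grammar used earlier *)
    signature EXP_SIG = sig
      type exp                       (* carrier for the sort (non-terminal) E *)
      val zero  : exp
      val one   : exp
      val plus  : exp * exp -> exp
      val times : exp * exp -> exp
    end

    (* one algebra: the natural numbers with the obvious operations *)
    structure NatAlg : EXP_SIG = struct
      type exp = int
      val zero = 0
      val one  = 1
      fun plus  (m, n) = m + n
      fun times (m, n) = m * n
    end

    (* the term algebra: its carrier is the abstract syntax trees, and its
       operations are just the constructors *)
    structure TermAlg = struct
      datatype exp = Zero | One | Plus of exp * exp | Times of exp * exp
      val zero = Zero
      val one  = One
      val plus  = Plus
      val times = Times
    end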

1.4 The notion of assignment in Definition 9 is essentially the same as in Stansifer on page 67, but much more general.

1.6 In Definition 14 on page 30, on the first line, (TSigmaU Xi)s should instead be (TSigma U Xi)s.

The rule [SM] on page 31 should be

     (s X)* Y = (X * Y)+ Y . 

1.6.3 The last line of the definition of NATEXPEQ on page 39 should be

     eq (s X)* Y = (X * Y)+ Y . 
(Thanks to Bob Boyer, CSE 230 W'01.) Moreover, the same typo occurs in the rule [SM] on page 31.

The "Theorem of Constants" is really a justification for the universal quantifier elimination rule:

To prove an equation (forall X)e over a signature Sig, it suffices to prove e over Sig(X).
Or more precisely and more generally,
A |-Sig(X) P    implies    A |-Sig (forall X) P
where P is a first order sentence with equations as atoms, A is some set of first order axioms, and |-Sig indicates provability over signature Sig.

There is, however, an important caveat regarding use of the disequality predicate (see Section 2.1.1). For example, suppose we are trying to prove a first order formula (forall X,Y) P(X,Y) over Sig, and use the Theorem of Constants to reduce it to proving P(x,y) over Sig(x,y). Because x=/=y (since they are distinct constants), what we have really proved is

(forall X,Y) X=/=Y implies P(X,Y) .
However, if the proof of P(x,y) never makes use of x=/=y, then we have actually proved (forall X,Y) P(X,Y). [You can check whether OBJ3 ever uses x=/=y by turning on trace, saving the output in a file, and then searching it using a good editor.]

But what should be done if the proof does use x=/=y? We can complete the proof by proving P(x,x), which then gives (forall X) P(X,X). This can be justified by considering the two proofs as parts of a case analysis, where the two cases are X=/=Y and X=Y.

2.1 The programming language studied in this book is so simple that it suffices to use just stores as states of the run-time system; we do not need the extra power of having environments in addition to stores, as in Section 3.3.1 of Stansifer. Unfortunately Stansifer's terminology clashes with ours, since Stansifer uses the term "state" for what I would rather call a "store," which is a map from locations to values (I would reserve "state" for the whole thing, which may include both environments and stores); moreover, Stansifer also uses the term "environment" for what I call "store" in the case where there is no "environment" (that is, a map from identifiers to values). However, this situation should not cause any confusion in discussing our semantics; all these terms can be used synonymously because there is only one thing that they could refer to anyway.

2.1.1 You can skim the technical discussion of the equality and disequality operations in this subsection, because the details are not needed until later; but you should read the notes for Section 1.6.3 above.

2.2 The semantics that we begin to define here is enormously simpler than that given in Chapter 9 of Stansifer (I hope you agree!), and has the additional advantage that it directly supports mechanical correctness proofs using OBJ.

3. The one thing that I would most emphasize about this chapter is that all the semantic definitions in it are absolutely natural and absolutely simple, in the sense that there really isn't anything else you could write. The only exception to this is the use of EStore instead of just Store, which seems artificial at this point, because it is really not needed for the constructions given in Chapter 3; in fact, Proposition 27 can be seen as proving that EStore isn't needed here. However, as the chapter repeatedly emphasizes, EStore becomes absolutely necessary when programs can have while loops (since these may not terminate), and so we may as well write the definitions in the way we will eventually need them anyway.

3.3 The proof of Proposition 27 is not difficult, and the result is intuitively obvious. However, it is a bit technical to give a precise definition for "structural induction" and to justify it; also, the formulation of program termination in Proposition 27 is a bit technical, and requires some thinking to be understood.

5.1 When proving the (partial) correctness of a loop, the invariant appears both as an assumption (on entering the loop) and as a goal. This means that it must be treated in two completely different ways. We illustrate these different treatments by working with a formula F of the form

      (forall Q(X)) P1(X) and P2(X) 
where Q(X) is something like 1 < X < N. This formula is really an abbreviation for an implication, of the form
      (forall X) Q(X) implies P1(X) and P2(X).
If P1 and P2 are both equations, then in assuming this formula, we introduce two conditional equations,
      cq t1(X) = t1'(X) if 1 < X and X < N .
      cq t2(X) = t2'(X) if 1 < X and X < N .  
On the other hand, in trying to prove the formula, we would first eliminate the quantifier, then eliminate the implication, and finally eliminate the conjunction, so that the setup would be something like the following:
      op x : Int .
      eq 1 < x = true .
      eq x < n = true .
      red t1(x) == t1'(x).
      red t2(x) == t2'(x).
Of course, things are more complex for an invariant, because of taking account of the state, the precondition, etc.

6. Here for the first time, we see some non-trivial algorithms and proofs, and it is interesting to consider what we can learn from this encounter. First, please note that it is not claimed that this is the best way to program, or to develop algorithms (although many books on formal methods do make claims of this kind). Second, please note that the proofs in this section are neither completely formal nor completely informal; the intention is to develop a middle way between the enormously detailed tedium of fully formalized proofs and the error-ridden clarity of informal proofs, such that OBJ can deal with the complexities of the programming language semantics, and the user (i.e., you) can deal with the structure of the proof. Third, please note that it is expedient to do an informal proof first, or at least in parallel, rather than to try an OBJ proof without knowing what it should look like; OBJ can help you check whether your informal proof plan actually works, and it can help you to carry out and debug that plan, but it cannot produce a proof plan by itself.

Because OBJ only does reduction, and is not a theorem prover for first order logic, you will encounter difficulties and details that are normally hidden when proofs are done by hand; some people hate this, but my viewpoint is that this is a very interesting phenomenon! Who would have guessed that doing "simple" proofs would involve such work? This is related to the failure of the classical methods of Artificial Intelligence to conquer software development, or indeed, any very difficult domain, and also to the failure of Hilbert's program to formalize mathematics. Most people have no idea what the difficulties with these projects actually were, but those who work through this section will.

7.1.2 The procedure swap(X,Y) actually is correct when X=Y, but this is not proved by the proof score in the book. But you can handle that case by doing the corresponding things for swap(x,x), where x is a new constant of sort Var. However, we should not expect that in general parameterized procedures will work correctly when some of their arguments are equal.

B. Well founded induction does require proving P(0): if we let x=0, then we get the implication true => P(0), which is equivalent to P(0), as our proof obligation for this case.


C. Miscellaneous Topics

C.1 Lambda Calculus

The reading on the lambda calculus in OBJ gives a fully formal definition for the syntax and operational semantics of the lambda calculus, along with numerous examples, including (among other things) the following:

  1. showing that the lambda calculus is non-terminating, by giving a specific calculation that doesn't terminate;
  2. showing that alpha renaming is sometimes required in order for beta reduction to give the result it should;
  3. some combinators and an indication of how to prove combinator identities (though this can also be done more directly without using the lambda calculus);
  4. (the beginnings of the demonstration) that logic, arithmetic, and list processing can be done with just lambda expressions (to me, it seems amazing that this is possible!).
This presentation of the lambda calculus has some unusual features, including explicit runtime error messages and a slightly more readable syntax. It is recommended that you play with this OBJ specification yourself, because this will give you a much better feeling for the lambda calculus than just reading about it. For example, you could replace the "[_ _]" syntax for application by the traditional "_ _" notation and see how things go; and you can make its parsing associate to the left with the attribute "[gather (E e)]". It is also worth noting that historically term rewriting arose as an abstraction of the lambda calculus; for this reason, it is very natural to use it to describe the lambda calculus.
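
If you would like to experiment in ML instead, here is a minimal sketch in Standard ML (mine, not the OBJ specification from the reading) of lambda calculus syntax and a single top-level beta step; the substitution below is deliberately naive, so the alpha renaming of point 2 above is exactly what it leaves out.

    datatype term = Var of string
                  | Lam of string * term
                  | App of term * term

    (* naive substitution t[x := s]; no capture avoidance, so it is only safe
       when the bound variables of t are distinct from the free variables of s *)
    fun subst (x, s) (Var y)      = if x = y then s else Var y
      | subst (x, s) (Lam (y, b)) = if x = y then Lam (y, b)
                                    else Lam (y, subst (x, s) b)
      | subst (x, s) (App (f, a)) = App (subst (x, s) f, subst (x, s) a)

    (* one beta step at the root, if the term is a redex *)
    fun beta (App (Lam (x, b), a)) = SOME (subst (x, a) b)
      | beta _                     = NONE

    (* example: (\x. x x) y  reduces to  y y *)
    val ex = beta (App (Lam ("x", App (Var "x", Var "x")), Var "y"))

    (* the non-terminating example of point 1: omega reduces to itself forever *)
    val omega = App (Lam ("x", App (Var "x", Var "x")),
                     Lam ("x", App (Var "x", Var "x")))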

C.2 Internet Languages

We will now examine some of the new languages that have been spawned by the recent explosion of interest in the internet. Among these, the currently most important may be Java, HTML, JavaScript, Perl, and XML. One interesting observation about these languages is that they differ greatly from the classical programming languages that are traditionally studied in courses like CSE 130 and 230, of course because they serve different purposes.

Let's start with Java. Probably security issues have been addressed to a greater extent in Java than in any other programming language, and many unusual design decisions are due to security concerns. However, platform-independence and portability were perhaps the major forces driving the design of Java, and it is these that motivate the unusual decision to implement it using interpretation on an abstract machine. The concerns with security and portability are of course motivated by the use of the language on the internet, as is the use of threads for (pseudo-)concurrent execution. The use of APIs allows portability without sacrificing functionality, and in particular provides extensive support for interactive graphics.

The "ML" in HTML is for "Markup Language," and HTML is of course not a programming language, but a language for describing multimedia content, originally in a way that is independent of the display device to be used, though later evolution of the language introduced many features that allow graphic designers to produce more pleasing layout for specific browsers. It would be interesting to survey all the effects that commercial competition had on HTML, but let it suffice to note that both MicroSoft and Netscape introduced non-standard features in an attempt to lock-in customers.

Although HTML is not a programming language, some programming language features are often desirable in writing content for display on web pages. For example, one wants simple procedures for buttons, menus, etc., rather than having to code them up from scratch. Sometimes one also wants functionality such as counting the number of mouse clicks, where simple programming language features would come in handy. JavaScript is a low power programming language designed for just such purposes; it is relatively simple, but has a lot of "widget" procedures to support interactive graphics. One would not want to use JavaScript for general purpose programming, e.g., for writing a compiler.

Perl is a language that fills a small but important niche in the internet world; it has many features that make it unsuitable for general purpose programming, such as being untyped and having weak modularity. But it is ideal for quickly writing relatively small translators, for example, into SQL, and it has been called "the duct tape of the internet." It is also notable that Perl is an open source effort, and has very high quality implementations and documentation. See Perl: The first postmodern computer language, by Larry Wall, the designer of Perl, for an amusing discussion.

Finally, XML serves as a kind of meta-language for HTML (though the "ML" in its name is still officially for "Markup Language" not "Meta Language," and the "X" is for "extensible"). Like HTML, XML is simplified from SGML, but unlike HTML, it enables users to define their own new tags. The impetus for developing this language comes primarily from B2B applications, where it is expected to be used very extensively. However, it is also of interest for applications in the sciences, and of course in computer science. In fact, we have used it in the Kumo system being developed in my own lab. (This system also uses HTML and JavaScript, of course.)

I would now recommend re-reading the Preliminary Essay on Comparative Programming Linguistics, for its discussion of how intent and social context affect design.


D. Notes on Ullman's ML Text

2 Here are some questions that may aid you in reading Chapter 2: Why does ML have all of tuples, strings, and lists? Why does ML have explicit coercions? Why do binary functions take tuples as arguments? In my opinion, what is remarkable about ML is that these questions (and many others of a similar nature) have good answers, because the language is exceptionally well designed; for example, similar questions about C do not really have good answers.

2.3.3 The box on page 31 seems to claim that ML's val declaration is side-effect free, but this is arguable, and I am more inclined to disagree than to agree, although a case for the other side can of course also be made. Consider the following ML code:

    val t = 3;        (* t = 3 *)
    val t = (t,t);    (* t = (3,3), using the old t on the right *)
    val u = (t,t);    (* u = ((3,3),(3,3)) *)
    val t = (t,t);    (* t = ((3,3),(3,3)) *)
    val u = (t,t);    (* u = (((3,3),(3,3)),((3,3),(3,3))) *)
Certainly we get very different values for t and u, depending on what "assignments" have been done previously. By the way, ML also has a "real" assignment statement that no one would argue is side-effect free, so ML is definitely not a pure functional language, although it does have a very nice functional sublanguage.

3.2.1 The discussion of what Ullman calls "environments" (and I would call stacks of environments) is easy to follow, but leaves out some extra details needed for the imperative features of ML; Stansifer, Section 5.2, has more detail, more precision, and more generality. Ullman might have mentioned that these clever ideas come to ML from Lisp, and are modified forms of clever ideas for implementing Algol 60, that arose in IFIP WG 2.1. (Stansifer might also have mentioned the role of WG 2.1 in his discussion of block structure in Chapter 5.)

3.3 Ullman is not very good on history; for some historical information on ML pattern matching, see the discussion of section 5.3 of Stansifer above.

5.5 To curry a function is a certain way to get an equivalent function with domain a function type instead of a product type. For example, given f of type Int * Int -> Int, we can define the equivalent function f' of type Int -> (Int -> Int) by

    fun f' m n = f(m,n); 
Thus, f'(6) is a function of type Int -> Int. There is a nice mathematical expression for currying, which also includes its converse, called uncurrying, given by the following isomorphism
   [(T1 * T2) -> T]  ~  [T1 -> [T2 -> T]] 
where [T] indicates the set of functions having type T, and ~ indicates isomorphism.
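
Here is this isomorphism written as a pair of ML functions (a sketch), curry and uncurry, which are mutually inverse:

    fun curry   f x y    = f (x, y)
    fun uncurry f (x, y) = f x y

    fun plus (m : int, n : int) = m + n
    val plus'  = curry plus        (* : int -> int -> int *)
    val seven  = plus' 6 1         (* = 7 *)
    val plus'' = uncurry plus'     (* back to type int * int -> int *)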

5.6.3 The following definition for map is more idiomatic ML than the one given by Ullman on p. 177 using let, although Ullman's version does have some expository value.

   fun map F nil = nil
     | map F (x :: xs) = F x :: map F xs;  
Similarly, the definition of comp in the box on p. 177 is more idiomatic than Ullman's version using let in Figure 5.20 on p. 176. (These alternative definitions are more idiomatic because they make better use of the capabilities of ML.)


Maintained by Joseph Goguen
© 2000, 2001, 2002 Joseph Goguen
Last modified: Thu Feb 21 14:44:02 PST 2002