CSE 130: Principles of Programming Languages
Notes on Chapter 3 for Sethi (assignment, control, and invariants)

3.1 The assignment statement is the essence of imperative programming, but it is relatively little discussed in Part II of Sethi; for this reason, these notes will discuss it in some detail.

Both Turing and von Neumann used assertions and invariants (the two main ideas in such proofs) to prove correctness of flow chart programs, which they independently introduced in the 1940s. Turing also designed the first stored-program electronic computer, although it was later redesigned and built by others in the UK. This information was secret until rather recently, because Turing's work was an important part of Allied World War II efforts to break German and Japanese secret codes for war messages. (There are several good books, and even a good play, on Alan Turing's very interesting but rather sad life; the author of one of those books, Andrew Hodges, maintains a large website devoted to the life of Turing, which includes a page on the play.)

3.2 Advocates of so-called structured programming railed against the infamous GOTO statement, claiming that it produced "spaghetti code," i.e., code that is full of seemingly random, twisted links. (By the way, many today feel that unrestricted inheritance in object-oriented programming is just as bad, especially in large systems that use dynamic binding and multiple inheritance.) The value of conditional and loop statements is that they provide ways to replace GOTO's with code that can reveal the programmer's actual intention.

Sethi does a very nice job of defining the notion of structure in structured programming, as involving single entry and single exit syntactic constructions. But there is another factor making these constructions easier to read, which is that the keywords explicitly express the intent of the programmer, such as doing a branch, loop, or case analysis.

A theme throughout our discussion of this chapter will be the drastic difference between an assignment and an equality. Equalities should be denoted by symmetrical symbols, such as the usual "=" symbol, because relations of equality are symmetrical. However, assignment, for which we will usually use the notation ":=", is far from symmetrical. For example, the following statements have opposite meanings:

    X := Y
    Y := X 
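That the two statements differ can be checked by running each from the same starting state. The following is a minimal sketch in Python, where "=" is the assignment symbol; the function names x_gets_y and y_gets_x are hypothetical and serve only as illustration.

```python
def x_gets_y(x, y):
    """Model X := Y: X receives the value of Y; Y is unchanged."""
    x = y
    return x, y


def y_gets_x(x, y):
    """Model Y := X: Y receives the value of X; X is unchanged."""
    y = x
    return x, y


# Start both runs from X = 1, Y = 2.
print(x_gets_y(1, 2))  # (2, 2)
print(y_gets_x(1, 2))  # (1, 1)
```

Starting from the same state, one statement destroys the old value of X and the other destroys the old value of Y, which is why swapping the two sides of ":=" is not harmless in the way that swapping the two sides of "=" is.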
It is interesting to look at the following list of notations that have been used for assignment in various languages:
    A := 3
    A  = 3
    A <- 3
    A =. 3
    3 -> A
    MOVE 3 TO A
   (SETQ A 3)
The only symmetrical symbol is "=". The first is used by Pascal, Ada, Icon, ML, Modula-3, and ALGOL 68, the second by FORTRAN, PL/I, SNOBOL4, C, C++, and Java, the third by S and R, the fourth by J, the fifth by BETA, the sixth by COBOL, and the last by LISP. This list is certainly not complete. By contrast, there is no assignment statement in functional languages, for example, those of the OBJ family, where the equality symbol really does denote equality.

If X is an assignable variable in ML, and one writes

    X := X + 1  
then one might well expect that X would get assigned 1 more than the reference to the cell where X is stored (but in fact, and more usefully, one gets a type error). What one should write to get this effect is
    X := !X + 1  
where !X represents the dereference of X, that is, the value stored in X. Since the core of ML is functional, the ML approach to assignment is the result of (reluctant) thinking about imperative programming by functional programmers; perhaps it is intended to discourage you from using assignments, and certainly it is intended to make you think about what you are doing if you do use them. Requiring dereferencing to be made explicit certainly has this effect; and one must also go to a little extra trouble to declare an assignable variable, which must be reference-valued.
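The distinction between a cell and its contents can be imitated outside ML. The following is a rough sketch in Python, not ML: the class Ref and its field contents are assumptions made for illustration, and Python reports the error when the program runs, whereas ML reports it at compile time.

```python
class Ref:
    """A mutable cell, loosely modelled on an ML reference."""

    def __init__(self, contents):
        self.contents = contents


x = Ref(0)

# Analogue of X := X + 1: adding 1 to the cell itself is rejected.
try:
    x + 1
    rejected = False
except TypeError:
    rejected = True

# Analogue of X := !X + 1: read the contents, add 1, store the result.
x.contents = x.contents + 1

print(rejected, x.contents)  # True 1
```

Here x.contents on the right plays the role of !X, and the explicit field access is what forces the programmer to say whether the cell or its contents is meant.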

Perl has a version of multiple assignment for lists, as illustrated in the following:

    perl -e '$a=1; $b=2; ($a,$b)=($b,$a); print "$a $b\n"'
which you can try on a Unix machine; it prints 2 1 . (Thanks to Dana Dahlstrom, in CSE 230 W2002, for this.)
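Simultaneous assignment of this kind is not peculiar to Perl. As a small sketch, Python's tuple assignment gives the same exchange, and the second half shows what the same swap costs when only single assignments are available; the variable names are arbitrary.

```python
# Simultaneous assignment: the right side is evaluated in full
# before either name on the left is rebound.
a, b = 1, 2
a, b = b, a
print(a, b)  # 2 1

# The same exchange with single assignments needs a temporary,
# because the first assignment destroys a value that is still needed.
c, d = 1, 2
tmp = c
c = d
d = tmp
print(c, d)  # 2 1
```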

The subtlety of the assignment statement becomes much clearer when one attempts to provide a denotational semantics for it (see Chapter 13 of Sethi for more detail). For (all but the most primitive) imperative languages, such a semantics must carefully distinguish among the name of a cell, the location of a cell, and the content of a cell, and must provide ways of passing locations and contents as values. Identifiers are used as the names of cells, and these are usually called "variables," although strictly speaking the concept of a programming language "variable" ought to include all three aspects mentioned above.
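The three-way distinction can be made concrete with a toy model in the denotational style: an environment maps names to locations, and a store maps locations to contents. The sketch below is only an illustration in Python, not Sethi's semantics; the names env, store, declare, alias, assign, and content are all hypothetical.

```python
env = {}    # name -> location
store = {}  # location -> content


def declare(name, value):
    """Allocate a fresh cell, bind name to its location, and initialise it."""
    loc = len(store)
    env[name] = loc
    store[loc] = value


def alias(new_name, old_name):
    """Bind a second name to an existing location; no new cell is created."""
    env[new_name] = env[old_name]


def assign(target, source):
    """target := source: copy the content of source's cell into target's cell."""
    store[env[target]] = store[env[source]]


def content(name):
    """Look up the location of name, then the content at that location."""
    return store[env[name]]


declare("x", 1)
declare("y", 2)
alias("z", "x")   # z and x are two names for one location
assign("x", "y")  # updates the store only; the environment is untouched
print(content("x"), content("y"), content("z"))  # 2 2 2
```

The assignment changes the store but not the environment, and the alias z sees the new content even though z never appears in the assignment; neither fact can be expressed if names, locations, and contents are not kept apart.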

There is a very small bug in Figure 3.2 on page 68, a period after the final end, so that end. should have been just end.

Also there is something odd about the discussion of the for statement, in that step is mentioned, but there is no syntax to specify its value. The default value is of course 1, but some languages have explicit syntax, such as step 3 to set the value to something other than 1.

3.3 Christopher Strachey introduced the very useful notions of the l-value and r-value of a variable. Given an assignment statement, the "l-value" of a variable on its left side is a reference to a cell, while "r-value" refers to the values of variables on the right side, which are the contents of their cells. This will be made more precise later on. (Recall that Strachey designed the language CPL, which was implemented as BCPL, and inspired C.) It is better to call the container that holds the value of a programming variable a cell than to call it a location, and we will usually do so in class and in these notes. The term "location" should refer to a reference to a cell, rather than to the cell itself.

The C statement

     i=j=k=l  
is a particularly good illustration of bad programming language syntax. Assuming you know that "=" is assignment in C (which itself is a bad choice), and you also know that assignment statements have r-values in C, there are still several different ways to parse the above line, so you must also know that C's parse conventions give
     i=(j=(k=l))  
i.e., that = is right associative in C. It then follows that the variables i,j,k,l all end up with the same r-value, the one that was originally attached to l. Amazingly, other parses also work in C++ and in some implementations of C, though this is not well documented (standard C rejects them, because there the result of an assignment is not an l-value); where they work, it is because an assignment not only has an r-value, but also has an l-value. The statement
     (i=j)=k=l  
is parsed as
     (i=j)=(k=l)  
and therefore the variable i will first be assigned the r-value of j, and then that of l, which k will also get, although j will retain its original r-value. As an exercise, you should work out what happens under a third parse,
     ((i=j)=k)=l .
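The parses can be compared with a small model of the semantics just described. The sketch below is Python, not C. An expression is either a variable name or a pair (left, right) standing for left = right, and the functions lvalue, rvalue, and run are hypothetical names. The model assumes that an assignment yields the l-value of its left side and that the left operand is evaluated first, an order that C itself does not fix.

```python
def lvalue(expr, store):
    """Return the name of the cell expr denotes, performing any assignment in expr."""
    if isinstance(expr, str):
        return expr
    left, right = expr
    target = lvalue(left, store)
    store[target] = rvalue(right, store)
    return target


def rvalue(expr, store):
    """Return the content of the cell expr denotes."""
    return store[lvalue(expr, store)]


def run(expr):
    """Evaluate expr from a fixed initial store and return the final store."""
    store = {"i": 1, "j": 2, "k": 3, "l": 4}
    rvalue(expr, store)
    return store


print(run(("i", ("j", ("k", "l")))))  # i=(j=(k=l))
print(run((("i", "j"), ("k", "l"))))  # (i=j)=(k=l)
```

Under these assumptions the first call leaves all four variables holding 4, and the second leaves j at 2 while i, k, and l hold 4, in agreement with the text. The parse left as an exercise can be checked by passing the corresponding nested pair to run.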

3.4 The COME FROM program below is a rare example of a good joke written in the form of a program; it serves to emphasize the problems with GOTO's, by exaggerating them:

     10  J = 1
     11  COME FROM 20
     12  WRITE (6, 40) J
         STOP
     13  COME FROM 10
     20  J = J+2
     40  FORMAT (I4)  
After assigning 1 to J at line 10, control passes to line 13, because of the COME FROM there, and so on to line 20; after line 20, control passes to line 11, because of the COME FROM there, and so on to line 12. So the value of J (which is 3) is written and the program stops. (The COME FROM construct and the above program are due to R.L. Clark, CACM 27, pp 394-395, 1984.)
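The control flow can be traced mechanically. The following toy interpreter is a sketch in Python under one assumed reading of the joke: after a labelled statement runs, control transfers to the COME FROM that names its label, if there is one, and continues from there. The encoding of the program as tuples and the names PROGRAM, hijack, and run are inventions for this example.

```python
# Each entry is (label, statement); the unlabelled STOP has label None.
PROGRAM = [
    (10, ("assign", lambda j: 1)),
    (11, ("come_from", 20)),
    (12, ("write",)),
    (None, ("stop",)),
    (13, ("come_from", 10)),
    (20, ("assign", lambda j: j + 2)),
    (40, ("format",)),
]


def run(program):
    """Execute program and return the list of values written."""
    # Map each label named in a COME FROM to the index of that COME FROM.
    hijack = {stmt[1]: idx for idx, (_, stmt) in enumerate(program)
              if stmt[0] == "come_from"}
    j, output, pc = 0, [], 0
    while pc < len(program):
        label, stmt = program[pc]
        if stmt[0] == "assign":
            j = stmt[1](j)
        elif stmt[0] == "write":
            output.append(j)
        elif stmt[0] == "stop":
            break
        # COME FROM and FORMAT do nothing when reached in sequence.
        # If a COME FROM names this label, resume just after it;
        # otherwise fall through to the next statement.
        pc = hijack[label] + 1 if label in hijack else pc + 1
    return output


print(run(PROGRAM))  # [3]
```

Nothing at line 10 or line 20 shows that control will leave; the transfer can only be discovered by searching the rest of the program, which is the exaggeration the joke relies on.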

Flow diagrams were used by both Turing and von Neumann in the 1940s as an aid to writing programs. A long 1947 paper by Goldstine and von Neumann on programming recommends first drawing a flow diagram, then writing pseudo-code (the form of which later came to be called assembly code), and then hand translating into machine code.

There are arguments against pointers that are essentially the same as those against the GOTO; these can be summarized by noting that code written with pointers fails to indicate the programmer's intent in the same way as does code written with GOTO's, and it gives "spaghetti storage" as a result. The first line of defense against this is also the same as that for GOTO's, namely to provide good programming language support for replacing the most common uses of the feature, which for pointers is generally recursive data types. ML does a very good job of this, while C only gives the appearance of providing such a capability.

3.5, 3.6 It is difficult to give a good explanation for the assignment statement proof rule, but Sethi's is quite good, perhaps the best I have seen.
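For reference, here is the rule in its usual Hoare-logic form, followed by one worked instance. The symbols P, E, and X are generic, P[E/X] denotes P with E substituted for every free occurrence of X, and the notation may differ from Sethi's.

```latex
\{\, P[E/X] \,\} \quad X := E \quad \{\, P \,\}
\qquad \text{for example} \qquad
\{\, X + 1 > 0 \,\} \quad X := X + 1 \quad \{\, X > 0 \,\}
```

The rule reads backwards: to find what must hold before the assignment, take the postcondition X > 0 and replace X by the assigned expression X + 1. This backward direction is the main reason the rule is hard to explain, and it again shows the asymmetry of assignment.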


Maintained by Joseph Goguen
© 2000 - 2004 Joseph Goguen
Last modified: Thu Feb 5 21:19:08 PST 2004