CSE 130: Principles of Programming Languages
Notes on Chapter 3 for Sethi (assignment, control, and invariants)

3.1 The assignment statement is the essence of imperative programming, but it is relatively little discussed in Part II of Sethi; for this reason, these notes will discuss it in some detail. The best key to understanding the meaning of assignment is Strachey's notions of l-value and r-value, which are discussed in section 3.3 below.

Both Turing and von Neumann used assertions and invariants (the two main ideas involved) to prove the correctness of flow chart programs, which they independently introduced in the early 1940s. Turing also designed the first stored-program electronic computer, although it was later redesigned and built by others in the UK. This information was secret for a long time, because Turing's work was an important part of Allied World War II efforts to break German and Japanese secret codes for war messages. (There are several good books, and even a good play, on Alan Turing's very interesting but rather sad life; the author of one of those books, Andrew Hodges, maintains a large website devoted to the life of Turing, which includes a page on the play.)

3.2 Advocates of so-called structured programming railed against the infamous GOTO statement, claiming that it produced "spaghetti code," i.e., code that is full of seemingly random, twisted links. (By the way, many today feel that unrestricted inheritance in object-oriented programming is just as bad, especially in large systems that use dynamic binding and multiple inheritance.) The value of conditional and loop statements is that they provide ways to replace GOTO's with code that can reveal (aspects of) the programmer's actual intention.

Sethi does a very nice job of defining the notion of structure in structured programming, as involving single entry and single exit syntactic constructions. But there is a subtle point: the single exit from a structure may be taken from more than one point inside the structure, for example, if there is a return statement. There is also another factor that makes these constructions easier to read, which is that the keywords explicitly express the programmer's intention, such as doing a loop, or a case analysis, or a conditional.
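
The subtle point about exits can be seen in a small C sketch. The function name find_first and its interface are hypothetical, chosen only for illustration: the function has one entry, and control always comes back to the same place in the caller, but that exit is reached from two different points inside the function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: index of the first occurrence of key in
   a[0..n-1], or -1 if key does not occur. */
int find_first(const int *a, size_t n, int key)
{
    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (a[i] == key)
            return (int)i;   /* exit taken from inside the loop */
    }
    return -1;               /* exit taken after the loop has finished */
}
```

The keywords for and if announce that this is a loop containing a conditional; the same control flow written with GOTO's would leave the reader to reconstruct that.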

A major theme throughout our discussion of this chapter will be the drastic difference between an assignment and an equality. Equalities should be denoted by symmetrical symbols, such as the usual "=" symbol, because relations of equality are symmetrical. However, assignment, for which we will usually use the notation ":=", is far from symmetrical. For example, the following two statements have opposite meanings:

    X := Y
    Y := X 
It is interesting to look at the following list of notations that have been used for assignment in various languages:
    A := 3
    A  = 3
    A <- 3
    A =. 3
    3 -> A
    MOVE 3 TO A
   (SETQ A 3)
The first is used by Pascal, Ada, Icon, ML, Modula-3, and ALGOL 68, the second by FORTRAN, PL/I, SNOBOL4, C, C++, and Java, the third by APL (where the arrow is a single character), the fourth by J, the fifth by BETA, the sixth by COBOL, and the last by LISP. This list is certainly not complete. The only symmetrical symbol here is "=". By contrast, there is no assignment statement in functional languages, for example, those of the OBJ family, where the equality symbol really does denote equality.

Perl has a version of multiple assignment for arrays, as illustrated in the following:

    perl -e '$a=1; $b=2; ($a,$b)=($b,$a); print "$a $b\n"'
You can try this on a Unix machine; it prints 2 1. (Thanks to Dana Dahlstrom, in CSE 230 W2002, for this.)
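
For comparison, here is a minimal C sketch of the same exchange. C has no multiple assignment, so the asymmetry of assignment shows up at once: doing X := Y followed by Y := X does not swap anything. The function names swap_wrong and swap_with_temp are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Naive attempt: the first assignment destroys the old content of *x. */
void swap_wrong(int *x, int *y)
{
    assert(x != NULL && y != NULL);
    *x = *y;
    *y = *x;   /* copies the new content of *x, so both cells hold the old *y */
}

/* A third cell preserves the content that the first assignment overwrites. */
void swap_with_temp(int *x, int *y)
{
    assert(x != NULL && y != NULL);
    int t = *x;
    *x = *y;
    *y = t;
}
```

Starting from contents 1 and 2, swap_wrong leaves 2 in both cells, while swap_with_temp leaves 2 and 1, as the Perl one-liner does.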

The subtlety of the assignment statement becomes much clearer when one attempts to provide a denotational semantics for it (see Chapter 13 of Sethi for more detail). For (all but the most primitive) imperative languages, such a semantics must carefully distinguish among the name of a cell, the location of a cell, and the content of a cell, and must provide ways of passing locations and contents as values. Identifiers are used as the names of cells, and these are usually called "variables," but really the concept of a programming language "variable" should include all three aspects mentioned above. It is very different from the notion of variable in mathematics!
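
The difference between passing a content and passing a location as a value can be sketched in C, where the two cases are written differently. This is only an illustration; the function names bump_content and bump_location are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Receives the content of the caller's cell. The parameter n names a
   fresh cell, so the caller's cell is not affected by the assignment. */
int bump_content(int n)
{
    n = n + 1;
    return n;
}

/* Receives the location of the caller's cell, so the assignment
   changes the content of that cell. */
void bump_location(int *cell)
{
    assert(cell != NULL);
    *cell = *cell + 1;
}
```

With a variable counter holding 5, the call bump_content(counter) returns 6 and leaves counter at 5, whereas bump_location(&counter) changes counter to 6. The identifier counter is the name, &counter is the location, and the integer is the content.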

There are several small bugs in this section. On page 66, "leap :=" should have come before the expression defining leap years (by the way, this is a very nice example!). On page 67, line 8, "remain 0" should be "remain =/= 0" (where =/= is "unequal"). In Figure 3.2 on page 68, there should be no period after the final end; i.e., it should have been just "end" instead of "end.". The exposition of definite iteration on page 69 may be confusing, because step is not actually used in the Pascal syntax.

Also there is something odd about the discussion of the for statement, in that step is mentioned, but there is no syntax to specify its value. The default value is of course 1, but some languages have explicit syntax, such as step 3, to set the value to something other than 1.

3.3 Christopher Strachey introduced the very useful notions of the l-value and r-value of a variable. Given an assignment statement, the "l-value" of a variable (on the left side) is a reference to a cell, while "r-value" refers to the value of a variable (on the right side); these values are the contents of the cells of the variables. This will be made more precise later on. It is better to call the container that holds the value of a programming variable a cell than to call it a location, and we will usually do so in class and in these notes. It is better for the term "location" to refer to a reference to a cell, rather than to the cell itself. (Recall that Strachey designed the language CPL, which was implemented as BCPL, and inspired C.)
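
A short C sketch may help here. It is an illustration only, with hypothetical function names: the left side of an assignment is evaluated for its l-value (which cell), the right side for its r-value (what content), and the left side can be an expression whose cell is only determined at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Copies each element one place to the right, working from the high
   end down, so a[1..n-1] receive the old contents of a[0..n-2].
   In a[i] = a[i - 1], a[i] is used for its l-value and a[i - 1] for
   its r-value. */
void shift_right(int *a, int n)
{
    assert(a != NULL && n >= 0);
    for (int i = n - 1; i > 0; i--)
        a[i] = a[i - 1];
}

/* The cell to be assigned is chosen at run time: v is stored in
   whichever of the two cells currently has the smaller content
   (the first cell if the contents are equal). */
void store_in_smaller(int *x, int *y, int v)
{
    assert(x != NULL && y != NULL);
    *(*x <= *y ? x : y) = v;
}
```

An expression such as x + 1 has an r-value but no l-value, which is why a C compiler rejects x + 1 = 3.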

The C statement

     i=j=k=l;

is a particularly good illustration of bad programming language syntax. Assuming you know that "=" is assignment in C (which itself is a bad choice), and you also know that assignments have r-values in C, there are still several different ways to parse the above line, so you must also know that C's parse conventions give
     i=(j=(k=l))
i.e., that = is right associative in C. It then follows that the variables i,j,k,l all end up with the same r-value, the one that was originally attached to l. Amazingly, the other parses also work, at least in some implementations of C, though this is not well documented; they work because an assignment not only has an r-value, but also has an l-value. (ISO C itself says that the result of an assignment is not an l-value, so a conforming C compiler rejects these forms; C++ accepts them.) The statement
     (i=j)=k=l
is parsed as
     (i=j)=(k=l)
and therefore the variable i will first be assigned the r-value of j, and then that of l, which k will also get, although j will retain its original r-value. As an exercise, you should work out what happens under a third parse,
     ((i=j)=k)=l .
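
The right-associative reading can be checked with a small C sketch. Only that reading is shown, since it is the one that any standard C compiler accepts; the function name chained_assignment_holds is hypothetical.

```c
#include <assert.h>

/* Returns 1 if, after the chained assignment, all four variables hold
   the value that l had at the start, and 0 otherwise. */
int chained_assignment_holds(void)
{
    int i = 1, j = 2, k = 3, l = 4;

    i = j = k = l;           /* parsed as i = (j = (k = l)) */

    /* Because an assignment has an r-value, it can be used as an operand. */
    int m = (k = l) + 1;
    assert(m == l + 1);

    return i == l && j == l && k == l;
}
```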

3.4 The COME FROM program below is a rare example of a good joke written in the form of a program; it serves to emphasize the problems with GOTO's, by exaggerating them:

     10  J = 1
     11  COME FROM 20
     12  WRITE (6, 40) J
     13  COME FROM 10
     20  J = J+2
     40  FORMAT (I4)  
After assigning 1 to J at line 10, control passes to line 20, because of the COME FROM at line 13, and then to line 12 because of the COME FROM at line 11. So the value of J (which is 3) is written and the program stops. (The COME FROM construct and above program are due to R.L. Clark, CACM 27, pp 394-395, 1984.)

Flow diagrams were used by both Turing and von Neumann in the 1940s as an aid to writing programs. A long 1947 paper by Goldstine and von Neumann on programming recommends first drawing a flow diagram, then writing pseudo-code (a form of which later came to be called assembly code), and then hand translating into machine code.

There are arguments against pointers that are essentially the same as those against the GOTO; these can be summarized by noting that code written with pointers fails to indicate the programmer's intent in the same way as does code written with GOTO's, and it gives "spaghetti storage" as a result. The first line of defense against this is also the same as that for GOTO's, namely to provide good programming language support for replacing the most common uses of the feature, which for pointers is generally recursive data types. ML does a very good job of this, while C only gives the appearance of providing such a capability.
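
To make the remark about C concrete, here is a minimal sketch of a list of integers in C; the names node, list_sum and list_sum_demo are hypothetical. The type is recursive only by way of an explicit pointer field, and the empty list is represented by the null pointer rather than by a constructor of the type.

```c
#include <assert.h>
#include <stddef.h>

/* A list of integers: either empty (NULL) or a head followed by a list. */
struct node {
    int head;
    struct node *tail;   /* the recursion goes through an explicit pointer */
};

/* Sum of the elements, following the recursive shape of the type. */
int list_sum(const struct node *xs)
{
    return xs == NULL ? 0 : xs->head + list_sum(xs->tail);
}

/* Builds the list 1, 2, 3 from cells in this function's frame and sums it. */
int list_sum_demo(void)
{
    struct node third = {3, NULL};
    struct node second = {2, &third};
    struct node first = {1, &second};
    return list_sum(&first);
}
```

Nothing in the type prevents tail from pointing at a freed cell, or back into the same list to form a cycle, which is the "spaghetti storage" problem. An ML datatype declaration expresses the two cases directly and rules out such values.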

3.5 What you should get out of this section is an understanding of how to use invariants to write better code, and in particular, to avoid errors in the condition for exiting a loop. The purpose of this material is to help you become a better programmer by better understanding some of the more difficult aspects of loops. Please note that invariants are not just comments inside curly brackets! They should be precise mathematical expressions that depend only on the state (i.e., the values of variables, arrays, etc.) of the program environment at the point where the invariant occurs. The key point is that they should evaluate to true every time control is at the point in the program where the invariant is placed.
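
As an illustration (not an example taken from Sethi), here is a C sketch of division by repeated subtraction, with the invariant written as an assert at the point where it is claimed to hold. The function name divide and its interface are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Quotient and remainder of n divided by d, assuming n >= 0 and d > 0. */
void divide(int n, int d, int *quotient, int *remainder)
{
    assert(n >= 0 && d > 0);                 /* precondition */
    assert(quotient != NULL && remainder != NULL);

    int q = 0;
    int r = n;
    while (r >= d) {
        /* invariant: true every time control reaches this point */
        assert(n == q * d + r && r >= 0);
        r = r - d;
        q = q + 1;
    }
    /* the invariant together with the failed loop test r >= d
       gives the postcondition */
    assert(n == q * d + r && 0 <= r && r < d);

    *quotient = q;
    *remainder = r;
}
```

The invariant depends only on the state (n, d, q and r), and it shows what the exit condition has to be. If the loop test were mistakenly written r > d, then for n = 6 and d = 3 the loop would stop with r = 3, and the final assertion r < d would fail.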

3.6 It is difficult to give a good explanation for the assignment statement proof rule, but Sethi's is quite good, perhaps the best I have seen. The fact that it is difficult to explain this rule is itself interesting, and suggests that assignment is a more subtle and difficult operation than is often realized. The examinations will not include questions on this section, but understanding the rules will help you to use invariants better, and hence to become a better programmer.
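
Assuming the rule meant here is the usual Hoare-style axiom, which says that a postcondition Q holds after x := E provided that Q with E substituted for x holds before, one instance can be sketched in C with assertions. The function name increment_checked is hypothetical, and the sketch assumes x is less than INT_MAX so that x + 1 does not overflow.

```c
#include <assert.h>

/* Instance of the rule with Q being "x > 0" and E being "x + 1".
   Substituting E for x in Q gives the precondition "x + 1 > 0".
   Assumes x < INT_MAX, so that x + 1 does not overflow. */
int increment_checked(int x)
{
    assert(x + 1 > 0);   /* Q with E substituted for x */
    x = x + 1;           /* x := E */
    assert(x > 0);       /* Q */
    return x;
}
```

Note that the rule works backwards, from the postcondition to the precondition, which is part of why it is hard to explain.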

3.7 This section reviews some basics of C, which most of you probably already know.

A Note for those who know some ML
If X is an assignable variable in ML, and one writes
    X := X + 1  
then one might well expect that X would get assigned 1 more than the reference to the cell where X is stored (but in fact, and more usefully, one gets a type error). What one should write to get this effect is
    X := !X + 1  
where !X represents the dereference of X, that is, the value stored in X. Since the core of ML is functional, the ML approach to assignment is the result of (reluctant) thinking about imperative programming by functional programmers; perhaps it is intended to discourage you from using assignments, and certainly it is intended to make you think about what you are doing if you do use them. Requiring dereferencing to be made explicit certainly has this effect; and one must also go to a little extra trouble to declare an assignable variable, which must be reference-valued.
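
C offers a rough analogy if X is taken to be a pointer, although C is more permissive than ML. The following sketch uses hypothetical function names. With the dereference written explicitly, the content of the cell is incremented; without it, C does not report a type error but performs pointer arithmetic, so one really does get "1 more than the reference."

```c
#include <assert.h>
#include <stddef.h>

/* x holds the location of an int cell, much like an ML reference.
   The dereference is explicit, as in ML's X := !X + 1, although C
   also requires it on the left side. */
void increment_cell(int *x)
{
    assert(x != NULL);
    *x = *x + 1;
}

/* Without the dereference, x + 1 is the location of the next int
   cell, not the incremented content. */
int *next_cell(int *x)
{
    assert(x != NULL);
    return x + 1;
}
```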

Maintained by Joseph Goguen
© 2000 - 2006 Joseph Goguen
Last modified: Sat Feb 4 10:14:40 PST 2006