3.1 The assignment statement is the essence of imperative programming, but it is relatively little discussed in Part II of Sethi; for this reason, these notes will discuss it in some detail.
Both Turing and von Neumann used assertions and invariants (which are the two main ideas) to prove correctness of flow chart programs, which they independently introduced in the early 1940s. Turing also designed the first stored-program electronic computer, although it was later redesigned and built by others in the UK. This information was secret until rather recently, because Turing's work was an important part of Allied World War II efforts to break German and Japanese secret codes for messages. (There are several good books, and even a good play, on Alan Turing's very interesting but rather sad life; the author of one of those books, Andrew Hodges, maintains a website devoted to the life of Turing, which includes a page on the play.)
3.2 Advocates of so-called structured programming railed against the GOTO statement, claiming that it produced "spaghetti code," i.e., code that is full of seemingly random, twisted links. (By the way, many today feel that unrestricted inheritance in object-oriented programming is just as bad, especially in large systems that use dynamic binding and multiple inheritance.) The value of conditional and loop statements is that they provide ways to replace GOTO's with code that can reveal the programmer's actual intention.
A theme throughout our discussion of this chapter will be the drastic difference between an assignment and an equality. Equalities should be denoted by symmetrical symbols, such as the usual "=" symbol, because relations of equality are symmetrical. However, assignment, for which we will usually use the notation ":=", is far from symmetrical. For example, the following statements have opposite meanings:
    X := Y
    Y := X
It is interesting to look at the following list of notations that have been used for assignment in various languages:
    A := 3
    A = 3
    A <- 3
    A =. 3
    3 -> A
    MOVE 3 TO A
    (SETQ A 3)
The only symmetrical symbol is "=". The first is used by Pascal, Ada, Icon, ML, Modula-3, and ALGOL 68, the second by FORTRAN, PL/I, SNOBOL4, C, C++, and Java, the third by APL (written with a left arrow) and R, the fourth by J, the fifth by BETA, the sixth by COBOL, and the last by LISP. This list is certainly not complete. By contrast, there is no assignment statement in languages of the OBJ family, and the equality symbol that is used there really does denote equality.
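The asymmetry of assignment can be checked directly. The following is a minimal sketch in C++ (chosen here only for illustration; the function names are hypothetical), in which each function performs one of the two assignments and returns the resulting pair (x, y):

```cpp
#include <cassert>
#include <utility>

// Returns the pair (x, y) after performing x := y.
std::pair<int, int> assign_x_from_y(int x, int y) {
    x = y;  // x receives the r-value of y; y is unchanged
    return {x, y};
}

// Returns the pair (x, y) after performing y := x.
std::pair<int, int> assign_y_from_x(int x, int y) {
    y = x;  // y receives the r-value of x; x is unchanged
    return {x, y};
}
```

Starting from x = 1 and y = 2, the first function yields (2, 2) and the second yields (1, 1), so swapping the two sides of an assignment changes its meaning, which is not true of an equality.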
If X is an assignable variable in ML, and one writes
    X := X + 1
then one might well expect that X would get assigned 1 more than the reference to the cell where X is stored (but in fact, and more usefully, one gets a type error). What one should write to increment the value stored in X is
    X := !X + 1
where !X represents the dereference of X, that is, the value stored in X. Since the core of ML is functional, the ML approach to assignment is the result of (reluctant) thinking about imperative programming by functional programmers; perhaps it is intended to discourage you from using assignments, and certainly it is intended to make you think about what you are doing if you do use them. Requiring dereferencing to be made explicit certainly has this effect; and one must also go to a little extra trouble to declare an assignable variable, which must be reference-valued.
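A rough analogue of explicit dereferencing can be sketched in C++ with pointers. This is only an analogy, not ML semantics: the function names are hypothetical, and where ML rejects X := X + 1 with a type error, C++ accepts arithmetic on the reference itself and quietly yields a different reference.

```cpp
#include <cassert>

// Increments the content of the cell that x refers to,
// in the spirit of ML's  X := !X + 1.
void increment_content(int* x) {
    assert(x != nullptr);
    *x = *x + 1;  // *x on the right plays the role of !X
}

// Contrast: adding 1 to the reference itself gives a reference to
// the neighbouring cell and leaves every content unchanged.
int* next_cell(int* x) {
    return x + 1;
}
```

Calling increment_content on a cell holding 41 leaves 42 in that cell, while next_cell applied to the first element of an array merely returns the address of the second element.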
Perl has a version of multiple assignment for arrays, as illustrated in the following:
    2 1 .
(Thanks to Dana Dahlstrom, in CSE 230 W2002, for this.)
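For comparison, a simultaneous assignment of the form (a, b) := (b, a) can be sketched in C++ with std::tie and std::make_tuple. This is an illustration of the general idea of multiple assignment, not a rendering of the Perl example, and the function name is hypothetical.

```cpp
#include <cassert>
#include <tuple>

// Simultaneous assignment (a, b) := (b, a).
void swap_by_multiple_assignment(int& a, int& b) {
    // make_tuple copies both r-values before any cell is overwritten,
    // so the two assignments do not interfere with each other.
    std::tie(a, b) = std::make_tuple(b, a);
}
```

The copy made by std::make_tuple matters: writing std::tie(a, b) = std::tie(b, a) instead would assign through references one component at a time, so the first assignment would destroy the value needed by the second.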
The subtlety of the assignment statement becomes much clearer when one attempts to provide a denotational semantics for it (see Chapter 13 of Sethi for more detail). For (all but the most primitive) imperative languages, such a semantics must carefully distinguish among the name of a cell, the location of a cell, and the content of a cell, and must provide ways of passing locations and contents as values. Identifiers are used as the names of cells, and these are usually called "variables," although strictly speaking the concept of a programming language "variable" ought to include all three aspects mentioned above. The assignment of a location to an identifier is called an environment and is often denoted by the Greek letter "rho", while the assignment of a value to a location is called a state and is denoted by the Greek letter "sigma". However, it would be clearer to call the latter map a store, and to call a pair of the two maps a state, because both are required to know the state of a computation that involves program variables. (One can find several different terminologies in the literature.) The environment rho gives the l-value of a variable, while the store sigma gives the r-value, which is usually called just the "value". Pointers involve the complication that an r-value may be a location.
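The two maps can be made concrete with a small model. The sketch below is in C++ and rests on assumptions of my own: locations are plain integers, contents are integers, and the type and function names are hypothetical. It is not the semantics given in Sethi, only an illustration of the environment and store idea.

```cpp
#include <cassert>
#include <map>
#include <string>

using Location = int;                                  // assumed: cells are numbered
using Environment = std::map<std::string, Location>;   // rho: identifier -> location
using Store = std::map<Location, int>;                 // sigma: location -> content

// r-value of an identifier: first its location under rho,
// then the content of that location under sigma.
int rvalue(const Environment& rho, const Store& sigma, const std::string& id) {
    return sigma.at(rho.at(id));
}

// The assignment id := v yields a new store; the environment is untouched.
Store assign(const Environment& rho, Store sigma, const std::string& id, int v) {
    sigma[rho.at(id)] = v;
    return sigma;
}
```

With rho mapping X to location 0 and Y to location 1, and sigma holding 5 and 7 there, the assignment X := Y is assign(rho, sigma, "X", rvalue(rho, sigma, "Y")); it uses the l-value of X and the r-value of Y, and only the store changes.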
One nice thing that one can do with this machinery is give a precise definition for the notion of alias: an identifier Y is an alias for an identifier X if and only if rho(X) = rho(Y) (and usually we also require that X and Y are different when we use this terminology).
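In C++, where a reference declaration binds a second identifier to an existing cell, the condition rho(X) = rho(Y) can be approximated by comparing addresses. The following is a minimal sketch with hypothetical function names:

```cpp
#include <cassert>

// True exactly when the two identifiers denote the same cell,
// the analogue of rho(X) = rho(Y).
bool same_cell(const int& x, const int& y) {
    return &x == &y;
}

// Writes v through y and reports what x then holds; the result
// equals v whenever y is an alias for x.
int write_through(int& x, int& y, int v) {
    y = v;
    return x;
}
```

After int a = 1; int& b = a; int c = 1; the identifier b is an alias for a, while c merely has an equal r-value: an assignment to b is visible through a, but an assignment to c is not.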
3.3 Christopher Strachey introduced the very useful notions of the l-value and r-value of a variable. Given an assignment statement, the "l-value" of a variable on its left side is a reference to a cell, while "r-value" refers to the values of variables on the right side, which are the contents of their cells. This will be made more precise later on. Recall that Strachey designed the language CPL, which was implemented as BCPL, which in turn inspired C. It is better to call the container that holds the value of a programming variable a cell than to call it a location, and we will usually do so in class and in these notes. The term "location" should refer to a reference to a cell, rather than to the cell itself.
The C statement
    i=j=k=l
is a particularly good illustration of bad programming language syntax. Assuming you know that "=" is assignment in C (which itself is a bad choice), and you also know that assignment statements have r-values in C, there are still three different ways to parse the above line, so you must also know that C's parse conventions make = right associative. It then follows that the variables i, j, k, l all end up with the same r-value, the one that was originally attached to l. Amazingly, the other parses also work, at least in some implementations of C, though this is not well documented; they work because an assignment not only has an r-value, but also has an l-value. The statement
    (i=j)=k=l
is parsed as
    (i=j)=(k=l)
and therefore the variable i will first be assigned the r-value of j, and then that of l, which k will also get, although j will retain its original r-value. As an exercise, you should work out what happens under the third parse.
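The two parses discussed above can be tried out in C++, where (unlike standard C) the result of a built-in assignment is defined to be an l-value, so the parenthesized form is accepted by any conforming compiler. The sketch below assumes C++11 or later, and its function names are hypothetical; each function returns the final values of i, j, k, l in that order.

```cpp
#include <array>
#include <cassert>

// i = j = k = l, parsed as i = (j = (k = l)).
std::array<int, 4> right_assoc(int i, int j, int k, int l) {
    i = j = k = l;
    return {i, j, k, l};
}

// (i = j) = (k = l): the left operand i = j is itself an l-value
// naming i, so i is assigned twice and j is only read.
std::array<int, 4> left_grouped(int i, int j, int k, int l) {
    (i = j) = (k = l);
    return {i, j, k, l};
}
```

Starting from i, j, k, l = 1, 2, 3, 4, the first function gives 4, 4, 4, 4, and the second gives 4, 2, 4, 4: j keeps its original r-value, as claimed.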
The COME FROM program below is a rare example of a good joke written in the form of a program; it serves to emphasize the problems with GOTO's, by exaggerating them:
    10 J = 1
    11 COME FROM 20
    12 WRITE (6, 40) J
       STOP
    13 COME FROM 10
    20 J = J+2
    40 FORMAT (I4)
After assigning 1 to J at line 10, control passes to line 20, because of the COME FROM at line 13, and then to line 12 because of the COME FROM at line 11. So the value of J (which is 3) is written and the program stops. (The COME FROM construct and above program are due to R.L. Clark, CACM 27, pp 394-395, 1984.)
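To check the trace, each COME FROM can be turned around into an ordinary goto placed at the statement it names. The following C++ sketch is my own translation under that reading (the function and label names are hypothetical), and it returns the value that the program would write:

```cpp
#include <cassert>

// Returns the value of J that the COME FROM program writes.
int come_from_demo() {
    int j;
    j = 1;          // 10 J = 1
    goto line13;    //    ... control is taken by the COME FROM 10 at line 13
line11:             // 11 COME FROM 20
    return j;       // 12 WRITE (6, 40) J, then STOP
line13:             // 13 COME FROM 10
    j = j + 2;      // 20 J = J+2
    goto line11;    //    ... control is taken by the COME FROM 20 at line 11
}
```

The translation makes the joke visible: with goto, the jump is at least written where control leaves, whereas with COME FROM nothing at lines 10 or 20 warns the reader that control is about to go elsewhere.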
There are arguments against pointers that are essentially the same as those against GOTO; these can be summarized by noting that code written with pointers fails to indicate the programmer's intent in the same way as does code written with GOTO's, and it gives "spaghetti storage" as a result. The first line of defense against this is also the same as that for GOTO's, namely to provide good programming language support for replacing the most common uses of the feature, which for pointers is generally recursive data types. ML does a very good job of this, while C only gives the appearance of providing such a capability.
3.5, 3.6 See the notes on ch 4 of AS for comments on invariants and related topics. It is difficult to give a good explanation for the assignment statement proof rule, but Sethi's is quite good, perhaps the best I have seen.