3.1 The assignment statement is the essence of imperative programming, but it is relatively little discussed in Part II of Sethi; for this reason, these notes will discuss it in some detail. The best key to understanding the meaning of assignment is Strachey's notions of l-value and r-value, which are discussed in section 3.3 below.
Both Turing and von Neumann used assertions and invariants (which are the two main ideas) to prove correctness of flow chart programs, which they independently introduced in the early 1940s. Turing also designed the first stored-program electronic computer, although it was later redesigned and built by others in the UK. This information was secret for a long time, because Turing's work was an important part of Allied World War II efforts to break German and Japanese secret codes for war messages. (There are several good books, and even a good play, on Alan Turing's very interesting but rather sad life; the author of one of those books, Andrew Hodges, maintains a large website devoted to the life of Turing, which includes a page on the play.)
3.2 Advocates of so-called structured programming railed against the GOTO statement, claiming that it produced "spaghetti code," i.e., code that is full of seemingly random, twisted links. (By the way, many today feel that unrestricted inheritance in object oriented programming is just as bad, especially in large systems that use dynamic binding and multiple inheritance.) The value of conditional and loop statements is that they provide ways to replace GOTO's with code that can reveal (aspects of) the programmer's actual intention.
Sethi does a very nice job of defining the notion of structure in structured programming, as involving single entry and single exit syntactic constructions. But there is a subtle point: the single exit from a structure may be taken from more than one point inside the structure, for example, if there is a return statement. There is also another factor that makes these constructions easier to read, which is that the keywords explicitly express the programmer's intention, such as doing a loop, or a case analysis, or a conditional.
A major theme throughout our discussion of this chapter will be the drastic difference between an assignment and an equality. Equalities should be denoted by symmetrical symbols, such as the usual "=" symbol, because relations of equality are symmetrical. However, assignment, for which we will usually use the notation ":=", is far from symmetrical. For example, the following two statements have opposite meanings:
    X := Y
    Y := X
It is interesting to look at the following list of notations that have been used for assignment in various languages:
    A := 3
    A = 3
    A <- 3
    A =. 3
    3 -> A
    MOVE 3 TO A
    (SETQ A 3)
The first is used by Pascal, Ada, Icon, ML, Modula-3, and ALGOL 68, the second by FORTRAN, PL/I, SNOBOL4, C, C++, and Java, the third by APL (as an ASCII rendering of its left-arrow symbol), the fourth by J, the fifth by BETA, the sixth by COBOL, and the last by LISP. This list is certainly not complete. The only symmetrical symbol here is "=". By contrast, there is no assignment statement in functional languages, for example, those of the OBJ family, where the equality symbol really does denote equality.
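To make the asymmetry of X := Y versus Y := X concrete, here is a minimal sketch in Python, where "=" is assignment and "==" is equality. The function and variable names are invented for illustration and do not come from Sethi.

```python
def x_gets_y(x, y):
    """Model of X := Y: the content of Y is copied into X."""
    x = y
    return x, y


def y_gets_x(x, y):
    """Model of Y := X: the content of X is copied into Y."""
    y = x
    return x, y


# Starting from X = 1 and Y = 2, the two assignments give different states.
after_x_gets_y = x_gets_y(1, 2)   # both variables now hold 2
after_y_gets_x = y_gets_x(1, 2)   # both variables now hold 1

# Equality, by contrast, is symmetric: swapping the operands never matters.
equality_is_symmetric = ((1 == 2) == (2 == 1))
```

After either assignment the equality X == Y holds, but the two resulting states are different, which is exactly why a symmetrical symbol is misleading for assignment.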
Perl has a version of multiple assignment for arrays, as illustrated in the following:
2 1. (Thanks to Dana Dahlstrom, in CSE 230 W2002, for this.)
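As a rough analogue in Python (this is not the Perl example referred to above), simultaneous assignment to several targets can be sketched as follows. The names a, b and swapped are chosen only for illustration.

```python
a, b = 1, 2

# The whole right-hand side is evaluated before any target is assigned,
# so the two values are exchanged without a temporary variable.
a, b = b, a

swapped = (a, b)
print(a, b)   # prints: 2 1
```

The point is that the right side is read for its r-values as a whole before any cell on the left is updated; two ordinary assignments in sequence would lose one of the values.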
The subtlety of the assignment statement becomes much clearer when one attempts to provide a denotational semantics for it (see Chapter 13 of Sethi for more detail). For (all but the most primitive) imperative languages, such a semantics must carefully distinguish among the name of a cell, the location of a cell, and the content of a cell, and must provide ways of passing locations and contents as values. Identifiers are used as the names of cells, and these are usually called "variables," but really the concept of a programming language "variable" should include all three aspects mentioned above. It is very different from the notion of variable in mathematics!
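The three aspects of a variable can be separated in a small model. The sketch below, in Python, assumes an environment mapping names to locations and a store mapping locations to contents; the dictionaries env and store and the helpers assign and content are hypothetical names for illustration, not Sethi's notation.

```python
env = {"X": 0, "Y": 1}      # name -> location
store = {0: 10, 1: 20}      # location -> content


def assign(target, source):
    """Model of `target := source`: use the location of the target
    and the content of the source."""
    store[env[target]] = store[env[source]]


def content(name):
    """Look up a name's location, then that location's content."""
    return store[env[name]]


assign("X", "Y")            # X := Y
x_after, y_after = content("X"), content("Y")

# A second name bound to X's location is an alias for the same cell.
env["Z"] = env["X"]
store[env["Z"]] = 99                  # updating through Z ...
x_seen_through_alias = content("X")   # ... changes what X denotes too
```

Nothing like the alias Z can happen to a mathematical variable, which is one way to see how different the two notions are.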
There are several small bugs in this section. On page 66, "leap :=" should have come before the expression defining leap years (by the way, this is a very nice example!). On page 67, line 8, "remain 0" should be "remain =/= 0" (where =/= is "unequal"). In Figure 3.2 on page 68, there should not be a period after the final end; i.e., it should have been just "end" instead of "end.". The exposition of definite iteration on page 69 may be confusing, because step is not actually used in the Pascal syntax.
Also there is something odd about the discussion of the for statement, in that step is mentioned, but there is no syntax to specify its value. The default value is of course 1, but some languages have explicit syntax, such as step 3, to set the value to something other than 1.
3.3 Christopher Strachey introduced the very useful notions of the l-value and r-value of a variable. Given an assignment statement, the "l-value" of a variable (on the left side) is a reference to a cell, while "r-value" refers to the value of a variable (on the right side); these values are the contents of the cells of the variables. This will be made more precise later on. It is better to call the container that holds the value of a programming variable a cell than to call it a location, and we will usually do so in class and in these notes. It is better for the term "location" to refer to a reference to a cell, rather than to the cell itself. (Recall that Strachey designed the language CPL, which was implemented as BCPL, and inspired C.)
The C statement
    i=j=k=l
is a particularly good illustration of bad programming language syntax. Assuming you know that "=" is assignment in C (which itself is a bad choice), and you also know that assignment statements have r-values in C, there are still three different ways to parse the above line, so you must also know that = is right associative in C. It then follows that the variables i, j, k, l all end up with the same r-value, the one that was originally attached to l. Amazingly, the other parses also work, at least in some implementations of C, though this is not well documented; they work because an assignment not only has an r-value, but also has an l-value. The statement
    (i=j)=k=l
is parsed as
    (i=j)=(k=l)
and therefore the variable i will first be assigned the r-value of j, and then that of l, which k will also get, although j will retain its original r-value. As an exercise, you should work out what happens under the third parse.
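The two parses discussed above can be traced with a small model in which an assignment yields an l-value (here simply the name of the cell) from which an r-value can be recovered. This is a Python sketch of the semantics described in these notes, not of any particular C compiler; the helpers fresh_store, make_ops, assign and rvalue are hypothetical, and evaluating the left operand first is an assumption.

```python
def fresh_store():
    """Four cells with distinct initial contents."""
    return {"i": 1, "j": 2, "k": 3, "l": 4}


def make_ops(store):
    def assign(target, value):
        # Store the value and return the cell's name, so that the
        # result of an assignment can itself be used as an l-value.
        store[target] = value
        return target

    def rvalue(name):
        # The r-value of a cell is its current content.
        return store[name]

    return assign, rvalue


def right_associative():
    """i = (j = (k = l))"""
    store = fresh_store()
    assign, rvalue = make_ops(store)
    assign("i", rvalue(assign("j", rvalue(assign("k", rvalue("l"))))))
    return store


def second_parse():
    """(i = j) = (k = l)"""
    store = fresh_store()
    assign, rvalue = make_ops(store)
    target = assign("i", rvalue("j"))          # i first gets j's r-value
    value = rvalue(assign("k", rvalue("l")))   # k gets l's r-value
    assign(target, value)                      # then i gets it as well
    return store
```

Under the right-associative parse all four cells end up holding 4; under the second parse i, k and l hold 4 while j keeps its original 2.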
The COME FROM program below is a rare example of a good joke written in the form of a program; it serves to emphasize the problems with GOTO's, by exaggerating them:
    10 J = 1
    11 COME FROM 20
    12 WRITE (6, 40) J
       STOP
    13 COME FROM 10
    20 J = J+2
    40 FORMAT (I4)
After assigning 1 to J at line 10, control passes to line 20, because of the COME FROM at line 13, and then to line 12 because of the COME FROM at line 11. So the value of J (which is 3) is written and the program stops. (The COME FROM construct and the above program are due to R.L. Clark, CACM 27, pp 394-395, 1984.)
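The control flow of the program above can be checked with a tiny simulator. This is only a sketch in Python under one assumed reading of COME FROM: after the statement labelled n has executed, control resumes just after the statement COME FROM n. The tuple encoding of the program and the function run are invented for illustration, and the non-executable FORMAT line is omitted.

```python
# Each statement is (label, kind, argument); the label may be None.
PROGRAM = [
    (10, "assign", lambda j: 1),        # J = 1
    (11, "come_from", 20),
    (12, "write", None),                # WRITE J
    (None, "stop", None),
    (13, "come_from", 10),
    (20, "assign", lambda j: j + 2),    # J = J+2
]


def run(program):
    # Map each hijacked label to the index just after its COME FROM.
    hijack = {arg: idx + 1
              for idx, (_, kind, arg) in enumerate(program)
              if kind == "come_from"}
    j, output, pc = 0, [], 0
    while pc < len(program):
        label, kind, arg = program[pc]
        if kind == "stop":
            break
        if kind == "assign":
            j = arg(j)
        elif kind == "write":
            output.append(j)
        # After a statement runs, a COME FROM naming its label takes over;
        # otherwise control falls through to the next statement.
        pc = hijack.get(label, pc + 1)
    return output


result = run(PROGRAM)
```

Here result is [3], which agrees with the trace given above: line 10, then line 20, then line 12.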
Flow diagrams were used by both Turing and von Neumann in the 1940s as an aid to writing programs. A long 1947 paper by Goldstine and von Neumann on programming recommends first drawing a flow diagram, then writing pseudo-code (the form of which later came to be called assembly code), and then hand translating into machine code.
There are arguments against pointers that are essentially the same as
those against the
GOTO; these can be summarized by noting that
code written with pointers fails to indicate the programmer's intent in the
same way as does code written with
GOTO's, and it gives
"spaghetti storage" as a result. The first line of defense against this is
also the same as that for
GOTO's, namely to provide good
programming language support for replacing the most common uses of the
feature, which for pointers is generally recursive data types. ML does a
very good job of this, while C only gives the appearance of providing such a
3.5 What you should get out of this section is an understanding of how to use invariants to write better code, and in particular, to avoid errors in the condition for exiting a loop. The purpose of this material is to help you become a better programmer by better understanding some of the more difficult aspects of loops. Please note that invariants are not just comments inside curly brackets! They should be precise mathematical expressions that depend only on the state (i.e., the values of variables, arrays, etc.) of the program environment at the point where the invariant occurs. The key point is that they should evaluate to true every time control is at the point in the program where the invariant is placed.
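As an illustration of an invariant that is a precise expression over the program state, here is a minimal sketch in Python of division by repeated subtraction. The function divide and its variable names are chosen for illustration and are not taken from Sethi; the assert statements check the invariant at the points where it is claimed to hold.

```python
def divide(n, d):
    """Return (q, r) with n == q * d + r and 0 <= r < d.
    Assumes n >= 0 and d > 0."""
    q, r = 0, n
    # Invariant: n == q * d + r and r >= 0. It holds on entry to the loop.
    assert n == q * d + r and r >= 0
    while r >= d:
        q, r = q + 1, r - d
        # The invariant is restored at the end of every iteration.
        assert n == q * d + r and r >= 0
    # On exit the loop condition is false, so r < d; together with
    # the invariant this gives the desired postcondition.
    assert n == q * d + r and 0 <= r < d
    return q, r
```

For example, divide(17, 5) returns (3, 2). If the loop condition were mistakenly written as r > d, the invariant would still hold, but the final assertion would fail for divide(10, 5). This shows how the invariant together with the exit condition pins down what the loop computes.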
3.6 It is difficult to give a good explanation for the assignment statement proof rule, but Sethi's is quite good, perhaps the best I have seen. The fact that it is difficult to explain this rule is itself interesting, and suggests that assignment is a more subtle and difficult operation than is often realized. The examinations will not include questions on this section, but understanding the rules will help you to use invariants better, and hence to become a better programmer.
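For reference, the assignment rule in its standard Hoare-logic form can be written as below; Sethi's notation may differ. Here Q[E/x] denotes Q with E substituted for x. The second line is a worked instance, chosen only for illustration.

```latex
% Assignment axiom: Q holds after x := E provided that Q, with E
% substituted for x, holds before.
\[
  \{\, Q[E/x] \,\} \quad x := E \quad \{\, Q \,\}
\]
% Worked instance with Q being x > 0 and E being x + 1:
\[
  \{\, x + 1 > 0 \,\} \quad x := x + 1 \quad \{\, x > 0 \,\}
\]
```

Note that the rule works backwards, from the postcondition to the precondition, which is one reason it is hard to explain.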
3.7 This section reviews some basics of C, which most of you probably already know.
In ML, if one tries to increment an assignable variable X by writing
    X := X + 1
then one might well expect that X would get assigned 1 more than the reference to the cell where X is stored (but in fact, and more usefully, one gets a type error). What one should write to get this effect is
    X := !X + 1
where !X represents the dereference of X, that is, the value stored in X. Since the core of ML is functional, the ML approach to assignment is the result of (reluctant) thinking about imperative programming by functional programmers; perhaps it is intended to discourage you from using assignments, and certainly it is intended to make you think about what you are doing if you do use them. Requiring dereferencing to be made explicit certainly has this effect; and one must also go to a little extra trouble to declare an assignable variable, which must be reference-valued.
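The effect of explicit dereferencing can be imitated outside ML. The following is a hypothetical Python model of a reference cell, not ML itself: the class Ref and the helpers deref and assign stand in for ML's ref, ! and := respectively. One difference to keep in mind is that ML rejects the ill-typed statement at compile time, whereas this model only fails when the statement is executed.

```python
class Ref:
    """A hypothetical model of an ML reference cell."""

    def __init__(self, contents):
        self.contents = contents


def deref(cell):
    """Plays the role of ML's `!`."""
    return cell.contents


def assign(cell, value):
    """Plays the role of ML's `:=`."""
    cell.contents = value


X = Ref(5)

# Like X := X + 1: the right side mentions the cell, not its content.
try:
    assign(X, X + 1)
    outcome = "assigned"
except TypeError:
    outcome = "type error"

# Like X := !X + 1: dereference explicitly, then add.
assign(X, deref(X) + 1)
```

After this runs, outcome is "type error" and the content of X is 6.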