CSE 230: Principles of Programming Languages
Notes on Chapter 3 of Stansifer (assignment, control, binding, pointers, and expressions)

3.1 It would be better to call the container that holds the value of a programming variable a cell than to call it a location, and we will usually do so in class and in these notes. The term "location" should refer to a reference to a cell, rather than to the cell itself.

A theme throughout our discussion of this chapter will be the drastic difference between an assignment and an equality. Equalities should be denoted by symmetrical symbols, such as the usual "=" symbol, because relations of equality are symmetrical. However, assignment, for which we will usually use the notation ":=", is far from symmetrical. For example, the statements

   X := Y
   Y := X  
have opposite meanings. It is interesting to look at the list on page 78 of notations that have been used for assignment in various languages; the only symmetrical symbol is "=", and this is only used by some rather old languages; by the way, this list is certainly not complete.
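
The asymmetry is easy to check by running the two statements from the same starting point. Here is a minimal sketch in Python, where assignment is written "=" rather than ":="; the function names and the starting values 1 and 2 are mine, chosen only for illustration.

```python
def x_gets_y(x, y):
    """Perform X := Y and return the resulting pair (X, Y)."""
    x = y
    return x, y


def y_gets_x(x, y):
    """Perform Y := X and return the resulting pair (X, Y)."""
    y = x
    return x, y


# Starting from X = 1 and Y = 2, the two assignments leave different states.
after_x_gets_y = x_gets_y(1, 2)   # both variables now hold the old value of Y
after_y_gets_x = y_gets_x(1, 2)   # both variables now hold the old value of X
print(after_x_gets_y, after_y_gets_x)
```

An equality X = Y, by contrast, says exactly the same thing as Y = X.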

The very useful notions of "l-value" and "r-value" were introduced by Christopher Strachey, the same person who designed the language CPL, a simplified version of which was implemented as BCPL, which in turn inspired the C language (by way of B). The "use" and "mention" terminology introduced on page 78 may be more confusing than helpful in discussing the l- and r-values of an assignment, because both correspond to "use," in that they refer to a kind of value. The l-value referred to by a variable on the left side is a reference to a cell, while the r-value referred to on the right side is the content of a cell. This is made more precise later on.

If X is an assignable variable in ML, and one writes

    X := X + 1  
then one might well expect that X would get assigned 1 more than the reference to the cell where X is stored (but in fact, and more usefully, one gets a type error). The dereferencing has to be written explicitly, as X := !X + 1. The core of ML is functional, and so the ML assignment represents the (reluctant) thinking of a functional programmer about imperative programming; perhaps it is intended to discourage you from using assignments, and certainly it is intended to make you think about what you are doing if you do use them. Requiring dereferencing to be made explicit certainly has this effect; one must also go to a little extra trouble to declare an assignable variable.
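
To see what explicit dereferencing amounts to, here is a rough sketch in Python rather than ML. The class Ref and the helpers deref and assign are hypothetical stand-ins for ML's ref, "!" and ":="; they are not part of any library. One difference to keep in mind is that ML rejects the bad assignment at compile time, while this sketch only fails when it is run.

```python
class Ref:
    """A mutable cell, loosely modelled on an ML reference (hypothetical helper)."""

    def __init__(self, contents):
        self.contents = contents


def deref(cell):
    """Return the r-value stored in a cell, like ML's "!" operator."""
    return cell.contents


def assign(cell, value):
    """Store a value in a cell, like ML's ":=" operator."""
    cell.contents = value


x = Ref(0)                    # roughly "val X = ref 0" in ML

# The analogue of "X := X + 1" tries to add 1 to the cell itself and is rejected.
try:
    assign(x, x + 1)
    rejected = False
except TypeError:
    rejected = True

# The analogue of "X := !X + 1" dereferences explicitly and succeeds.
assign(x, deref(x) + 1)
print(rejected, deref(x))
```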

3.1.3 Page 80 of Stansifer says that no implemented language has multiple assignment, which may have been true at that time, but Perl now has a version of multiple assignment, involving lists, as illustrated in the following:

    perl -e '$a=1; $b=2; ($a,$b)=($b,$a); print "$a $b\n"'
You can try this on a Unix machine; it prints 2 1. (Thanks to Dana Dahlstrom for this.)
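
The point of multiple assignment is that all the right sides are evaluated before any variable is changed. A small sketch in Python, which also has this feature, contrasts it with two ordinary assignments done one after the other; the function names are mine.

```python
def swap_simultaneously(a, b):
    """Multiple assignment: both right sides are evaluated before either store."""
    a, b = b, a
    return a, b


def swap_sequentially(a, b):
    """Two ordinary assignments in a row: the old value of a is lost."""
    a = b
    b = a
    return a, b


print(swap_simultaneously(1, 2))
print(swap_sequentially(1, 2))
```

Only the first version exchanges the two values; the second leaves both variables holding the old value of b.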

3.2 The "come from" program on page 83 is a rare example of a good joke written in the form of a program; it serves to emphasize the problems with GOTO's. The value of all the loop and conditional statements discussed in this section is that they provide ways to replace GOTO's with code that actually reveals the programmer's intent.
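
As a small illustration of the difference, here is the same computation written twice in Python. Python has no GOTO, so the first version imitates jumps with an explicit label variable; that encoding is my own device for this sketch, not something from Stansifer.

```python
def sum_with_jumps(n):
    """Sum 1..n, using a label variable to imitate GOTO's."""
    total, i, label = 0, 1, "test"
    while label != "done":
        if label == "test":
            # IF i <= n GOTO body ELSE GOTO done
            label = "body" if i <= n else "done"
        elif label == "body":
            total += i
            i += 1
            label = "test"      # GOTO test
    return total


def sum_with_loop(n):
    """The same computation, written with a structured loop."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


print(sum_with_jumps(10), sum_with_loop(10))
```

In the second version the loop statement itself says that the body is repeated once for each i from 1 to n; in the first, the reader has to reconstruct this by tracing the jumps.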

3.3 The subtlety of the assignment statement becomes much clearer when one attempts to provide a formal semantics for it. For (all but the most primitive) imperative languages, such a semantics must carefully distinguish among the name of a cell, the location of a cell, and the content of a cell, and must provide ways of passing locations and contents as values. Identifiers are used as the names of cells, and these are usually called "variables," although strictly speaking the concept of a programming language "variable" ought to include all three aspects mentioned above. The assignment of a location to an identifier is called an environment and is denoted by the Greek letter "rho" in Stansifer, while the assignment of a value to a location is called a state and is denoted by the Greek letter "sigma". However, it would be clearer to call the latter map a store, and to call a pair of the two maps a state, because both are required to know the state of a computation that involves program variables. (One can find several different terminologies in the literature.) The environment rho gives the l-value of a variable, while the store sigma gives the r-value, which is usually called just the "value"; that is, the l-value of X is rho(X) and its r-value is sigma(rho(X)). Pointers involve the complication that an r-value may be a location.
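
The two maps can be written down directly. The following is a minimal sketch in Python, assuming that locations are just integers and that both maps are finite dictionaries; the identifiers X, Y, P, the location numbers, and the function names are all made up for the illustration.

```python
# Environment rho: identifier -> location (locations are plain integers here).
rho = {"X": 100, "Y": 101, "P": 102}

# Store sigma: location -> value. The cell for P holds a location, as a pointer would.
sigma = {100: 7, 101: 8, 102: 100}


def l_value(rho, name):
    """The l-value of an identifier is the location that rho gives it."""
    return rho[name]


def r_value(rho, sigma, name):
    """The r-value of an identifier is sigma(rho(name))."""
    return sigma[rho[name]]


def assign(rho, sigma, name, value):
    """Return the store after "name := value"; the environment is unchanged."""
    new_sigma = dict(sigma)
    new_sigma[rho[name]] = value
    return new_sigma


# X := Y uses the l-value of X and the r-value of Y.
sigma2 = assign(rho, sigma, "X", r_value(rho, sigma, "Y"))

# The r-value of P is a location; looking it up in the store follows the pointer.
pointee = sigma2[r_value(rho, sigma2, "P")]
print(sigma2, pointee)
```

Note that an assignment changes only the store, never the environment.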

One nice thing that one can do with this machinery is give a precise definition for the notion of alias: an identifier Y is an alias for an identifier X if and only if rho(X) = rho(Y) (and usually we also require that X and Y are different when we use this terminology).
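
The definition translates almost word for word into code. In this Python sketch the environment rho and store sigma are dictionaries, and the identifiers and location numbers are invented for the example.

```python
def is_alias(rho, x, y):
    """Y is an alias for X iff rho(X) = rho(Y), where X and Y are different identifiers."""
    return x != y and rho[x] == rho[y]


# X and Y name the same cell; Z names a different one.
rho = {"X": 100, "Y": 100, "Z": 101}
sigma = {100: 0, 101: 0}

# X := 42 changes the cell at rho(X), so the r-value of Y changes too.
sigma[rho["X"]] = 42
y_value = sigma[rho["Y"]]
z_value = sigma[rho["Z"]]
print(is_alias(rho, "X", "Y"), y_value, z_value)
```

This is exactly why aliases are troublesome: an assignment that mentions only X silently changes the value of Y.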

3.4 The arguments against pointers are essentially the same as those against the GOTO, and can be summarized by noting that code written with pointers fails to indicate the programmer's intent in the same way as does code written with GOTO's, and it gives "spaghetti storage" as a result. The first line of defense against this is also the same as that for GOTO's, namely to provide good programming language support for replacing the most common uses of the feature, which for pointers is generally recursive data types. ML does a very good job of this, while C only gives the appearance of providing such a capability (see page 91).
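
For comparison, an ML declaration such as datatype intlist = Nil | Cons of int * intlist defines lists without any mention of pointers. The following Python sketch is only a rough analogue of that idea, with None standing in for Nil; the names Cons, length and total are chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cons:
    """A nonempty list: a head element and the rest of the list."""
    head: int
    tail: Optional["Cons"]    # None plays the role of the empty list


def length(lst):
    """Number of elements, defined by recursion on the structure of the list."""
    return 0 if lst is None else 1 + length(lst.tail)


def total(lst):
    """Sum of the elements, by the same recursion."""
    return 0 if lst is None else lst.head + total(lst.tail)


example = Cons(1, Cons(2, Cons(3, None)))
print(length(example), total(example))
```

The functions follow the shape of the type declaration, and because the cells are immutable (frozen), no code can rewire the links into "spaghetti storage".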

Since Stansifer's conclusion in section 3.4.2 is that collections are not a very valuable feature, we may as well just not bother to read about them.

3.5 Most of this section is straightforward. However, the expression

     i=j=k=l  
on page 98 is rather tricky. I consider this a particularly good illustration of bad programming language syntax. Even if you know that "=" is assignment in C (which itself is a bad choice), and you know that assignment expressions have r-values in C, there are still several different ways to parse the above line, so you must also know that C's parse conventions give
     i=(j=(k=l))  
It then follows that the variables i, j, k, l all end up with the same r-value, the one that was originally attached to l. Amazingly, some other parses also work, at least in some implementations of C, though this is not well documented; they work there because an assignment not only has an r-value, but also has an l-value. (Standard C says that the result of an assignment is not an l-value, so a conforming C compiler rejects these forms; C++ accepts them.) So the statement
     (i=j)=k=l  
will be parsed as
     (i=j)=(k=l)  
and the variable i will first be assigned the r-value of j, and then that of l, which k will also get, although j will retain its original r-value. As an exercise, you should work out what happens under the third parse,
     ((i=j)=k)=l  
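
The first two parses can be traced mechanically. The sketch below is in Python, not C: Cell, assign and rval are hypothetical helpers that model an assignment which returns its l-value, and the initial values 1, 2, 3, 4 are chosen only so that the cells can be told apart. It assumes operands are evaluated left to right, which is what Python does and what the description above assumes; a real compiler need not behave this way. The third parse is left as the exercise.

```python
class Cell:
    """A storage cell; the Cell object itself plays the role of an l-value."""

    def __init__(self, value):
        self.value = value


def assign(target, value):
    """Store value in target and return target, so the result is again an l-value."""
    target.value = value
    return target


def rval(cell):
    """Read the r-value of a cell."""
    return cell.value


def fresh():
    """Four cells i, j, k, l with distinct initial values 1, 2, 3, 4."""
    return Cell(1), Cell(2), Cell(3), Cell(4)


# Parse i=(j=(k=l)): every assignment uses the r-value of the one to its right.
i, j, k, l = fresh()
assign(i, rval(assign(j, rval(assign(k, rval(l))))))
right_nested = (i.value, j.value, k.value, l.value)

# Parse (i=j)=(k=l): the outer assignment uses the l-value of (i=j).
i, j, k, l = fresh()
assign(assign(i, rval(j)), rval(assign(k, rval(l))))
mixed = (i.value, j.value, k.value, l.value)

print(right_nested, mixed)
```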

3.5.5 Unfortunately, the discussion of referential transparency here is poorly done, and the issues involved are sufficiently complex that it is probably not worth the trouble to straighten out this material. So I suggest that you read this section just for the fun of seeing the kind of issue that philosophers like to play with, and how easy it is to get confused about such issues.


Maintained by Joseph Goguen
© 2000, 2001, 2002 Joseph Goguen
Last modified: Thu Mar 21 19:27:36 PST 2002