CSE 230: Principles of Programming Languages
Notes on Chapter 4 of Sethi (data types)

We can motivate the need for programming languages to provide constructs to define data types in much the same way that we motivated the need for constructs to define control: without them, the code hides the intent of the programmer. Just as the unrestricted use of goto's can produce spaghetti code in which it is very hard to understand the flow of control, so the unrestricted use of pointers can produce spaghetti storage in which it is very hard to understand the structure of data. Just as programming languages slowly evolved better and better constructs for control flow, so they have also evolved better and better constructs for structuring data. The main concept is that of a type, which is a name for a collection of data items (i.e., r-values) having similar structure; the types of a language divide its data items into distinct (but usually not disjoint) classes.

If a language has a good type system, then programs will be easier to read, in that it will be clearer what is going on; moreover, the compiler will be able to detect many errors, so that programs will also be easier to write. A strongly typed language requires type declarations for all r-values, whereas an untyped language requires no type declarations. Older languages tend to be less strongly typed than newer languages.

Abstract types are an important topic not discussed in this chapter. By hiding the representation of a data type, they make it impossible for certain kinds of problem to arise; one example would be the infamous Y2K problem.
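
As a rough illustration of how hiding a representation rules out a Y2K-style error, here is a minimal Python sketch. The names Year, years_until, and naive_years_until are hypothetical and chosen only for this example. Note also that Python hides _full_year only by convention, whereas a language with true abstract types would enforce the hiding.

```python
class Year:
    """Abstract type for calendar years; clients use only its operations."""

    def __init__(self, full_year: int) -> None:
        # Hidden representation: the full year as an int. Changing the
        # representation would require changes only inside this class.
        self._full_year = full_year

    def years_until(self, later: "Year") -> int:
        """Number of years from this year to a later one."""
        return later._full_year - self._full_year


def naive_years_until(earlier_yy: int, later_yy: int) -> int:
    """Client code that depends directly on an exposed two-digit year."""
    return later_yy - earlier_yy


# With the abstract type, 1999 to 2000 is one year.
abstract_result = Year(1999).years_until(Year(2000))

# With exposed two-digit years, 99 to 00 comes out as -99.
exposed_result = naive_years_until(99, 0)
```

Every client of the two-digit representation has to be found and repaired, whereas clients of Year never see how the year is stored.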

In lecture notes from 1967, Christopher Strachey of Oxford University gave a classification of the different kinds of polymorphism. This is the same Strachey who introduced CPL, which was implemented as BCPL and inspired C (the "C" is for Christopher); he also introduced the very useful notions of "l-value" and "r-value" in these same lecture notes; and he is the co-founder, with Dana Scott, of denotational semantics. The kinds of polymorphism are parametric, subtype, and ad hoc. Ad hoc polymorphism is basically arbitrary overloading, for example, using + for both integer addition and Boolean exclusive or. Subtype (or subsort) polymorphism requires consistency across a type (or sort) hierarchy, for example, + for integer addition should agree with + for addition of reals and addition of rationals. The most original idea is parametric polymorphism, where an operation is parameterized by the type of its arguments; for example, head makes sense for lists of any type of element, and can be considered to have rank list a -> a, for any type a. Parametric polymorphism plays an important role in the ML language.
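
The three kinds of polymorphism can be sketched in Python. This is only an illustration: Python checks types at run time, not at compile time as ML does, so the annotations below are documentation and are not enforced. The names T and Bit are hypothetical; head follows the example in the text.

```python
from fractions import Fraction
from typing import Sequence, TypeVar

T = TypeVar("T")  # stands for "any type a"


def head(xs: Sequence[T]) -> T:
    """Parametric polymorphism: one definition for lists of any element type."""
    return xs[0]


class Bit:
    """Ad hoc polymorphism: + is overloaded to mean Boolean exclusive or."""

    def __init__(self, value: bool) -> None:
        self.value = value

    def __add__(self, other: "Bit") -> "Bit":
        # Exclusive or; this meaning of + is unrelated to numeric addition.
        return Bit(self.value != other.value)


# Subtype polymorphism: + on integers agrees with + on rationals and reals
# when the same integers are viewed as members of the larger types.
int_sum = 2 + 3
rational_sum = Fraction(2) + Fraction(3)
real_sum = float(2) + float(3)
```

Here head([1, 2, 3]) and head(["a", "b"]) use the same definition, while Bit's + shares only a symbol with numeric +.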

Of historical interest in this chapter is the discussion of Zuse's Plankalkül, which was perhaps the first language that could reasonably be called "high level," even though, because of World War II, it was never implemented (maybe fortunately for the Allies). See pages 101 and 146.

Finally, I would mention the different kinds of type equivalence discussed near the end of the chapter. Many people do not realize that there are different notions of type equivalence, or that different languages make different choices, and that this can sometimes make a big difference. (See page 139 ff.)
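
To make the difference concrete, here is a minimal Python sketch of two common notions, structural equivalence and name equivalence, over a toy model of types. The model (Basic, Array, Named) and the function names are assumptions made for this illustration, not definitions from Sethi. The name-equivalence check is a deliberately strict variant in which anonymous array types are never equivalent, and recursive types are not handled.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Basic:
    name: str  # a built-in type such as "integer" or "real"


@dataclass(frozen=True)
class Array:
    size: int
    element: "Type"


@dataclass(frozen=True)
class Named:
    name: str           # the declared type name
    definition: "Type"  # what the name was declared to stand for


Type = Union[Basic, Array, Named]


def expand(t: Type) -> Type:
    """Replace every declared name by its definition."""
    if isinstance(t, Named):
        return expand(t.definition)
    if isinstance(t, Array):
        return Array(t.size, expand(t.element))
    return t


def structurally_equivalent(a: Type, b: Type) -> bool:
    """Structural equivalence: equal once all names are expanded away."""
    return expand(a) == expand(b)


def name_equivalent(a: Type, b: Type) -> bool:
    """Strict name equivalence: only identical names denote the same type."""
    if isinstance(a, Named) and isinstance(b, Named):
        return a.name == b.name
    if isinstance(a, Basic) and isinstance(b, Basic):
        return a == b
    return False  # anonymous constructed types are never name-equivalent


# Two separately declared names for the same structure.
row = Named("row", Array(10, Basic("integer")))
vector = Named("vector", Array(10, Basic("integer")))
```

Under structural equivalence row and vector are the same type, so a value of one may be assigned to a variable of the other; under name equivalence they are different types, and the assignment is a type error.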


Maintained by Joseph Goguen
© 2000, 2001, 2002, 2003 Joseph Goguen
Last modified: Tue Jan 27 15:31:27 PST 2004