CSE 130: Principles of Programming Languages
Notes on Chapter 4 of Sethi (data types)

We can motivate the need for programming languages to provide constructs to define data types in much the same way that we motivated the need for constructs to define control: without them, the code hides the intent of the programmer. Just as the unrestricted use of goto's can produce spaghetti code in which it is very hard to understand the flow of control, so the unrestricted use of pointers can produce spaghetti storage in which it is very hard to understand the structure of data. Just as programming languages slowly evolved better and better constructs for control flow, so they have also evolved better and better constructs for structuring data. The main concept is that of a type, which is a name for a collection of data items (i.e., r-values) having similar structure; the types of a language divide its data items into distinct (but usually not disjoint) classes.

If a language has a good type system, then programs will be easier to read, because it will be clearer what is going on. The compiler will also be able to detect many errors, so that programs will be easier to write, since they are easier to debug. Moreover, types can help the compiler produce better code, especially for storage allocation. A strongly typed language requires type declarations for all r-values, whereas an untyped language requires no type declarations. Older languages tend to be less strongly typed than newer languages.

In lecture notes from 1967, Christopher Strachey of Oxford University gave a classification of the different kinds of polymorphism. (This is the same Strachey who introduced the very useful notions of "l-value" and "r-value" in these same lecture notes, and who designed CPL, which inspired C; in addition, he is the co-founder with Dana Scott of denotational semantics.) The kinds of polymorphism are parametric, subtype, and ad hoc. Ad hoc polymorphism is basically arbitrary overloading, for example, using + for both integer addition and Boolean exclusive or. Subtype (or subsort) polymorphism requires consistency across a type (or sort) hierarchy, for example, + for integer addition should agree with + for addition of reals and addition of rationals. The most original idea is parametric polymorphism, where an operation is parameterized by the type of its arguments; for example, head makes sense for lists of any type of element, and can be considered to have rank list a -> a, for any type a. Parametric polymorphism plays an important role in the ML language.
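The three kinds of polymorphism can be sketched in a few lines of Python. This is only an illustration under some assumptions: the names plus, agrees_across_hierarchy, and head are mine, Python's Fraction and float stand in for the rationals and reals, and the type annotations are not enforced when the program runs, so a statically typed language such as ML would check far more than this sketch does.

```python
from fractions import Fraction
from functools import singledispatch
from typing import List, TypeVar

A = TypeVar("A")  # stands for "any type a"


# Ad hoc polymorphism: one name, unrelated implementations chosen by
# the type of the first argument.
@singledispatch
def plus(x, y):
    raise TypeError(f"plus is not defined for {type(x).__name__}")


@plus.register(int)
def _(x, y):
    return x + y  # integer addition


@plus.register(bool)
def _(x, y):
    return x != y  # Boolean exclusive or


# Subtype polymorphism: + on integers should agree with + on the
# rationals and on the reals for the same values.
def agrees_across_hierarchy(m: int, n: int) -> bool:
    as_rationals = Fraction(m) + Fraction(n)
    as_reals = float(m) + float(n)
    return m + n == as_rationals == as_reals


# Parametric polymorphism: head has rank list a -> a for any type a,
# and its body never inspects the element type.
def head(xs: List[A]) -> A:
    return xs[0]
```

Here plus(2, 3) adds while plus(True, True) computes exclusive or, which shows how little the two meanings of an overloaded name need have in common. By contrast, head runs exactly the same code on a list of integers and on a list of strings.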

Abstract types are an important topic not discussed in this chapter. By hiding the representation of a data type, they make it impossible for certain kinds of problem to arise; one example would be the infamous Y2K problem.
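A small sketch may make the Y2K remark concrete. The class Year and the function two_digit_years_until are hypothetical names of my own, and the choice of a full integer as the hidden representation is just one possibility. Note also that Python hides the representation only by convention (the leading underscore); a language with genuine abstract types would reject outside access at compile time.

```python
class Year:
    """A year whose stored representation is private to this class."""

    def __init__(self, full_year: int) -> None:
        # Assumed representation: the full year as an integer. Clients
        # never see it, so it could be changed later without touching
        # any client code.
        self._full_year = full_year

    def years_until(self, other: "Year") -> int:
        """Number of years from this year to the other one."""
        return other._full_year - self._full_year


def two_digit_years_until(start: str, end: str) -> int:
    """What client code tends to do when a two-digit representation
    such as "99" is exposed and manipulated directly."""
    return int(end) - int(start)
```

With the exposed representation, two_digit_years_until("99", "00") gives -99, whereas Year(1999).years_until(Year(2000)) gives 1. The point is not that the class is clever, but that the decision about how many digits to store is made in exactly one place.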

Of historical interest in this chapter is the discussion of Zuse's Plankalkul, which was perhaps the first language that could reasonably be called "high level," even though, because of World War II, it was never implemented (maybe fortunately for the Allies). See pages 101 and 146.

Finally, I would mention the different kinds of type equivalence discussed near the end of the chapter. Many people do not realize that there are different notions of type equivalence, or that different languages make different choices, and that this can sometimes make a big difference. (See page 139 ff.)
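To see how two notions of type equivalence can disagree, here is a toy model of types as data, with one function for name equivalence and one for structural equivalence. It is a sketch under stated assumptions, not the rule of any particular language: the classes Basic and Record and both functions are hypothetical, and I have assumed that field names and field order count as part of the structure, which real languages decide in different ways.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Basic:
    name: str  # a built-in type such as "int" or "real"


@dataclass(frozen=True)
class Record:
    name: str                               # the declared type name
    fields: Tuple[Tuple[str, "Type"], ...]  # (field name, field type) pairs


Type = Union[Basic, Record]


def name_equivalent(s: Type, t: Type) -> bool:
    """Equivalent only if both are the same kind of type with the same name."""
    return type(s) is type(t) and s.name == t.name


def structurally_equivalent(s: Type, t: Type) -> bool:
    """Equivalent if built the same way; declared record names are ignored."""
    if isinstance(s, Basic) and isinstance(t, Basic):
        return s.name == t.name
    if isinstance(s, Record) and isinstance(t, Record):
        return len(s.fields) == len(t.fields) and all(
            f == g and structurally_equivalent(a, b)
            for (f, a), (g, b) in zip(s.fields, t.fields)
        )
    return False


REAL = Basic("real")
POINT = Record("Point", (("x", REAL), ("y", REAL)))
COMPLEX = Record("Complex", (("x", REAL), ("y", REAL)))
```

POINT and COMPLEX are structurally equivalent but not name equivalent, so a language using the first notion would let a point be passed where a complex number is expected, while a language using the second would report a type error.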

Maintained by Joseph Goguen
© 2000 - 2004 Joseph Goguen
Last modified: Thu Feb 5 21:19:20 PST 2004