CSE 171: User Interface Design: Social and Technical Issues
6. Direct Manipulation
Chapter 6 of Shneiderman on direct manipulation is particularly important. Although not the originator of the idea (he attributes it to Ted Nelson), Shneiderman is famous for his sustained and enthusiastic advocacy of direct manipulation. Shneiderman says (p.229) that direct manipulation is characterized by the following features: (1) analogical representation; (2) incremental operation; (3) reversibility; (4) physical action instead of syntax; (5) immediate visibility of results; and (6) graphic form. About the limitation to visibility in (5) and graphics in (6), it seems to me that representations can involve other senses than just sight. The principles of virtuality and transparency (p.202) are both important; the first is a brief formulation of direct manipulation (due to Ted Nelson) and the second is an important criterion for success: an interface is good if the user does not notice it but instead only notices the task at hand; so designers are most successful when users never think about them or their work!

I would especially emphasize points (1) and (4); see also Shneiderman's discussion of analogy in the last paragraph of section 6.3.1 (p.205). That point (1) characterizes direct manipulation as an analogy or metaphor is very relevant for us, because it says that direct manipulation involves a semiotic morphism. The physical nature of this metaphor (in (4)) makes it seem more direct and concrete, and thus easier for users to grasp and to apply. Leibniz, who was no doubt thinking of mathematical notation, makes a similar point when he says (quoted on p.185 of Shneiderman):

In signs, one sees an advantage for discovery that is greatest when they express the exact nature of a thing briefly and, as it were, picture it; then, indeed, the labor of thought is wonderfully diminished.
A good example is the difference between doing proofs in plane geometry with diagrams and doing them with axioms (see p.203); in fact, the constructions of plane geometry are a direct manipulation interface. Insight and creativity are enhanced by using a more direct and physical notation, due to the greater sense of involvement and connection that it produces. This in turn is due to the closer association with one's already existing sensory-motor schemata (which is closely related to important themes in contemporary cognitive linguistics on the nature of metaphor and the metaphorical nature of reasoning, among other things). Slider controls are mentioned in several places (e.g. p.202, p.214); they are a simple semiotic morphism having linear traversal as their source domain, and they could probably be applied even more widely than they have been. Scrollbars on windows are perhaps the most familiar special case; they both display and control what portion of a (possibly very long) "scroll" is actually displayed.
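The double role of a scrollbar, as both display and control, can be made concrete with a small sketch. This is only an illustration under simple assumptions, not a description of any particular toolkit: the class name Scrollbar, its fields, and the linear mapping between lines and pixels are all hypothetical, and the document is assumed to have at least one line. The point is that the mapping preserves the linear order of the source domain in both directions.

```python
from dataclasses import dataclass


@dataclass
class Scrollbar:
    """Hypothetical model of a scrollbar (all names here are assumptions)."""
    doc_len: int    # total number of lines in the "scroll" (assumed > 0)
    view_len: int   # number of lines visible at once
    track_len: int  # length of the scrollbar track, in pixels

    def thumb_len(self) -> float:
        # The thumb's length displays what fraction of the document is visible.
        return self.track_len * min(1.0, self.view_len / self.doc_len)

    def thumb_pos(self, first_line: int) -> float:
        # Display direction: position in the document -> position of the thumb.
        max_first = max(self.doc_len - self.view_len, 0)
        if max_first == 0:
            return 0.0
        return (self.track_len - self.thumb_len()) * first_line / max_first

    def first_line(self, thumb_pos: float) -> int:
        # Control direction: position of the thumb -> first visible line.
        max_first = max(self.doc_len - self.view_len, 0)
        free = self.track_len - self.thumb_len()
        if free <= 0:
            return 0
        clamped = min(max(thumb_pos, 0.0), free)
        return round(max_first * clamped / free)


sb = Scrollbar(doc_len=1000, view_len=50, track_len=200)
# Dragging the thumb further down always shows a later part of the document.
print(sb.first_line(0.0), sb.first_line(95.0), sb.first_line(190.0))
```

Because the two mappings are (approximately) inverse to each other, what the user sees and what the user moves stay in agreement, which is what makes the metaphor of linear traversal feel direct.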

Classical semiotics also provides insight into the success of direct manipulation, by seeing it as an indexicality of motion, often reinforced by a specific kind of iconicity, called diagrammatic iconicity by Peirce, where the geometric structure of the sign corresponds to the structure of its object (geographic information systems are particularly clear examples, since the structure of a geographic map corresponds to the structure of some part of the surface of the earth).

In this chapter, Shneiderman often confuses the essentially semiotic nature of direct manipulation with the technologies (or in more semiotic terms, the "media") that are used to implement it. Our semiotic conception of direct manipulation allows us to avoid this error, by clearly distinguishing between what functionality is preserved, and how it is represented. For example, it is perfectly possible to have a virtual reality interface to plain old 1978 DOS, complete with a haptic clicking keyboard and a virtual ancient VT100 screen with bright glowing green characters floating in space before you! Despite the fancy technology, this is still just command line DOS. This confusion is really just one aspect of a larger confusion, between the device that supports an interface, and the design and software that make the device actually function as an interface. Journalists often focus on the physical device (the "box") without giving much thought to the design of the interfaces of the applications that it supports. This is no doubt due in part to their receiving press releases from manufacturers and pressure from the advertising department, but it also reflects a bias in our culture.

Design errors often appear as violations of the underlying metaphor of a direct manipulation interface, or more generally, for any interface, of its semiotic morphism. One infamous example is the Apple Macintosh use of the trashcan for ejecting a floppy disk; it has confused generations of users, and it violates the trashcan metaphor in that the floppy is not trash. A more complex example is the use of lemmas in proofs, which leads to violations of a tree metaphor, but can be patched by using hypertext links (as in Kumo). Another example is the little arrows at the top and bottom (or left and right) of many scrollbars, because the physical motion metaphor does not suggest that these should be "hot." In fact, a scrollbar with this capability is actually a blend of two metaphors, and hence is a bit more difficult to learn; the second metaphor is similar to the "up" and "down" buttons on elevators.

The list of problems with direct manipulation on p.204 is very important; please read it at least three times. The "tatami" project in my lab ran into some of these problems with a direct manipulation interface that we built for proofs; it turned out that displaying the proof tree was useless for large proofs, because of the size and homogeneity of the display. Shneiderman's campaign on behalf of direct manipulation has been so successful that today, one is perhaps more likely to see it misapplied than to see it not applied when it should have been.

Piaget's work is in many ways outdated, so the material on page 207 should be taken with some grains of salt. The discussion of WIMP (p.207) is amusing but not very substantive. The guidelines for icon design (pp.208-9) are superficial but suggestive and useful. The remarks about emacs (p.210) are kind, but do not do justice to that amazingly flexible tool.

The last paragraph of section 6.5 (p.213) is interesting in connection with Lanier's piece; please reread Lanier, and see whether you think Shneiderman might not be saying the same thing in his very different way. The remark about notations for representing relations among residents of a home (p.217) seems a bit off the wall, but is interesting to think about from an ethical point of view; consider Bill Gates's high-tech house. The material on virtual environments may sound far out now, but I believe it will become increasingly important (pp.221-228). It is interesting to note that both the term "virtual reality" and the data glove (p.227) were invented by Lanier. The contrast between "immersive" and "looking-at" experience (bottom p.222) is interesting. Augmented reality (p.225) already has important industrial applications (e.g., at Boeing) and no doubt will have more. Situatedly aware shopping carts (p.225) do not appeal to me.

A Remark on Semiotics

Despite the mathematical character of the formal definitions of sign system and semiotic morphism, these concepts can be used very informally in practice, just as simple arithmetic is used in everyday life. For example, to see if we have enough gas left to drive from San Diego to Los Angeles, we make some assumptions, use some approximations, and only do the divisions and multiplications roughly. It would not make much sense to first work out an exact formula taking account of all contingencies, then do a careful analysis of the likelihoods, and finally calculate the mean and variance of the resulting probability distribution (though this is the sort of thing that NASA does for space shuttle missions). In user interface design, our goal is often just to get a rough understanding of why some design options may be better than others, and for this purpose, assumptions, approximations, and rough calculations are sufficient, especially when there is time pressure.
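The kind of rough calculation meant here can be sketched in a few lines. Everything in this sketch is an assumption made for illustration: the function name enough_gas, the safety margin of 0.8, and the sample figures (a trip of about 120 miles, a car that gets roughly 30 miles per gallon) do not come from the text.

```python
def enough_gas(distance_miles, gallons_left, mpg, safety_margin=0.8):
    """Rough check: is the trip within a cautious fraction of our estimated range?

    All inputs are approximations; the margin (an assumed value) stands in for
    the contingencies that an exact probabilistic analysis would model.
    """
    estimated_range = gallons_left * mpg
    return distance_miles <= safety_margin * estimated_range


# Assumed figures: about 120 miles to drive, roughly 30 miles per gallon.
print(enough_gas(120, 3, 30))  # about 3 gallons left
print(enough_gas(120, 6, 30))  # about 6 gallons left
```

A single multiplication and a comparison settle the question well enough to act on, which is the level of precision that is usually appropriate when comparing design options.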

Note: This has also been copied into section 5 of the class notes.

Maintained by Joseph Goguen
© 2000, 2001 Joseph Goguen, all rights reserved.
Last modified: Fri May 16 13:25:47 PDT 2003