This chapter of the class notes first explores direct manipulation, and in particular, its relationship to semiotic morphisms; it then gives some notes on chapter 5 of the text, explaining how this material could have been enriched using notions of preservation for semiotic morphisms, and it concludes with some additional remarks, mainly of a mathematical character, on semiotic morphisms, supplementing what is in the assigned readings.
Ben Shneiderman is known for his sustained and enthusiastic advocacy of direct manipulation, although he was not the originator of the idea, which he attributes to Ted Nelson. Shneiderman says that direct manipulation is characterized by the following features: (1) analogical representation; (2) incremental operation; (3) reversibility; (4) physical action instead of syntax; (5) immediate visibility of results; and (6) graphic form.
I would especially emphasize points (1) and (4). As for the restriction to visibility in (5) and to graphics in (6), it seems to me that representations can involve senses other than sight. The fact that point (1) characterizes direct manipulation as an analogy or metaphor is very relevant for us, because it says that direct manipulation involves a semiotic morphism. The physical nature of this metaphor (in (4)) makes it seem more direct and concrete, and thus easier for users to grasp and to apply. Leibniz, who was no doubt thinking of mathematical notation, makes a similar point when he says:
"In signs, one sees an advantage for discovery that is greatest when they express the exact nature of a thing briefly and, as it were, picture it; then, indeed, the labor of thought is wonderfully diminished."
A good example of this phenomenon is the difference between doing proofs in plane geometry with diagrams and doing them with axioms; in fact, the constructions of traditional Euclidean plane geometry rely on a kind of direct manipulation interface. Insight and creativity are enhanced by using a more direct and physical notation, due to the greater sense of involvement and connection that it produces. This in turn is due to the closer association with one's already existing sensory-motor schemata, which is closely related to important themes in contemporary cognitive linguistics on the nature of metaphor, where it is said that the most basic metaphors are image schemas that are grounded in human embodiment.
Two important principles can help deepen our understanding here. The Principle of Transparency is a criterion for success: an interface is good if the user does not notice it but instead only notices the task at hand; so designers are most successful when users never think about them or their work! The Principle of Virtuality is Ted Nelson's brief original formulation of direct manipulation, as a representation of reality that can be manipulated.
Shneiderman's campaign on behalf of direct manipulation has been so successful that today, one is perhaps more likely to see it misapplied than to see it not applied when it should have been. Here are my paraphrasings of Shneiderman's useful list of potential limitations of direct manipulation (page 204 of his text):
Classical semiotics also provides insight into the success of direct manipulation, by seeing it as an indexicality of motion, often reinforced by a specific kind of iconicity, called diagrammatic iconicity by Peirce, where the geometric structure of the sign corresponds to the structure of its object (geographic information systems are particularly clear examples, since the structure of a geographic map corresponds to the structure of some part of the surface of the earth). Slider controls are a simple semiotic morphism having linear traversal as their source domain, and they could probably be applied even more widely than they have been. Scrollbars on windows are perhaps the most familiar special case; they both display and control what portion of a (possibly very long) "scroll" is actually displayed.
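The two roles of a scrollbar, displaying and controlling the visible portion of a scroll, can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the class `Scrollbar`, its fields, and the methods `display` and `control` are names invented here, positions are measured in whole lines and pixels, and the mapping is taken to be linear. The point is that the order of positions in the document is preserved in the position of the thumb, which is what makes the linear traversal metaphor work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scrollbar:
    """Hypothetical vertical scrollbar over a document of `total_lines`,
    of which `visible_lines` fit in the window at once."""
    total_lines: int
    visible_lines: int
    track_px: int  # length of the scrollbar track, in pixels

    def max_top(self) -> int:
        """Largest line number that can appear at the top of the window."""
        return max(self.total_lines - self.visible_lines, 0)

    def thumb_px(self) -> int:
        """Thumb length is proportional to the fraction of the document shown."""
        if self.total_lines <= self.visible_lines:
            return self.track_px
        return max(1, self.track_px * self.visible_lines // self.total_lines)

    def display(self, top_line: int) -> int:
        """Display direction: top line of the window -> thumb offset in pixels."""
        if self.max_top() == 0:
            return 0
        travel = self.track_px - self.thumb_px()
        return round(travel * top_line / self.max_top())

    def control(self, thumb_offset_px: int) -> int:
        """Control direction: dragged thumb offset -> top line of the window."""
        travel = self.track_px - self.thumb_px()
        if travel == 0:
            return 0
        clamped = min(max(thumb_offset_px, 0), travel)  # stay on the track
        return round(self.max_top() * clamped / travel)
```

For example, with `Scrollbar(total_lines=1000, visible_lines=50, track_px=200)`, moving further down the document never moves the thumb upward, and dragging the thumb past either end of the track is clamped to the first or last window of text.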
Unfortunately, Shneiderman often confuses the essentially semiotic nature of direct manipulation with the technologies (or in more semiotic terms, the "media") that are used to implement it. Our semiotic conception of direct manipulation allows us to avoid this error, by clearly distinguishing between what functionality is preserved, and how it is represented. For example, it is perfectly possible to have a virtual reality interface to plain old 1978 DOS, complete with a haptic clicking keyboard and a virtual ancient VT100 screen with bright glowing green characters floating in space before you! Despite the fancy technology, this is still just command line DOS. This confusion is really just one aspect of a larger confusion, between the device that supports an interface, and the design and software that make the device actually function as an interface. Journalists often focus on the physical device (the "box") without giving much thought to the design of the interfaces of the applications that it supports. This is no doubt due in part to their receiving press releases from manufacturers and pressure from the advertising department, but it also reflects a bias in our culture.
Design errors often appear as violations of the underlying metaphor of a direct manipulation interface, or more generally, for any interface, violations of consistency of its semiotic morphism. One infamous example is the Apple Macintosh use of the trashcan for ejecting a floppy disk; it has confused generations of users, and it violates the trashcan metaphor in that the floppy is not trash. A more complex example is the use of lemmas in proofs, which leads to violations of a tree metaphor, but can be patched by using hypertext links (as in Kumo). Another example is the little arrows at the top and bottom (or left and right) of many scrollbars, because the physical motion metaphor does not suggest that these should be "hot." In fact, a scrollbar with this capability is actually a blend of two metaphors, and hence is a bit more difficult to learn; the second metaphor is similar to the "up" and "down" buttons on elevators.
It is interesting to note that both the term "virtual reality" and the data glove were invented by Jaron Lanier. Augmented reality has important industrial applications (e.g., at Boeing) and no doubt will have more. Situatedly aware shopping carts do not appeal to me, and indeed, raise significant ethical issues. Again, the main point here is that direct manipulation is a form of semiotic morphism, and use of algebraic semiotic ideas can clarify some of the issues surrounding direct manipulation.
Direct manipulation occurs when a visual (or other) representation can be manipulated (e.g., by drag-and-drop, or clicking on parts) in a way that corresponds to what is represented. This can be very intuitive and convenient when it works, though there are some interesting cases where it does not work well. Among the disadvantages of direct manipulation, I would stress that it is hard to use for navigating and manipulating large homogeneous structures, such as proof trees. Shneiderman has long been a strong advocate of direct manipulation, and has suggested that it is a viable alternative to agents. Shneiderman's text includes a lengthy flame about agents (pages 83-89 of the third edition) emphasizing the problem of responsibility; his position is argued much more effectively here than in the Scientific American interview; see also Agents of Alienation by Jaron Lanier on this same topic, which has given rise to ongoing debates in the HCI community. I would like to emphasize the following sentence from page 85 of Shneiderman's text, which occurs in the context of air traffic control, but which applies equally well to many other domains, such as nuclear reactor control, and the so called Star Wars missile defense system:
"In short, real-world situations are so complex that it is impossible to anticipate and program for every contingency; human judgement and values are necessary in the decision-making process."
Much more along these lines can be found in the important book Normal Accidents, by Charles Perrow; he shows that many catastrophic accidents, such as the famous Three Mile Island nuclear incident, arise from the simultaneous occurrence of several unusual situations.
Direct manipulation is the illusion that one is directly manipulating something real, such as downloading a file from a remote machine to a local directory by picking it up and moving it over. As an exercise, it is interesting to think about all the different ways in which this really is an illusion. Nonetheless, one might say that the illusion is real enough, if the user does not think about the interface at all, but only about the task, and describes that task using a metaphor of directly manipulating objects. In fact, it is an important general principle of interface design that the interface is successful to the extent that it is invisible; if the user must pay attention to the interface instead of the task, then something is wrong. This principle is sometimes called transparency. Affordances and visualizations are other ways (other than direct manipulation) that transparency can be achieved.
Lanier's Agents of Alienation is an unusual piece. In my opinion, Lanier's rhetoric is excessive, and he glosses over some important points; but let's seek out what is interesting in it. For now, I would highlight two main points: (1) Lanier is against agents, and does not accept that they "are inevitable" (a position he attributes to Negroponte); (2) Lanier goes beyond purely technical issues, and raises basic ethical issues - in this respect, his argument against agents is quite different from Shneiderman's. However, he does connect with issues in user interface design, and it is very interesting to notice that his list of agents being promoted in 1995 is now a list of notorious failures. See also Direct Manipulation vs. Interface Agents, by Ben Shneiderman and Pattie Maes, Interactions, 4, no. 6, pp 42-61, 1997, a digest of a debate held at CHI 97, which attracted a lot of attention in the HCI community. Some further comments and insights on this debate are given in my paper Are Agents an Answer or a Question?
An important point about this paper is signified by the numeral "2" in its title: the system described here is the result of an iterative design process, in which an earlier system, named Answer Garden, was subjected to a careful evaluation, based on the experience of actual users, and then the results of this evaluation were carefully analyzed, pinpointing certain weaknesses in some underlying assumptions, such as a sharp distinction between experts and ordinary users, leading to new assumptions and a new design based upon them. The result is a typical example of what ethnographers call situatedness, and what designers call site specificity: one learns important and interesting things by evaluating a system in the context of a particular community, but those things may not generalize to other communities. For example, distinguishing experts from users might be valid in a different context, and the particular escalation hierarchy that Ackerman designed for this community might well not work for a different community. On the other hand, software tools like those developed by Ackerman are valuable for building systems for other communities, and several of the more abstract ideas are more generally valid, especially that of collaborative filtering.
The paper addresses the important problem of integrating user communities with their computer systems. Such tasks are especially important for huge databases of badly understood, poorly structured data. Specific techniques used in AG2 include its answer escalation hierarchy, anonymization, an engine to find experts, a statistics collection service, and support for collaborative authoring. All these have potential applications in many other areas, though they are far from exhausting the range of useful modules; together they constitute a toolkit for "socializing" databases. These features were added by Ackerman in moving from his first version of Answer Garden to the second (AG2), which is based on a collection of modules that can be assembled in a variety of ways using Tcl/Tk.
User interfaces viewed as signs are clearly dynamic in many important cases, where the display changes in response to the user and environment. Unfortunately, there has not been much time to discuss this topic in this class, but there is a natural extension of semiotic systems to dynamic semiotic systems using the theory of hidden algebra, which extends algebraic specification to dynamical systems. More simply, we can consider terms built by certain constructors to be states of a system, with certain functions defined as actions which change states of the system.
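The simpler view, with states as constructor terms and actions as functions on them, can be illustrated with a small Python sketch. It is only a loose illustration and not a rendering of hidden algebra itself: the constructors `Init` and `Scroll`, the action `scroll`, and the observation `top_line` are all names invented here, for a window that scrolls over a text. Two different terms can denote states that look the same to the user, in the sense that every observation made on them agrees.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Init:
    """Constructor (hypothetical): the initial state, window at the top."""


@dataclass(frozen=True)
class Scroll:
    """Constructor (hypothetical): the state reached from `prev`
    by scrolling `delta` lines (negative means upward)."""
    delta: int
    prev: "State"


State = Union[Init, Scroll]


def scroll(state: State, delta: int) -> State:
    """Action: builds a new state term; nothing is mutated."""
    return Scroll(delta, state)


def top_line(state: State) -> int:
    """Observation: the line shown at the top of the window.
    Scrolling above the first line is clamped to line 0."""
    if isinstance(state, Init):
        return 0
    return max(top_line(state.prev) + state.delta, 0)


# Two distinct terms for what the user sees as the same state.
s1 = scroll(scroll(Init(), 3), 2)
s2 = scroll(Init(), 5)
```

Here `s1` and `s2` are different as terms, but `top_line` gives the same result on both. In this small example every later observation depends only on the current top line, so no sequence of further actions can tell the two states apart.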