This chapter of the class notes first explores direct manipulation, and in particular, its relationship to semiotic morphisms; it then gives some notes on chapter 5 of the text, explaining how this material could have been enriched using notions of preservation for semiotic morphisms, and it concludes with some additional remarks, mainly of a mathematical character, on semiotic morphisms, supplementing what is in the assigned readings.
Ben Shneiderman is known for his sustained and enthusiastic advocacy of direct manipulation, although he was not the originator of the idea, which he attributes to Ted Nelson. Shneiderman says that direct manipulation is characterized by the following features: (1) analogical representation; (2) incremental operation; (3) reversibility; (4) physical action instead of syntax; (5) immediate visibility of results; and (6) graphic form.
I would especially emphasize points (1) and (4). As for the restriction to visibility in (5) and to graphics in (6), it seems to me that representations can involve senses other than sight. That point (1) calls for an analogy or metaphor is very relevant for us, because it says that direct manipulation involves a semiotic morphism. The physical nature of this metaphor (in (4)) makes it seem more direct and concrete, and thus easier for users to grasp and to apply. Leibniz, who was no doubt thinking of mathematical notation, makes a similar point when he says:
In signs, one sees an advantage for discovery that is greatest when they express the exact nature of a thing briefly and, as it were, picture it; then, indeed, the labor of thought is wonderfully diminished.
A good example of this phenomenon is the difference between doing proofs in plane geometry with diagrams and doing them with axioms; in fact, the constructions of traditional Euclidean plane geometry rely on a kind of direct manipulation interface. Insight and creativity are enhanced by using a more direct and physical notation, due to the greater sense of involvement and connection that it produces. This in turn is due to the closer association with one's already existing sensory-motor schemata, which is closely related to important themes in contemporary cognitive linguistics on the nature of metaphor, where it is said that the most basic metaphors are image schemas that are grounded in human embodiment.
Two important principles can help deepen our understanding here. The Principle of Transparency is an important criterion for success: an interface is good if the user does not notice it but instead only notices the task at hand; so designers are most successful when users never think about them or their work! The Principle of Virtuality is Ted Nelson's brief original formulation of direct manipulation, as a representation of reality that can be manipulated.
Shneiderman's campaign on behalf of direct manipulation has been so successful that today, one is perhaps more likely to see it misapplied than to see it not applied when it should have been. Here are my paraphrasings of Shneiderman's useful list of potential limitations of direct manipulation (page 204 of his text):
Classical semiotics also provides insight into the success of direct manipulation, by seeing it as an indexicality of motion, often reinforced by a specific kind of iconicity, called diagrammatic iconicity by Peirce, where the geometric structure of the sign corresponds to the structure of its object (geographic information systems are particularly clear examples, since the structure of a geographic map corresponds to the structure of some part of the surface of the earth). Slider controls are a simple semiotic morphism having linear traversal as their source domain, and they could probably be applied even more widely than they have been. Scrollbars on windows are perhaps the most familiar special case; they both display and control what portion of a (possibly very long) "scroll" is actually displayed.
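The "display and control" role of a scrollbar can be made concrete with a small sketch. It assumes a vertical scrollbar measured in pixels and a document measured in lines; the function and parameter names (thumb_to_offset, offset_to_thumb, track_len, and so on) are hypothetical and do not come from any particular toolkit. The point is only that both directions are order preserving maps between thumb positions and document offsets.

```python
def thumb_to_offset(thumb_pos, track_len, thumb_len, doc_len, view_len):
    """Control direction: map a thumb position to the first visible line."""
    max_thumb = track_len - thumb_len   # furthest the thumb can travel
    max_offset = doc_len - view_len     # furthest the view can scroll
    if max_thumb <= 0 or max_offset <= 0:
        return 0                        # nothing to scroll
    return round(thumb_pos / max_thumb * max_offset)


def offset_to_thumb(offset, track_len, thumb_len, doc_len, view_len):
    """Display direction: map the first visible line to a thumb position."""
    max_thumb = track_len - thumb_len
    max_offset = doc_len - view_len
    if max_thumb <= 0 or max_offset <= 0:
        return 0
    return round(offset / max_offset * max_thumb)


# Assumed geometry: 100 pixel track, 20 pixel thumb, 1000 line document,
# 200 lines visible at a time.
GEOMETRY = (100, 20, 1000, 200)
offsets = [thumb_to_offset(p, *GEOMETRY) for p in range(81)]
```

Moving the thumb further down never scrolls the document back up, which is the "linear traversal" structure that the slider metaphor relies on.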
Unfortunately, Shneiderman often confuses the essentially semiotic nature of direct manipulation with the technologies (or in more semiotic terms, the "media") that are used to implement it. Our semiotic conception of direct manipulation allows us to avoid this error, by clearly distinguishing between what functionality is preserved, and how it is represented. For example, it is perfectly possible to have a virtual reality interface to plain old 1978 DOS, complete with a haptic clicking keyboard and a virtual ancient VT100 screen with bright glowing green characters floating in space before you! Despite the fancy technology, this is still just command line DOS. This confusion is really just one aspect of a larger confusion, between the device that supports an interface, and the design and software that make the device actually function as an interface. Journalists often focus on the physical device (the "box") without giving much thought to the design of the interfaces of the applications that it supports. This is no doubt due in part to their receiving press releases from manufacturers and pressure from the advertising department, but it also reflects a bias in our culture.
Design errors often appear as violations of the underlying metaphor of a direct manipulation interface, or more generally, for any interface, violations of consistency of its semiotic morphism. One infamous example is the Apple Macintosh use of the trashcan for ejecting a floppy disk; it has confused generations of users, and it violates the trashcan metaphor in that the floppy is not trash. A more complex example is the use of lemmas in proofs, which leads to violations of a tree metaphor, but can be patched by using hypertext links (as in Kumo). Another example is the little arrows at the top and bottom (or left and right) of many scrollbars, because the physical motion metaphor does not suggest that these should be "hot." In fact, a scrollbar with this capability is actually a blend of two metaphors, and hence is a bit more difficult to learn; the second metaphor is similar to the "up" and "down" buttons on elevators.
It is interesting to note that both the term "virtual reality" and the data glove were invented by Jaron Lanier. Augmented reality has important industrial applications (e.g., at Boeing) and no doubt will have more. Situatedly aware shopping carts do not appeal to me, and indeed, raise significant ethical issues. Again, the main point here is that direct manipulation is a form of semiotic morphism, and use of algebraic semiotic ideas can clarify some of the issues surrounding direct manipulation.
The slogan "Content organization drives visual organization" on page 82 can be seen as a corollary of our principle that good designs are semiotic morphisms, from a content space to a display space, that optimize some preservation properties. The four principles summarized on page 83 are good, but it seems to me that something very essential has been left out, namely that ordering relations in the "content organization" should be reflected (or "preserved", in our language of morphisms) by the display. It is easy to see that the (good) examples do exactly this; e.g., the six groups of links in Figure 5-2 are arranged by the size of the card-sorted groups, and there is even an (implicit) ordering of the items within each group by importance; this is not the case for Figure 5-1. The course lecture on this topic contained much more information than this paragraph, showing how levels and priority in the source semiotic space can explain and improve the four principles in the text, making them both more general and more precise; you should have been there!
Another comment is that the Alignment principle is less important than the others; it is just one way to achieve consistency, which can also be achieved, for example, with color or with size. Although alignment is very basic to the way that most browsers display many HTML commands, it is not necessarily basic for more creative graphical layouts, or even for all the natural ways to present HTML, for example, by speech generation (for blind users).
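The requirement that the display preserve ordering relations in the content can be checked mechanically. The following is a minimal sketch, assuming that content priority is given as a number per item (larger means more important) and that the display assigns each item a position (smaller means nearer the top); the item names and numbers are invented for illustration and are not taken from the figures in the text.

```python
def preserves_order(priority, position):
    """True if whenever a outranks b in the content, a appears before b."""
    items = list(priority)
    return all(position[a] < position[b]
               for a in items
               for b in items
               if priority[a] > priority[b])


# Hypothetical content space: three links with assumed priorities.
priority = {"courses": 3, "research": 2, "parking": 1}

# Two hypothetical displays of the same content.
good_layout = {"courses": 0, "research": 1, "parking": 2}
bad_layout = {"parking": 0, "courses": 1, "research": 2}
```

Here preserves_order(priority, good_layout) holds, while bad_layout fails because the least important item is displayed first.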
First, recall that the composition (f;g) of semiotic morphisms f: T -> T' and g: T' -> T'' is defined, for x an element of T, by (f;g)(x) = g(f(x)). This means first apply f to x, and then apply g to the result; the semicolon notation is borrowed from programming languages, where it again indicates first do one statement, then the next.
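The order of application in (f;g) is easy to get backwards, so here is a small sketch using ordinary Python functions as stand-ins for morphisms (they carry none of the sort or axiom structure of a real semiotic morphism). The names compose, to_pair, and to_text are hypothetical, chosen only for this illustration.

```python
def compose(f, g):
    """Return f;g, the function that applies f first and then g."""
    return lambda x: g(f(x))


def to_pair(n):
    """Split a count of minutes into (hours, minutes)."""
    return divmod(n, 60)


def to_text(hm):
    """Render an (hours, minutes) pair as a string such as '02:05'."""
    h, m = hm
    return f"{h:02d}:{m:02d}"


# (to_pair ; to_text) goes from minutes to pairs to text.
minutes_to_text = compose(to_pair, to_text)
```

Note that compose(to_text, to_pair) would not even make sense here, since to_pair expects a number rather than a string; the target of the first map must match the source of the second.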
Definition: A binary relation > on a set P is a partial ordering if it is transitive (i.e., a > b and b > c imply a > c, for all a,b,c in P), and is anti-reflexive (i.e., a > a does not hold for any element a in P). A partial ordering > is a total ordering if for all a,b in P, either a = b or a > b or b > a.
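For a finite set, both conditions can be tested by brute force. The sketch below represents the relation > as a Python function gt(a, b); the helper names and the "proper multiple" example are illustrative assumptions rather than part of the definition.

```python
from itertools import product


def is_partial_ordering(elems, gt):
    """Check anti-reflexivity and transitivity of gt on a finite set."""
    elems = list(elems)
    anti_reflexive = not any(gt(a, a) for a in elems)
    transitive = all(gt(a, c)
                     for a, b, c in product(elems, repeat=3)
                     if gt(a, b) and gt(b, c))
    return anti_reflexive and transitive


def is_total_ordering(elems, gt):
    """Check that gt is a partial ordering in which any two elements compare."""
    elems = list(elems)
    comparable = all(a == b or gt(a, b) or gt(b, a)
                     for a, b in product(elems, repeat=2))
    return is_partial_ordering(elems, gt) and comparable


def proper_multiple(a, b):
    """a > b when a is a multiple of b and differs from it."""
    return a != b and a % b == 0
```

On the set {1, 2, 3, 6}, proper_multiple is a partial ordering but not a total one, since neither 2 > 3 nor 3 > 2 holds; the usual > on integers passes both checks.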
Notice that the so-called "unordered list" of HTML actually produces graphic elements that display a total order in a natural way (since for each pair of distinct list elements, one is necessarily above the other); HTML "unordered lists" differ from "ordered lists" in being unenumerated, not in being unordered.
Definition: Given two partially ordered sets, P with > and P' with >', their lexicographic product consists of the set of pairs (a,a') with a in P and a' in P', ordered by (a,a') > (b,b') if a > b or (a = b and a' >' b').
Theorem: If P and P' are both totally ordered, then so is their lexicographic product.
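The theorem can be spot-checked on small finite examples. This is a sketch only, not a proof: it builds the lexicographic ordering from two given orderings and tests totality by enumeration. The names lex_gt and is_total are hypothetical, and the two factors are assumed to be small ranges of integers with the usual >.

```python
from itertools import product


def lex_gt(gt1, gt2):
    """Build the lexicographic ordering on pairs from orderings gt1 and gt2."""
    def gt(p, q):
        (a, a2), (b, b2) = p, q
        return gt1(a, b) or (a == b and gt2(a2, b2))
    return gt


def is_total(elems, gt):
    """Check that any two distinct elements are related one way or the other."""
    elems = list(elems)
    return all(p == q or gt(p, q) or gt(q, p)
               for p, q in product(elems, repeat=2))


def usual(a, b):
    """The usual strict ordering on integers."""
    return a > b


pairs = list(product(range(3), range(4)))   # P = {0,1,2}, P' = {0,1,2,3}
pair_gt = lex_gt(usual, usual)
```

On these pairs the constructed ordering agrees with Python's built-in tuple comparison, which is itself lexicographic.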
The reason that the TOD ("time of day") semiotic space has some odd looking representations that appear to be good mathematically is that this particular theory of time is very basic, and does not include certain social conventions which we expect to see preserved in our representations of time. The most important of these is that the 1440 minutes of a day are enumerated using two counters, one that goes up to 24 and the other up to 60; these are combined by the constructor (_,_), to create pairs of counters. Here are the axioms for this more detailed source space, where h, m are variables for the hour and minute counters, respectively, and s denotes the unary "next" (or "tick") function on time (i.e., on the pairs of counter values), and also denotes the successor function on integers:
Definition: A projection M on a semiotic space S is a semiotic morphism with source and target S that is idempotent, i.e., that satisfies the equation M;M = M.
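Idempotence says that applying M twice gives the same result as applying it once. A minimal sketch of a check on sample inputs follows, using plain Python functions in place of semiotic morphisms; the names is_idempotent and mod60 are assumptions made for this illustration.

```python
def is_idempotent(m, samples):
    """Check m(m(x)) == m(x) on the given sample inputs."""
    return all(m(m(x)) == m(x) for x in samples)


def mod60(n):
    """Reduce a number to its remainder modulo 60."""
    return n % 60


def successor(n):
    """Add one; applying it twice differs from applying it once."""
    return n + 1
```

Here mod60 passes the check on any range of integers, while successor fails it, so only the first behaves like a projection.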
In general, a projection can be undefined on many elements of its semiotic space. A simple example is mapping numbers to their remainder modulo (say) 60; it is defined on all numbers, but not on the non-numerical character strings in W. A more complex example is the morphism on W to itself that takes total elapsed minutes to military time; more precisely, if N is a string of decimal digits, then
Definition: A semiotic theory T' is a refinement of a semiotic theory T if there is a semiotic morphism f: T -> T' which preserves all relevant properties of T and which induces an isomorphism of the algebras of terms of T and T'. [[More technically, if G, G' are the signatures of T, T', and if T(G) denotes the algebra of G-terms, then there must be a view f: G -> Der(G') from T to T' (where Der(G') is the derived term signature of G') that induces a G-isomorphism T(G) -> T(G')|G, the reduction of T(G') to a G-algebra via f.]]
For example, the two counter theory for time in minutes is a refinement of the one counter (with cycle 1440) theory. Similarly, the three counter theory of time in seconds is a refinement of the one counter theory with cycle 86,400. In these two examples, the simple theory is refined by encoding some additional social conventions as constructors and axioms, in a way that is consistent with the original theory.
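The first of these examples can be illustrated at the level of data. The sketch below assumes that the one counter runs over 0 to 1439 and that the two counters run over 0 to 23 and 0 to 59; it checks that the translation between them is a bijection that commutes with the tick function. The names one_tick, two_tick, refine, and flatten are hypothetical, and this checks only the elements, not the full technical definition given above.

```python
MINUTES_PER_DAY = 24 * 60   # 1440


def one_tick(n):
    """Tick in the one counter theory: successor with cycle 1440."""
    return (n + 1) % MINUTES_PER_DAY


def two_tick(t):
    """Tick in the two counter theory: minutes roll over into hours."""
    h, m = t
    if m < 59:
        return (h, m + 1)
    return ((h + 1) % 24, 0)


def refine(n):
    """Translate a one counter value into an (hour, minute) pair."""
    return divmod(n, 60)


def flatten(t):
    """Translate an (hour, minute) pair back into a one counter value."""
    h, m = t
    return 60 * h + m
```

Ticking and then translating gives the same pair as translating and then ticking, which is one sense in which the added social conventions are consistent with the original theory.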
Exercise: Consider the same points that are discussed above for time of day in minutes, but now for time of day measured in seconds, including the three corresponding clocks. Hint: The more refined version of the theory should have three counters instead of just two.