CSE 171: User Interface Design: Social and Technical Issues
6. Some Further Examples and Theory of Semiotic Morphisms

This chapter of the class notes first explores direct manipulation, and in particular, its relationship to semiotic morphisms; it then gives some notes on chapter 5 of the text, explaining how this material could have been enriched using notions of preservation for semiotic morphisms, and it concludes with some additional remarks, mainly of a mathematical character, on semiotic morphisms, supplementing what is in the assigned readings.

6.1 Direct Manipulation

Ben Shneiderman is known for his sustained and enthusiastic advocacy of direct manipulation, although he was not the originator of the idea, which he attributes to Ted Nelson. Shneiderman says that direct manipulation is characterized by the following features: (1) analogical representation; (2) incremental operation; (3) reversibility; (4) physical action instead of syntax; (5) immediate visibility of results; and (6) graphic form.

I would especially emphasize points (1) and (4). About the limitations to visibility in (5) and to graphics in (6), it seems to me that representations can involve other senses than just sight. That point (1) for direct manipulation is an analogy or metaphor is very relevant for us, because it says that direct manipulation involves a semiotic morphism. The physical nature of this metaphor (in (4)) makes it seem more direct and concrete, and thus easier for users to grasp and to apply. Leibniz, who was no doubt thinking of mathematical notation, makes a similar point when he says:

In signs, one sees an advantage for discovery that is greatest when they express the exact nature of a thing briefly and, as it were, picture it; then, indeed, the labor of thought is wonderfully diminished.

A good example of this phenomenon is the difference between doing proofs in plane geometry with diagrams and doing them with axioms; in fact, the constructions of traditional Euclidean plane geometry rely on a kind of direct manipulation interface. Insight and creativity are enhanced by using a more direct and physical notation, due to the greater sense of involvement and connection that it produces. This in turn is due to the closer association with one's already existing sensory-motor schemata, which is closely related to important themes in contemporary cognitive linguistics on the nature of metaphor, where it is said that the most basic metaphors are image schemas that are grounded in human embodiment.

Two important principles can help deepen our understanding here. The Principle of Transparency is an important criterion for success: an interface is good if the user does not notice it, but instead notices only the task at hand; designers are thus most successful when users never think about them or their work! The Principle of Virtuality is Ted Nelson's brief original formulation of direct manipulation, as a representation of reality that can be manipulated.

Shneiderman's campaign on behalf of direct manipulation has been so successful that today, one is perhaps more likely to see it misapplied than to see it not applied when it should have been. Here are my paraphrasings of Shneiderman's useful list of potential limitations of direct manipulation (page 204 of his text):

  1. Spatial or visual representations are not necessarily better than text, because they may be too spread out, requiring tedious user scrolling over large displays. For example, flow charts may be useful for algorithms or small programs, but rapidly become less useful as program size increases.
  2. Users must learn the meaning of components of a visual representation, and a graphic icon may require much more learning time than a word or phrase. Many examples can be found in Microsoft products, such as Word.
  3. Users can easily over- or under- estimate the functions associated with some graphical analogy, even if the image itself seems clear.
  4. For users who are experienced at typing, moving their hand off the keyboard to the mouse and back can consume a great deal more time than simply typing the relevant command. For example, the keyboard of a real calculator is much more efficient than any graphical representation that requires use of a mouse.

The tatami project in my lab ran into some of these problems with a direct manipulation interface that we built for proofs; it turned out that displaying the proof tree was useless for large proofs, because of the size and homogeneity of the display. Instead, we broke proofs into "mind sized" pieces, each having its own webpage.

Classical semiotics also provides insight into the success of direct manipulation, by seeing it as an indexicality of motion, often reinforced by a specific kind of iconicity, called diagrammatic iconicity by Peirce, where the geometric structure of the sign corresponds to the structure of its object (geographic information systems are particularly clear examples, since the structure of a geographic map corresponds to the structure of some part of the surface of the earth). Slider controls are a simple semiotic morphism having linear traversal as their source domain, and they could probably be applied even more widely than they have been. Scrollbars on windows are perhaps the most familiar special case; they both display and control what portion of a (possibly very long) "scroll" is actually displayed.

Unfortunately, Shneiderman often confuses the essentially semiotic nature of direct manipulation with the technologies (or in more semiotic terms, the "media") that are used to implement it. Our semiotic conception of direct manipulation allows us to avoid this error, by clearly distinguishing between what functionality is preserved, and how it is represented. For example, it is perfectly possible to have a virtual reality interface to plain old 1978 DOS, complete with a haptic clicking keyboard and a virtual ancient VT100 screen with bright glowing green characters floating in space before you! Despite the fancy technology, this is still just command line DOS. This confusion is really just one aspect of a larger confusion, between the device that supports an interface, and the design and software that make the device actually function as an interface. Journalists often focus on the physical device (the "box") without giving much thought to the design of the interfaces of the applications that it supports. This is no doubt due in part to their receiving press releases from manufacturers and pressure from the advertising department, but it also reflects a bias in our culture.

Design errors often appear as violations of the underlying metaphor of a direct manipulation interface, or more generally, for any interface, violations of consistency of its semiotic morphism. One infamous example is the Apple Macintosh use of the trashcan for ejecting a floppy disk; it has confused generations of users, and it violates the trashcan metaphor in that the floppy is not trash. A more complex example is the use of lemmas in proofs, which leads to violations of a tree metaphor, but can be patched by using hypertext links (as in Kumo). Another example is the little arrows at the top and bottom (or left and right) of many scrollbars, because the physical motion metaphor does not suggest that these should be "hot." In fact, a scrollbar with this capability is actually a blend of two metaphors, and hence is a bit more difficult to learn; the second metaphor is similar to the "up" and "down" buttons on elevators.

It is interesting to note that both the term "virtual reality" and the data glove were invented by Jaron Lanier. Augmented reality has important industrial applications (e.g., at Boeing) and no doubt will have more. Situatedly aware shopping carts do not appeal to me, and indeed, raise significant ethical issues. Again, the main point here is that direct manipulation is a form of semiotic morphism, and use of algebraic semiotic ideas can clarify some of the issues surrounding direct manipulation.

6.2 Notes on Chapter 5 of McCracken & Wolfe

The slogan "Content organization drives visual organization" on page 82 can be seen as a corollary of our principle that good designs are semiotic morphisms, from a content space to a display space, that optimize some preservation properties. The four principles summarized on page 83 are good, but it seems to me that something very essential has been left out, namely that ordering relations in the "content organization" should be reflected (or "preserved", in our language of morphisms) by the display. It is easy to see that the (good) examples do exactly this; e.g., the six groups of links in Figure 5-2 are arranged by the size of the card-sorted groups, and within each group there is even an (implicit) ordering of items by importance; this is not the case for Figure 5-1. The course lecture on this topic contained much more information than this paragraph, showing how levels and priority in the source semiotic space can explain and improve the four principles in the text, making them both more general and more precise; you should have been there!

Another comment is that the Alignment principle is less important than the others: it is just one way to achieve consistency, which can also be achieved, for example, with color or with size. Although alignment is very basic to the way that most browsers display many HTML commands, it is not necessarily basic for more creative graphical layouts, or even for all the natural ways to present HTML, for example, by speech generation (for blind users).

6.3 Additional Notes for Section 3 of An Introduction to Algebraic Semiotics, with Applications to User Interface Design

First, recall that the composition (f;g) of semiotic morphisms f: T -> T' and g: T' -> T'' is defined, for x an element of T, by (f;g)(x) = g(f(x)). This means first apply f to x, and then apply g to the result; the semicolon notation is borrowed from programming languages, where it again indicates first do one statement, then the next.
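The order of application in (f;g) is easy to get backwards, so here is a minimal Python sketch of it. It treats morphisms simply as functions on values and ignores the sort and axiom structure of a semiotic theory; the names compose, to_minutes and to_label are hypothetical, chosen only for illustration.

```python
from typing import Callable, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def compose(f: Callable[[A], B], g: Callable[[B], C]) -> Callable[[A], C]:
    """Return f;g, i.e. the map x -> g(f(x)): apply f first, then g."""
    return lambda x: g(f(x))


# Hypothetical maps, used only to illustrate the order of application.
def to_minutes(hm):
    """Send an (hour, minute) pair to total elapsed minutes."""
    h, m = hm
    return 60 * h + m


def to_label(n):
    """Send a number of minutes to a display string."""
    return f"{n} min"


display = compose(to_minutes, to_label)  # to_minutes ; to_label
print(display((1, 30)))  # prints "90 min"
```

Note that compose(f, g) here is the reverse of the usual mathematical "g o f" notation; the semicolon form reads left to right, like a sequence of program statements.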

Definition: A binary relation > on a set P is a partial ordering if it is transitive (i.e., a > b and b > c imply a > c, for all a,b,c in P), and is anti-reflexive (i.e., a > a does not hold for any element a in P). A partial ordering > is a total ordering if for all a,b in P, either a = b or a > b or b > a.
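For a finite set, the conditions in this definition can be checked by brute force. The following Python sketch does so; the function names and the two sample relations (the usual > on a few integers, and proper superset on the subsets of {1, 2}) are my own illustrative choices, not part of the readings.

```python
from itertools import product


def is_partial_order(elements, gt):
    """Check transitivity and anti-reflexivity of gt on a finite collection."""
    transitive = all(
        gt(a, c)
        for a, b, c in product(elements, repeat=3)
        if gt(a, b) and gt(b, c)
    )
    anti_reflexive = not any(gt(a, a) for a in elements)
    return transitive and anti_reflexive


def is_total_order(elements, gt):
    """A partial ordering in which any two distinct elements are comparable."""
    comparable = all(
        a == b or gt(a, b) or gt(b, a)
        for a, b in product(elements, repeat=2)
    )
    return is_partial_order(elements, gt) and comparable


def number_gt(a, b):
    return a > b


def proper_superset(a, b):
    return a > b  # for frozensets, > means "is a proper superset of"


numbers = [0, 1, 2, 3]
subsets = [frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2})]

print(is_total_order(numbers, number_gt))          # True
print(is_partial_order(subsets, proper_superset))  # True
print(is_total_order(subsets, proper_superset))    # False: {1} and {2} are incomparable
```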

Notice that the so-called "unordered list" of HTML actually produces graphic elements that display a total order in a natural way (since for each pair of distinct list elements, one is necessarily above the other); HTML "unordered lists" differ from "ordered lists" in being unenumerated, not in being unordered.

Definition: Given two partially ordered sets, P with > and P' with >', their lexicographic product consists of the set of pairs (a,a') with a in P and a' in P', ordered by (a,a') > (b,b') if a > b or (a = b and a' >' b').

Theorem: If P and P' are both totally ordered, then so is their lexicographic product.
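To make the lexicographic ordering concrete, here is a small Python sketch that builds it from two given orderings and spot-checks the theorem on one tiny example. This is an illustration on a finite sample, not a proof; the names lex_product and all_comparable, and the sample sets, are assumptions made for the example.

```python
from itertools import product


def lex_product(gt1, gt2):
    """Build the lexicographic ordering on pairs from orderings gt1 and gt2."""
    def gt(p, q):
        (a, a2), (b, b2) = p, q
        # The first components decide, unless they are equal;
        # in that case the second components decide.
        return gt1(a, b) or (a == b and gt2(a2, b2))
    return gt


def all_comparable(elements, gt):
    """True if any two distinct elements are related one way or the other."""
    return all(
        p == q or gt(p, q) or gt(q, p)
        for p, q in product(elements, repeat=2)
    )


first = [0, 1, 2]      # totally ordered by the usual >
second = ["a", "b"]    # totally ordered alphabetically
pairs = list(product(first, second))

pair_gt = lex_product(lambda a, b: a > b, lambda a, b: a > b)

print(pair_gt((1, "a"), (0, "b")))     # True: first components decide
print(pair_gt((1, "b"), (1, "a")))     # True: tie broken by second components
print(all_comparable(pairs, pair_gt))  # True, as the theorem predicts
```

Python's built-in comparison of tuples happens to be this same lexicographic ordering, so pair_gt(p, q) agrees with p > q on these pairs.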

The reason that the TOD ("time of day") semiotic space has some odd looking representations that appear to be good mathematically is that this particular theory of time is very basic, and does not include certain social conventions which we expect to see preserved in our representations of time. The most important of these is that the 1440 minutes of a day are enumerated using two counters, one that goes up to 24 and the other up to 60; these are combined by the constructor (_,_), to create pairs of counters. Here are the axioms for this more detailed source space, where h, m are variables for the hour and minute counters, respectively, and s denotes the unary "next" (or "tick") function on time (i.e., on the pairs of counter values), and also denotes the successor function on integers:

  s(h, m) = (h, s(m))   if s(m) < 60 .
  s(h, m) = (s(h), 0)   if s(m) = 60 and s(h) < 24 .
  s(23, 59) = (0, 0) .

It is interesting to notice that the usual ordering on time is exactly the lexicographic product of the two counters, that is, (h, m) > (h', m') if h > h' or (h = h' and m > m'). With this additional structure on the source space, the allowable semiotic morphisms are what we would expect, and in particular, both the strange unary representation, and the decimal number of elapsed minutes, fail to preserve the structure created by the constructor (_,_).
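The tick axioms can be read almost directly as a program. The Python sketch below is one such reading, assuming that the minute counter resets to 0 whenever the hour counter advances; the function names tick, later and day are hypothetical. It also checks that, within a single day, each tick moves strictly upward in the lexicographic ordering.

```python
def tick(time):
    """The "next" function s on (hour, minute) pairs."""
    h, m = time
    if m + 1 < 60:
        return (h, m + 1)      # ordinary minute step
    if h + 1 < 24:
        return (h + 1, 0)      # minute counter wraps, hour advances
    return (0, 0)              # s(23, 59) = (0, 0)


def later(t1, t2):
    """Lexicographic ordering of the two counters: is t1 later than t2?"""
    (h, m), (h2, m2) = t1, t2
    return h > h2 or (h == h2 and m > m2)


def day():
    """All times of one day, generated from (0, 0) by repeated ticks."""
    times = [(0, 0)]
    while len(times) < 24 * 60:
        times.append(tick(times[-1]))
    return times


times = day()
print(len(set(times)))   # 1440 distinct times
print(times[-1])         # (23, 59)
print(tick(times[-1]))   # (0, 0)
print(all(later(b, a) for a, b in zip(times, times[1:])))  # True
```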

Definition: A projection M on a semiotic space S is a semiotic morphism with source and target S that is idempotent, i.e., that satisfies the equation

  M ; M = M .

A simple example maps a list of numbers to their sum, given as a one-element list. For example, this morphism maps (1,2,3) to (6), and (4,5,6) to (15), and (7,8,9) to (24), and then maps each of these results to itself. A similar example computes the average of a list of numbers. A slightly more complex example maps lists of lists of numbers to lists of the sums of the component lists.
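The idempotence of the sum example can be checked on sample values, as in the following Python sketch. Lists are represented as tuples, and the names sum_projection and is_idempotent_on are hypothetical; the check covers only the listed samples and is not a proof.

```python
def sum_projection(numbers):
    """Map a tuple of numbers to the one-element tuple holding their sum."""
    return (sum(numbers),)


def is_idempotent_on(m, samples):
    """Check m;m = m on the given samples, i.e. m(m(x)) == m(x)."""
    return all(m(m(x)) == m(x) for x in samples)


samples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

print([sum_projection(x) for x in samples])       # [(6,), (15,), (24,)]
print(sum_projection((6,)))                       # (6,)
print(is_idempotent_on(sum_projection, samples))  # True
```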

In general, a projection can be undefined on many elements of its semiotic space. A simple example is mapping numbers to their remainder modulo (say) 60; it is defined on all numbers, but not on the non-numerical character strings in W. A more complex example is the morphism on W to itself that takes total elapsed minutes to military time; more precisely, if N is a string of decimal digits, then

  M(N) = Q : R
  M(Q : R) = Q : R

where "Q : R" is the quotient Q of N by 60, as a string of digits, followed by the colon, followed by the remainder R of N by 60, again as a string of digits.
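Here is a sketch of this partial projection in Python, with None standing for "undefined". Two details are assumptions of the sketch rather than part of the definition: the result is written as "Q:R" with no spaces around the colon, and the remainder is not padded with a leading zero, since the text says only that it is a string of digits. The function name military is hypothetical.

```python
import re
from typing import Optional


def military(w: str) -> Optional[str]:
    """Partial projection on strings; returns None where M is undefined."""
    if re.fullmatch(r"[0-9]+", w):
        # w is a count N of elapsed minutes: split it by 60.
        q, r = divmod(int(w), 60)
        return f"{q}:{r}"
    if re.fullmatch(r"[0-9]+:[0-9]+", w):
        # w already has the form Q:R, so it is left unchanged.
        return w
    return None  # undefined on all other strings


print(military("125"))    # 2:5
print(military("2:5"))    # 2:5
print(military("hello"))  # None
```

Wherever it is defined, applying military twice gives the same result as applying it once, which is the idempotence required of a projection.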

Definition: A semiotic theory T' is a refinement of a semiotic theory T if there is a semiotic morphism f: T -> T' which preserves all relevant properties of T and which induces an isomorphism of the algebras of terms of T and T'. [[More technically, if G, G' are the signatures of T, T', and if T(G) denotes the algebra of G-terms, then there must be a view f: G -> Der(G') from T to T' (where Der(G') is the derived term signature of G') that induces a G-isomorphism T(G) -> T(G')|G, the reduction of T(G') to a G-algebra via f.]]

For example, the two counter theory for time in minutes is a refinement of the one counter (with cycle 1440) theory. Similarly, the three counter theory of time in seconds is a refinement of the one counter theory with cycle 86,400. In these two examples, the simple theory is refined by encoding some additional social conventions as constructors and axioms, in a way that is consistent with the original theory.
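The following Python sketch illustrates the first example at the level of values only: it checks that the one counter and two counter representations correspond one to one, and that the correspondence commutes with the tick operation. It does not capture the full signature-level definition of refinement. All function names are hypothetical, and tick_two assumes that the minute counter resets to 0 when the hour advances.

```python
MINUTES_PER_DAY = 24 * 60  # 1440


def tick_one(n):
    """Tick for the one counter theory, with cycle 1440."""
    return (n + 1) % MINUTES_PER_DAY


def tick_two(time):
    """Tick for the two counter theory on (hour, minute) pairs."""
    h, m = time
    if m + 1 < 60:
        return (h, m + 1)
    if h + 1 < 24:
        return (h + 1, 0)
    return (0, 0)


def to_pair(n):
    """Translate a minute count into an (hour, minute) pair."""
    return divmod(n, 60)


def to_count(time):
    """Translate an (hour, minute) pair back into a minute count."""
    h, m = time
    return 60 * h + m


counts = range(MINUTES_PER_DAY)

# The two translations are mutually inverse on the 1440 minute counts ...
print(all(to_count(to_pair(n)) == n for n in counts))  # True
# ... and ticking before or after translating gives the same pair.
print(all(to_pair(tick_one(n)) == tick_two(to_pair(n)) for n in counts))  # True
```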

Exercise: Consider the same points that are discussed above for time of day in minutes, but now for time of day measured in seconds, including the three corresponding clocks. Hint: The more refined version of the theory should have three counters instead of just two.

Maintained by Joseph Goguen
© 2000 - 2005 Joseph Goguen, all rights reserved.
Last modified: Fri May 6 09:48:12 PDT 2005