This chapter of the class notes first explores direct manipulation, and in particular its relationship to semiotic morphisms. It then gives some notes on chapter 5 of the text, explaining how that material could have been enriched using notions of preservation for semiotic morphisms. It concludes with some additional remarks, mainly of a mathematical character, on semiotic morphisms, supplementing what is in the assigned readings.

Ben Shneiderman is known for his sustained and enthusiastic advocacy of
direct manipulation, although he was not the originator of the idea, which he
attributes to Ted Nelson. Shneiderman says that **direct
manipulation** is characterized by the following features: (1) analogical
representation; (2) incremental operation; (3) reversibility; (4) physical
action instead of syntax; (5) immediate visibility of results; and (6)
graphic form.

I would especially emphasize points (1) and (4). As for the restriction to visibility in (5) and to graphics in (6), it seems to me that representations can involve senses other than sight. Point (1), that direct manipulation rests on an analogy or metaphor, is very relevant for us, because it says that direct manipulation involves a semiotic morphism. The physical nature of this metaphor (point (4)) makes it seem more direct and concrete, and thus easier for users to grasp and to apply. Leibniz, who was no doubt thinking of mathematical notation, makes a similar point when he says:

In signs, one sees an advantage for discovery that is greatest when they express the exact nature of a thing briefly and, as it were, picture it; then, indeed, the labor of thought is wonderfully diminished.

A good example of this phenomenon is the difference between doing proofs in plane geometry with diagrams and doing them with axioms; in fact, the constructions of traditional Euclidean plane geometry rely on a kind of direct manipulation interface. Insight and creativity are enhanced by using a more direct and physical notation, due to the greater sense of involvement and connection that it produces. This in turn is due to the closer association with one's already existing sensory-motor schemata, which is closely related to important themes in contemporary cognitive linguistics on the nature of metaphor, where it is said that the most basic metaphors are

Two important principles can help deepen our understanding here: The
*Principle of Transparency* is an important criterion for success: an
interface is good if the user does not notice it but instead only notices the
task at hand; so designers are most successful when users never think about
them or their work! The *Principle of Virtuality* is Ted Nelson's
brief original formulation of direct manipulation, as a representation of
reality that can be manipulated.

Shneiderman's campaign on behalf of direct manipulation has been so successful that today, one is perhaps more likely to see it misapplied than to see it not applied when it should have been. Here are my paraphrasings of Shneiderman's useful list of potential limitations of direct manipulation (page 204 of his text):

- Spatial or visual representations are not necessarily better than text, because they may be too spread out, requiring tedious user scrolling over large displays. For example, flow charts may be useful for algorithms or small programs, but rapidly become less useful as program size increases.
- Users must learn the meaning of components of a visual representation, and a graphic icon may require much more learning time than a word or phrase. Many examples can be found in Microsoft products, such as Word.
- Users can easily over- or underestimate the functions associated with some graphical analogy, even if the image itself seems clear.
- For users who are experienced at typing, moving their hand off the keyboard to the mouse and back can consume a great deal more time than simply typing the relevant command. For example, the keyboard of a real calculator is much more efficient than any graphical representation that requires use of a mouse.

Classical semiotics also provides insight into the success of direct
manipulation, by seeing it as an indexicality of motion, often reinforced by
a specific kind of iconicity, called **diagrammatic iconicity** by Peirce,
where the geometric structure of the sign corresponds to the structure of its
object (geographic information systems are particularly clear examples, since
the structure of a geographic map corresponds to the structure of some part
of the surface of the earth). Slider controls are a simple semiotic morphism
having linear traversal as their source domain, and they could probably be
applied even more widely than they have been. Scrollbars on windows are
perhaps the most familiar special case; they both display and control what
portion of a (possibly very long) "scroll" is actually displayed.

Unfortunately, Shneiderman often confuses the essentially semiotic nature of direct manipulation with the technologies (or in more semiotic terms, the "media") that are used to implement it. Our semiotic conception of direct manipulation allows us to avoid this error, by clearly distinguishing between what functionality is preserved, and how it is represented. For example, it is perfectly possible to have a virtual reality interface to plain old 1978 DOS, complete with a haptic clicking keyboard and a virtual ancient VT100 screen with bright glowing green characters floating in space before you! Despite the fancy technology, this is still just command line DOS. This confusion is really just one aspect of a larger confusion, between the device that supports an interface, and the design and software that make the device actually function as an interface. Journalists often focus on the physical device (the "box") without giving much thought to the design of the interfaces of the applications that it supports. This is no doubt due in part to their receiving press releases from manufacturers and pressure from the advertising department, but it also reflects a bias in our culture.

Design errors often appear as violations of the underlying metaphor of a
direct manipulation interface, or more generally, for any interface,
violations of consistency of its semiotic morphism. One infamous example is
the Apple Macintosh use of the trashcan for ejecting a floppy disk; it has
confused generations of users, and it violates the trashcan metaphor in that
the floppy is not trash. A more complex example is the use of lemmas in
proofs, which leads to violations of a tree metaphor, but can be patched by
using hypertext links (as in Kumo).
Another example is the little arrows at the top and bottom (or left and
right) of many scrollbars, because the physical motion metaphor does not
suggest that these should be "hot." In fact, a scrollbar with this
capability is actually a *blend* of two metaphors, and hence is a bit
more difficult to learn; the second metaphor is similar to the "up" and
"down" buttons on elevators.

It is interesting to note that both the term "virtual reality" and the
data glove were invented by Jaron Lanier. Augmented reality has important
industrial applications (e.g., at Boeing) and no doubt will have more.
Situatedly aware shopping carts do *not* appeal to me, and indeed,
raise significant ethical issues. Again, the main point here is that direct
manipulation is a form of semiotic morphism, and use of algebraic semiotic
ideas can clarify some of the issues surrounding direct manipulation.

The slogan "Content organization drives visual organization" on page 82 can be seen as a corollary of our principle that good designs are semiotic morphisms, from a content space to a display space, that optimize some preservation properties. The four principles summarized on page 83 are good, but it seems to me that something very essential has been left out, namely that ordering relations in the "content organization" should be reflected (or "preserved", in our language of morphisms) by the display. It is easy to see that the (good) examples do exactly this; e.g., the six groups of links in Figure 5-2 are arranged by the size of the card-sorted groups, and within groups there is even an (implicit) ordering of items by importance; this is not the case for Figure 5-1. The course lecture on this topic contained much more information than this paragraph, showing how levels and priority in the source semiotic space can explain and improve the four principles in the text, making them both more general and more precise; you should have been there!

Another comment is that the Alignment principle is less important than the others: it is just one way to achieve consistency, which can also be achieved, for example, with color or with size. Although alignment is very basic to the way that most browsers display many HTML commands, it is not necessarily basic for more creative graphical layouts, or even for all the natural ways to present HTML, for example, by speech generation (for blind users).

First, recall that the **composition** (f;g) of semiotic morphisms f:
**T** -> **T'** and g: **T'** -> **T''** is defined, for x an
element of **T**, by (f;g)(x) = g(f(x)). This means first apply f to x,
and then apply g to the result; the semicolon notation is borrowed from
programming languages, where it again indicates first do one statement, then
the next.
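
The order of application in (f;g) is easy to get backwards, so a small sketch may help. The two functions `minutes_to_pair` and `pair_to_text` below are hypothetical stand-ins for morphisms, chosen only for illustration; they are plain functions on values and ignore the sign-system structure that a real semiotic morphism must respect.

```python
from typing import Callable, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def compose(f: Callable[[A], B], g: Callable[[B], C]) -> Callable[[A], C]:
    """Return (f;g), i.e. the function x -> g(f(x)): first f, then g."""
    def f_then_g(x: A) -> C:
        return g(f(x))
    return f_then_g


# Hypothetical example morphisms (assumptions, not taken from the notes).
def minutes_to_pair(n: int) -> Tuple[int, int]:
    """Total minutes since midnight -> (hour, minute)."""
    return divmod(n % 1440, 60)


def pair_to_text(hm: Tuple[int, int]) -> str:
    """(hour, minute) -> four-character string such as '0115'."""
    hour, minute = hm
    return f"{hour:02d}{minute:02d}"


# (minutes_to_pair ; pair_to_text): apply minutes_to_pair first.
minutes_to_text = compose(minutes_to_pair, pair_to_text)
```

Here `minutes_to_text(75)` first computes the pair `(1, 15)` and only then formats it, which is the left-to-right reading that the semicolon is meant to suggest.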

**Definition:** A binary relation > on a set P is a **partial
ordering** if it is **transitive** (i.e., a > b and b > c imply a > c,
for all a,b,c in P), and is **anti-reflexive** (i.e., a > a does not hold
for any element a in P). A partial ordering > is a **total ordering** if
for all a,b in P, either a = b or a > b or b > a.
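
For a finite set, both conditions can be checked by brute force. The following is a minimal sketch under that finiteness assumption; the function names and the two example relations are illustrative choices, not part of the notes.

```python
from itertools import product
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def is_partial_ordering(elems: Iterable[T], gt: Callable[[T, T], bool]) -> bool:
    """Check that gt is transitive and never relates an element to itself."""
    items: List[T] = list(elems)
    anti_reflexive = not any(gt(a, a) for a in items)
    transitive = all(
        gt(a, c)
        for a, b, c in product(items, repeat=3)
        if gt(a, b) and gt(b, c)
    )
    return anti_reflexive and transitive


def is_total_ordering(elems: Iterable[T], gt: Callable[[T, T], bool]) -> bool:
    """A partial ordering in which any two distinct elements are comparable."""
    items: List[T] = list(elems)
    comparable = all(
        a == b or gt(a, b) or gt(b, a)
        for a, b in product(items, repeat=2)
    )
    return is_partial_ordering(items, gt) and comparable


def greater(a: int, b: int) -> bool:
    return a > b


def proper_superset(a: frozenset, b: frozenset) -> bool:
    return a > b  # on frozensets, > means "is a proper superset of"


NUMBERS = range(5)
SUBSETS = [frozenset(s) for s in ([], [1], [2], [1, 2])]
```

With these definitions, `greater` on `NUMBERS` is a total ordering, while `proper_superset` on `SUBSETS` is a partial ordering that is not total, since `{1}` and `{2}` are incomparable.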

Notice that the so-called "unordered list" of HTML actually produces graphic elements that display a total order in a natural way (since for each pair of distinct list elements, one is necessarily above the other); HTML "unordered lists" differ from "ordered lists" in being unenumerated, not in being unordered.

**Definition:** Given two partially ordered sets, P with > and P' with
>', their **lexicographic product** consists of the set of pairs (a,a')
with a in P and a' in P', ordered by (a,a') > (b,b') if a > b or (a = b and
a' >' b').

**Theorem:** If P and P' are both totally ordered, then so is their
lexicographic product.
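
The theorem can be spot-checked on small finite examples. The sketch below compares first components first and falls back to the second components only on a tie; the sets `HOURS` and `MINUTES` are deliberately tiny, hypothetical stand-ins, and a finite check of this kind illustrates the theorem rather than proving it.

```python
from itertools import product
from typing import Callable, Iterable, Tuple


def lexicographic(
    gt1: Callable[[int, int], bool],
    gt2: Callable[[int, int], bool],
) -> Callable[[Tuple[int, int], Tuple[int, int]], bool]:
    """Build the lexicographic ordering on pairs from two component orderings."""
    def gt(p: Tuple[int, int], q: Tuple[int, int]) -> bool:
        (a, a2), (b, b2) = p, q
        # First components decide; second components only break ties.
        return gt1(a, b) or (a == b and gt2(a2, b2))
    return gt


def greater(x: int, y: int) -> bool:
    return x > y


def all_comparable(pairs: Iterable[Tuple[int, int]], gt) -> bool:
    """Totality condition only: distinct elements are related one way or the other."""
    items = list(pairs)
    return all(p == q or gt(p, q) or gt(q, p) for p, q in product(items, repeat=2))


HOURS = range(3)    # small stand-in for an hour counter
MINUTES = range(4)  # small stand-in for a minute counter
PAIRS = list(product(HOURS, MINUTES))
pair_gt = lexicographic(greater, greater)
```

On integer pairs this ordering agrees with Python's built-in tuple comparison, which is also lexicographic, so `pair_gt(p, q) == (p > q)` holds for every `p` and `q` in `PAIRS`.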

The reason that the **TOD** ("time of day") semiotic space has some odd
looking representations that appear to be good mathematically is that this
particular theory of time is very basic, and does not include certain social
conventions which we expect to see preserved in our representations of time.
The most important of these is that the 1440 minutes of a day are enumerated
using two counters, one that goes up to 24 and the other up to 60; these are
combined by the constructor (_,_), to create pairs of counters. Here are the
axioms for this more detailed source space, where h, m are variables for the
hour and minute counters, respectively, and s denotes the unary "next" (or
"tick") function on time (i.e., on the pairs of counter values), and also
denotes the successor function on integers:

**Definition:** A **projection** M on a semiotic space S is a
semiotic morphism with source and target S that is **idempotent**, i.e.,
that satisfies the equation M;M = M.
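
Idempotence can be checked mechanically: applying the map twice must agree with applying it once wherever it is defined. The sketch below uses remainders modulo 60 and makes two assumptions that are not in the notes: elements of the space are modelled as Python strings, and "undefined" is modelled by returning `None`.

```python
from typing import Optional

DIGITS = "0123456789"


def mod60(word: str) -> Optional[str]:
    """Map a string of decimal digits to its remainder modulo 60.

    Returns None (modelling "undefined") on any other string.
    """
    if not word or any(ch not in DIGITS for ch in word):
        return None
    return str(int(word) % 60)


def is_idempotent_on(words) -> bool:
    """Check that applying mod60 twice equals applying it once, where defined."""
    for word in words:
        once = mod60(word)
        if once is not None and mod60(once) != once:
            return False
    return True


SAMPLE = [str(n) for n in range(3000)] + ["0115", "noon", ""]
```

For example, `mod60("125")` is `"5"`, and applying `mod60` to `"5"` again leaves it unchanged, while `mod60("noon")` is `None`.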

In general, a projection can be undefined on many elements of its semiotic
space. A simple example is mapping numbers to their remainder modulo (say)
60; it is defined on all numbers, but not on the non-numerical character
strings in **W**. A more complex example is the morphism on **W** to
itself that takes total elapsed minutes to military time; more precisely, if
N is a string of decimal digits, then

**Definition:** A semiotic theory **T'** is a **refinement** of a
semiotic theory **T** if there is a semiotic morphism f: **T** ->
**T'** which preserves all relevant properties of **T** and which
induces an isomorphism of the algebras of terms of **T** and **T'**.
[[More technically, if G, G' are the signatures of **T**, **T'**, and
if T(G) denotes the algebra of G-terms, then there must be a view f: G ->
Der(G') from **T** to **T'** (where Der(G') is the derived term
signature of G') that induces a G-isomorphism T(G) -> T(G')|_{G}, the
reduction of T(G') to a G-algebra via f.]]

For example, the two counter theory for time in minutes is a refinement of
the one counter (with cycle 1440) theory. Similarly, the three counter
theory of time in seconds is a refinement of the one counter theory with
cycle 86,400. In these two examples, the simple theory is refined by
encoding some additional social conventions as constructors and axioms, in a
way that is *consistent* with the original theory.
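
One way to see what consistency means in the minutes example is to check that ticking the single counter and then translating gives the same result as translating and then ticking the pair of counters. The sketch below is only an informal illustration on states, not the signature-level definition given above; the function names, and the particular tick rule for pairs, are assumptions made for the example.

```python
from typing import Tuple

CYCLE = 1440  # minutes in a day


def tick_one(n: int) -> int:
    """"Next minute" on the one-counter theory (cycle 1440)."""
    return (n + 1) % CYCLE


def tick_two(hm: Tuple[int, int]) -> Tuple[int, int]:
    """"Next minute" on pairs (hour, minute): an assumed two-counter rule."""
    hour, minute = hm
    if minute < 59:
        return (hour, minute + 1)
    return ((hour + 1) % 24, 0)  # minute counter wraps, hour counter advances


def refine(n: int) -> Tuple[int, int]:
    """Translate a one-counter value into the two-counter representation."""
    return divmod(n, 60)


def forget(hm: Tuple[int, int]) -> int:
    """Translate back from (hour, minute) to total minutes."""
    hour, minute = hm
    return 60 * hour + minute


# Translation commutes with ticking, and no two states are identified.
commutes = all(refine(tick_one(n)) == tick_two(refine(n)) for n in range(CYCLE))
round_trips = all(forget(refine(n)) == n for n in range(CYCLE))
```

Both `commutes` and `round_trips` evaluate to `True`: the pair representation adds the hour and minute convention without changing which moments exist or how time advances.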

**Exercise:** Consider the same points that are discussed above for
time of day in minutes, but now for time of day measured in seconds,
including the three corresponding clocks. *Hint:* The more refined
version of the theory should have three counters instead of just two.

Maintained by Joseph Goguen

© 2000 - 2005 Joseph Goguen, all rights reserved.

Last modified: Fri May 6 09:48:12 PDT 2005