Abstract: Information visualization design is generally ad hoc, using trial and error, and perhaps prior visualization experiments. This paper suggests a different approach: general design principles based on a combination of algebraic abstract data type theory, semiotics, and social theory. Major concepts include semiotic spaces to describe systems of related signs, semiotic morphisms to describe representations of signs, and preservation measures to describe the quality of representations. Some examples are given, each with a critical discussion, illustrating how semiotic morphisms can help with design.
Appropriate visualizations of complex data sets can be an enormous aid to scientists in discovering, verifying, and predicting significant patterns. Unfortunately, it has proven difficult to find general principles for producing appropriate visualizations. One reason is the lack of a precise definition for the word "appropriate" in the previous two sentences. The present state of HCI research does not provide an adequate basis for the design of visualizations. A few precise laws are known, but they have very limited scope (e.g., Fitts' law); there are many case studies, but their generality is unknown; and there are many methods (e.g., protocol analysis, usability studies, interviews), but their reliability is uncertain. Meanwhile, both user communities and technology bases are expanding very rapidly, and the commercial sector continues to produce exaggerated claims and mediocre products. At the same time, faith in experimental psychology and ergonomics as foundations is being eroded by developments in CSCW (Computer Supported Cooperative Work) and related areas, which demonstrate that many difficulties arise from taking inadequate account of the social context in which interfaces are actually used, and of the meaning behind the interfaces. In this sad situation, we badly need to explore new directions for the construction of general theories.
Many fundamental issues in information visualization can be understood in terms of representation: a visualization is a representation of some aspects of the underlying information, and the major questions are what to represent and how to represent it. An adequate theory of information visualization must take account not just of current display technology capabilities, but also of the structure of complex information such as scientific data, the capabilities and limitations of human perception and cognition, and the social context of work. For scientific visualization, the social context should include current scientific theories, conventional meanings of the signs and symbols used, the unequal importance of different patterns in the data, and the collaborative nature of scientific work. While it would be difficult to deny the importance of these factors for the design of visualizations and of tools to support them, it would be foolish to believe that they are easy to take into account; in particular, it would be foolish to believe that it is easy to get the design of visualizations or visualization tools right the first time, or that design can be fully automated. For this reason, both theories and tools need to be broad and flexible, supporting relatively painless reconfiguration and evolution.
Although it seems natural to try to use semiotics as the basis for a theory of representation, classical semiotics has unfortunately not developed in a sufficiently rigorous way for our needs, nor has it explicitly addressed the representation of complex signs; also, its approach to meaning has been naive in some crucial respects, especially in neglecting (though not entirely ignoring) the social basis and contextualization of meaning. So it is not surprising that semiotics has mainly been used in the humanities, where scholars can compensate for these weaknesses, rather than in engineering, where descriptions need to be much more explicit. Another deficiency of classical semiotics is its inability to address dynamic signs and their representations, as is necessary for interfaces that involve change rather than a fixed static structure, e.g., standard interactive features like buttons and fill-in forms, as well as more complex situations like animations and virtual worlds. We will suggest approaches to overcoming all these limitations.
Because we consider information visualization in particular, and user interface design in general, as problems in constructing appropriate representations, we need to know what representations are, and what makes them appropriate. For the first question, we consider a representation to be a mapping from one structured domain of signs, called a semiotic space or a sign system, to another such space. For the second question, we can measure the quality of a representation by how well it preserves what is most important to users, subject to any constraints imposed. These ideas might seem simple, but it is not so obvious how to make them precise. Here we use some algebraic methods developed for the theory of abstract data types [6]. More specifically, the structure of a sign system is given by an algebraic theory (consisting of a syntax declaration, similar to a context free grammar, and a set of equations) plus some specifically semiotic features, including hierarchical levels for signs, and priorities on constructors; more details are given in the next section, and full details appear in [2]. Dynamic interfaces can be handled by generalizing from classical algebra to a variant called hidden algebra, as discussed further in Section 2.4 below.
The success of this approach can be judged by the analyses and suggestions for improvement it provides for concrete examples, as in Section 3 below. While sensitive designers might reach similar conclusions, algebraic semiotics does so in a systematic way, based on general principles (in any case, the original designers of the examples in Section 3 did not reach these conclusions). The mathematical formulation of the theory also raises hope for partial automation of the design process. Finally, since all communication is mediated by signs, there is hope for applications well beyond information visualization.
An important insight due to Ferdinand de Saussure [10] is that signs always come in systems. A typical example considered by Saussure is the tense system for the verbs of a language. For example, in English, adding "ed" to the end of a present tense (regular) verb makes it past tense, and adding "will" in front makes it future tense, as in "walk", "walked", and "will walk". Saussure's emphasis on the structure of systems of signs rather than isolated signs has been very influential, for example, in French structuralism and post-structuralism.
A basic strategy for making complex combinations of signs easier to understand is to divide their potential parts into sorts, and then discover rules for the ways that each sort of part can be used. For example, newspapers are composed from articles, ads, cartoons, etc., while articles are composed from headlines, paragraphs, photos, diagrams, etc., and paragraphs are composed from sentences. The so-called parts of speech in traditional grammars are also sorts in this sense. Sorts may have a hierarchical structure under a subsort partial ordering. For example, the sort NOUN is a subsort of the sort NOUN-PHRASE.
The rules for composing signs into more complex signs are of two kinds, called constructors and axioms. Constructors are functions that build new signs from other signs of given sorts, plus perhaps additional parameters. For example, a computer graphics image of a cat may be given as a constructor, with parameters that determine its size, color, and location on the screen. There may also be functions and predicates defined on signs; for example, a LOCATION function for graphical objects, and a HIGHLIGHTED predicate for text. Axioms are logical formulae built from constructors, functions and predicates; they constrain the set of possible signs.
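To make this vocabulary concrete, here is a minimal Python sketch, assuming signs are modeled as frozen dataclasses: a constructor becomes a class whose fields are its parameters, LOCATION and HIGHLIGHTED become an ordinary function and predicate, and an axiom becomes a Boolean check that rules some signs out. The sort PROPER-NOUN, the screen size, and the on-screen axiom are hypothetical additions for illustration only.

```python
from dataclasses import dataclass

# Declared subsort pairs (s, s'), read "s is a subsort of s'".
# NOUN < NOUN-PHRASE is from the text; PROPER-NOUN < NOUN is a hypothetical addition.
SUBSORTS = {("PROPER-NOUN", "NOUN"), ("NOUN", "NOUN-PHRASE")}


def is_subsort(a: str, b: str) -> bool:
    """Reflexive-transitive closure of the declared subsort pairs (assumes no cycles)."""
    return a == b or any(x == a and is_subsort(y, b) for x, y in SUBSORTS)


@dataclass(frozen=True)
class CatImage:
    """Constructor for a graphical sign; size, color, and location are its parameters."""
    size: int
    color: str
    location: tuple[int, int]


@dataclass(frozen=True)
class TextSpan:
    content: str
    highlighted: bool = False


def location(sign: CatImage) -> tuple[int, int]:
    """The LOCATION function on graphical signs."""
    return sign.location


def highlighted(sign: TextSpan) -> bool:
    """The HIGHLIGHTED predicate on text."""
    return sign.highlighted


SCREEN_W, SCREEN_H = 640, 480  # hypothetical screen size


def on_screen_axiom(sign: CatImage) -> bool:
    """A hypothetical axiom: cat images have positive size and lie on the screen."""
    x, y = location(sign)
    return sign.size > 0 and 0 <= x < SCREEN_W and 0 <= y < SCREEN_H
```

Under this encoding, `CatImage(10, "gray", (700, 5))` is a well-typed term but violates the axiom, which is the sense in which axioms constrain the set of possible signs.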
In many examples, some constructors for signs of a given sort are more important than others. For example, a warning popup window is more important than a virtual pet cat. This gives rise to a priority partial ordering on the constructors for each sort. For a different example, the pollutants in a lake may be prioritized by their toxicity, to aid in the design of an appropriate visualization.
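Because priority is only a partial ordering, some constructors are simply incomparable. The sketch below assumes priorities are recorded as (higher, lower) pairs. It computes the transitive closure and picks out the maximal constructors, which a designer might map to the most salient visual attributes. The warning popup outranking the pet cat follows the text; the intermediate names Dialog and Tooltip are hypothetical.

```python
# Priority pairs (higher, lower). WarningPopup > PetCat follows transitively,
# as in the text; Dialog and Tooltip are hypothetical constructor names.
PRIORITY = {("WarningPopup", "Dialog"), ("Dialog", "PetCat"), ("WarningPopup", "Tooltip")}


def outranks(a: str, b: str, pairs=PRIORITY) -> bool:
    """True if a has strictly higher priority than b (transitive closure; assumes no cycles)."""
    return any(x == a and (y == b or outranks(y, b, pairs)) for x, y in pairs)


def maximal(constructors: set[str], pairs=PRIORITY) -> set[str]:
    """Constructors in the given set that no other member of the set outranks."""
    return {c for c in constructors
            if not any(outranks(d, c, pairs) for d in constructors)}
```

Here Tooltip and PetCat are incomparable, so neither has a claim on a more salient channel than the other.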
Another fundamental strategy for managing complexity is to have a hierarchy of levels, with signs that are not atomic being constructed from other signs that are at lower (or possibly the same) levels. Thus linguistics has levels for phonology, morphology, lexicography, syntax, and discourse (i.e., multisentential units such as stories). Similarly, standard GUI displays have windows, which may contain other windows.
It is clear that context, including the physical setting of a given sign, can be at least as important for meaning as the sign itself. In an extreme example, the sentence "Yes" can mean almost anything, given the right context. This corresponds to an important insight of Peirce [8], that meaning is relational, not just denotational (i.e., functional); this is part of the point of his famous semiotic triangle. Using the ideas of this paper, we can consider constructors that place signs in context, by making them parts of larger signs. For example, the familiar 12-hour clock tells the correct 24-hour time in the context of external illumination, which can be considered an argument of a higher level constructor for clocks-in-context.
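The clock-in-context example can be sketched as a constructor that takes both the 12-hour reading and the illumination as arguments. For illustration only, this assumes it is light exactly from 06:00 to 17:59. Real day length varies with season and latitude, so this is a simplification rather than a reliable rule. Under that assumption, exactly one of the two candidate 24-hour times is consistent with the illumination.

```python
def clock_in_context(hour12: int, minute: int, is_light: bool) -> tuple[int, int]:
    """Recover 24-hour time from a 12-hour reading plus illumination.

    Assumes (hypothetically) that it is light exactly from 06:00 to 17:59.
    """
    if not 1 <= hour12 <= 12 or not 0 <= minute < 60:
        raise ValueError("not a valid 12-hour clock reading")
    base = hour12 % 12               # 12 o'clock reads as 0 (midnight) or 12 (noon)
    for h in (base, base + 12):      # the two candidate 24-hour hours
        if (6 <= h < 18) == is_light:
            return (h, minute)
    raise AssertionError("unreachable: exactly one candidate matches")
```

The 12-hour sign alone is ambiguous; only the larger sign, clock plus context, determines the 24-hour time.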
It is worth noting that neither semiotic theories nor semiotic morphisms describe relationships between signs and the realities (if any) that they represent; rather, it is the signs determined by the theories that can be taken to describe real situations. For example, a database schema might have fields for the age, condition, type, height, etc. of roses, but only a particular database can contain actual data about roses. Thus a semiotic theory determines a class of signs, which can potentially describe things in the world.
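The schema/database distinction can be sketched as follows. The field declarations plus an axiom play the role of the theory, and a particular database is accepted as a model only if every record fits them. The field name height_cm, its unit, and the non-negativity axiom are assumptions. The theory says nothing about any actual rose, and even the empty database counts as a model.

```python
# A hypothetical theory of roses: field declarations plus one axiom.
ROSE_FIELDS = {"age": int, "condition": str, "type": str, "height_cm": float}


def rose_axiom(record: dict) -> bool:
    """Assumed axiom: ages and heights are non-negative."""
    return record["age"] >= 0 and record["height_cm"] >= 0


def is_model(database: list[dict]) -> bool:
    """A particular database is a model if every record fits the fields and the axiom."""
    for rec in database:
        if set(rec) != set(ROSE_FIELDS):
            return False
        if not all(isinstance(rec[k], t) for k, t in ROSE_FIELDS.items()):
            return False
        if not rose_axiom(rec):
            return False
    return True
```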
This paragraph contains some technical remarks for those who have the background and interest. A semiotic system S is a tuple (G, A, P, L), where G is a signature (or grammar) with a set N of sorts (or non-terminals) partially ordered by a subsort relation, A is a set of axioms, P is a priority ordering on the constructors of G, and L is a level ordering on sorts. Then the signs of S are the elements of an initial (i.e., "standard" or "intended") model of S, which is known to exist for many reasonable choices of a logic in which to express G and A (for example, equational logic and Horn clause logic have initial models, as do all "liberal institutions"). More mathematical details can be found in [2].
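As a rough sketch, and not the formal definition in [2], the tuple (G, A, P, L) can be encoded as a Python dataclass. Several simplifications are assumptions. Axioms are represented as Boolean checks on signs rather than logical formulas, the level ordering is encoded by integers, and no initial model is constructed. The well-formedness check enforces the level discipline described earlier (arguments at lower or the same level as results) and requires that priorities relate only constructors of the same sort. The newspaper fragment, the Photo sort, and its priority pair are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Constructor:
    name: str
    arg_sorts: tuple[str, ...]
    result_sort: str


@dataclass
class SemioticSystem:
    """A sketch of S = (G, A, P, L); the encodings below are illustrative choices."""
    sorts: set[str]                                    # the sorts N of the signature G
    subsorts: set[tuple[str, str]]                     # pairs (s, s') meaning s < s'
    constructors: dict[str, Constructor]               # the constructors of G
    axioms: list[Callable[[object], bool]] = field(default_factory=list)  # A, as checks on signs
    priority: set[tuple[str, str]] = field(default_factory=set)           # P, pairs (higher, lower)
    level: dict[str, int] = field(default_factory=dict)                    # L, encoded as integers

    def well_formed(self) -> bool:
        """Check declared sorts, level discipline, and that P compares same-sort constructors."""
        for c in self.constructors.values():
            if any(s not in self.sorts for s in (*c.arg_sorts, c.result_sort)):
                return False
            # Arguments must sit at a lower or the same level as the result.
            if any(self.level.get(a, 0) > self.level.get(c.result_sort, 0) for a in c.arg_sorts):
                return False
        if any(s not in self.sorts or t not in self.sorts for s, t in self.subsorts):
            return False
        for hi_c, lo_c in self.priority:
            if hi_c not in self.constructors or lo_c not in self.constructors:
                return False
            if self.constructors[hi_c].result_sort != self.constructors[lo_c].result_sort:
                return False
        return True


# Hypothetical example: a fragment of the newspaper hierarchy described earlier.
news = SemioticSystem(
    sorts={"Sentence", "Photo", "Paragraph", "Article"},
    subsorts=set(),
    constructors={
        "para": Constructor("para", ("Sentence", "Sentence"), "Paragraph"),
        "article": Constructor("article", ("Paragraph",), "Article"),
        "photo_article": Constructor("photo_article", ("Paragraph", "Photo"), "Article"),
    },
    priority={("photo_article", "article")},
    level={"Sentence": 0, "Photo": 0, "Paragraph": 1, "Article": 2},
)
```

A constructor that built a Sentence from an Article would violate the level discipline, and `well_formed()` would reject it.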
Information visualization is an especially good source of illustrations for algebraic semiotics, because it has two advantages over arbitrary design problems: the source space is concrete and given in advance, and the target space consists of visual signs. The designer must be sensitive to features of the data to create a useful visualization, but certain structural features may not be obvious, and it may be even less obvious which of them are the most important. The process of considering a visualization as a semiotic morphism can focus the designer on such basic structural issues, and thus help in creating a good graphical representation.
Because semiotic systems are theories rather than models, semiotic morphisms must be translations from one theory to another, rather than translations from one concrete sign to another. This may seem indirect, but it has important advantages. First, these are theories of systems of signs, rather than of particular signs. In the case of information visualization, each model of the source theory is a possible dataset to be visualized, and each model of the target theory is a possible graphic representation. Dealing with theories forces the designer to more carefully consider the space of possibilities, instead of being seduced by idiosyncratic features of some particular data sets that happen to be available. Second, taking theories as our basis allows new structure to be added later, by expanding the theory in a consistent way.
In general there are many different semiotic morphisms between two given semiotic spaces, each determining a different way to represent signs. For example, in scientific visualization, a database may be presented as a text file, or displayed graphically in many different ways. Semiotic morphisms take structure in the source space to structure in the target space, mapping sorts to sorts, subsorts to subsorts, constructors to constructors, etc. But in many real world applications, not everything can be preserved, so these maps must be partial. Axioms should also be preserved, but again, in practice not all axioms are. Design is the problem of massaging a source space, a target space, and a morphism, to achieve acceptable quality, subject to constraints. The extent to which different kinds of structure are in fact preserved gives a way to compare the quality of semiotic morphisms, as discussed further in the next subsection. Semiotic morphisms should of course also preserve content, but there are many examples where this too is partial; for example, relatively little content is preserved in representing a book by its table of contents.
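One simple way to make "the extent to which structure is preserved" concrete is a weighted count over source sorts. The sketch below assumes a hypothetical film database whose sorts carry importance weights, and a partial map into a scatterplot display in which Director has no image. The weights and the formula are illustrative assumptions, not the preservation measures of [2]. A fuller measure would also cover constructors, axioms, priorities, levels, and content.

```python
# Hypothetical source sorts for a film database, with assumed importance weights.
SOURCE_WEIGHTS = {"Film": 3, "Popularity": 3, "Year": 2, "Genre": 2, "Director": 1}

# A partial map on sorts into a scatterplot display; Director has no image.
SORT_MAP = {"Film": "Dot", "Popularity": "YPosition", "Year": "XPosition", "Genre": "Color"}


def is_injective(sort_map: dict[str, str]) -> bool:
    """Distinct source sorts should not be collapsed onto one target sort."""
    return len(set(sort_map.values())) == len(sort_map)


def weighted_preservation(weights: dict[str, int], sort_map: dict[str, str]) -> float:
    """Share of the total source weight whose sorts have an image under the partial map."""
    total = sum(weights.values())
    kept = sum(w for s, w in weights.items() if s in sort_map)
    return kept / total
```

With these assumed weights the map preserves 10 of 11 units. Mapping both Year and Genre to Color would fail the injectivity check even if every sort had an image.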
This paragraph continues the technical remarks at the end of Section 2.1, for those with the background and interest. A semiotic morphism from S to S' consists of a partial theory morphism from (G, A) to (G', A') that partially preserves the priority and level orderings. Under certain reasonable conditions (e.g., if the logic in which theories are expressed is liberal), a semiotic morphism induces a (partial) homomorphism on the initial models, which maps the signs of (G, A) to signs of (G', A'). There is always a natural "forgetful" mapping in the reverse direction. More mathematical details can be found in [2].
Lakoff, Johnson and others have developed the flourishing field of cognitive linguistics, building on previous careful studies of metaphor. Fauconnier and Turner introduced the notion of "blending," and demonstrated its importance for many aspects of cognition. See the blending website for much more information. Simple examples from natural language include "house boat," "road kill," "artificial life," and "computer virus," each of which is a blend of its two component words. It happens that "boat house" has a different meaning from "house boat" because a different blend is computed. This is not because the order of the words is different, but because the same two spaces can have many different blends [2]. Semiotic spaces significantly generalize the conceptual spaces used in cognitive linguistics, because they allow far more than just objects and binary relations. An appropriate generalization of blending is given in [2], covering many interesting examples in user interface design and information visualization. In this setting, a blend is built from two (or more) semiotic morphisms having a common source, called the generic space, with targets called the input spaces, by providing two (or more) semiotic morphisms from the input spaces to a blend space, subject to certain "optimality" conditions that rule out the uninteresting cases [2].
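The simplest candidate blend can be sketched at the level of bare sets as a pushout. Take the disjoint union of the two input spaces and identify the two images of each element of the generic space. This ignores constructors, axioms, and priorities, and it applies none of the optimality conditions of [2]. The element names used for "house boat" are hypothetical.

```python
def pushout(generic, left, right, f, g):
    """Set-level pushout: disjoint union of left and right, glued along generic.

    f maps generic -> left and g maps generic -> right (as dicts). Elements are
    tagged ("L", x) or ("R", y) to keep the union disjoint. Returns the blend as
    a set of equivalence classes (frozensets of tagged elements).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for x in left:
        find(("L", x))
    for y in right:
        find(("R", y))
    for e in generic:
        parent[find(("L", f[e]))] = find(("R", g[e]))  # identify the two images of e

    classes = {}
    for x in list(parent):
        classes.setdefault(find(x), set()).add(x)
    return {frozenset(c) for c in classes.values()}


generic = {"object", "user"}
house = {"house", "resident", "land"}
boat = {"boat", "passenger", "water"}
blend = pushout(generic, house, boat,
                {"object": "house", "user": "resident"},
                {"object": "boat", "user": "passenger"})
```

The result keeps both land and water. That is the kind of incoherent candidate that optimality conditions are meant to rule out, since a good blend for "house boat" would drop land.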
Hidden algebra extends the algebraic theory of abstract data types to handle states and dynamics, as well as concurrency and nondeterminism [5]. These are exactly the features needed to move algebraic semiotics from static signs to dynamic signs, for handling interactive interfaces, animated visualizations, virtual worlds, etc. Our approach requires that the cognitive and social dimensions of this extension also be addressed. These can be explored using Gibson's notion of affordance, which he defined as "a capability for a specific kind of action, involving an animal and a part of its environment" [1]. For example, a BACK button on a browser provides an affordance for returning to the previously viewed page. Werner Kuhn has used semiotic morphisms, Gibsonian affordances, and blending to develop semantics for geographic information system interfaces [7].
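Here is a minimal sketch of the hidden-algebra view of the BACK affordance. It assumes a browser whose state, its history, is hidden and can only be observed through the current page. The class and method names are hypothetical. The equation back(visit(b, p)) = b is the kind of behavioral property hidden algebra is designed to state and verify. In hidden algebra such an equation need only hold up to observation, although in this toy model it happens to hold literally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Browser:
    """Hidden state: the history; it is observed only through current()."""
    history: tuple[str, ...]

    def current(self) -> str:                   # observation (visible result)
        return self.history[-1]

    def visit(self, page: str) -> "Browser":    # action
        return Browser(self.history + (page,))

    def back(self) -> "Browser":                # action: the BACK affordance
        # At the first page, BACK has no effect.
        return Browser(self.history[:-1]) if len(self.history) > 1 else self


def start(home: str) -> Browser:
    return Browser((home,))
```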
Four examples are given in the following subsections, each with a discussion showing how semiotic morphisms can help with the design of information visualizations, including suggestions for improving displays.
Because a major intuition of semiotic morphisms is that they should preserve what is most important, it may be surprising that, if there is a conflict between structure and content (e.g., because not all the data can be displayed at once), it is more important to preserve structure than content. This is called Principle F/C in [2], and it is nicely illustrated in Figure 1, which is based on a code browser built at Bell Labs. The content of this display, which is the code of some program, has been sacrificed in favor of its structure, which is its division into files and procedures. Two spatial dimensions are used to represent this structure, while color is very effectively used to represent the age of the code. (The superimposed window on the bottom gives an overview of the whole program, plus a closeup showing some actual text. This illustrates the overview and zoom features of the system.)
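Principle F/C can be illustrated with a toy version of such a display. Each file becomes a column, each line becomes a single cell, and the line's text is discarded in favor of a code for its age. Letters stand in for colors, and the age thresholds are assumptions. Nothing here reflects the actual Bell Labs implementation, and procedure boundaries are not represented.

```python
def code_overview(files: dict[str, list[int]], buckets=(30, 365)) -> list[str]:
    """Render each file as a column of cells, one cell per source line.

    files maps a file name to the ages (in days) of its lines; line content is dropped.
    Cells: 'N' new (< 30 days), 'M' medium (< 365 days), 'O' old. Thresholds are assumptions.
    """
    def cell(age: int) -> str:
        if age < buckets[0]:
            return "N"
        if age < buckets[1]:
            return "M"
        return "O"

    names = sorted(files)
    height = max((len(files[n]) for n in names), default=0)
    return ["".join(cell(files[n][i]) if i < len(files[n]) else " " for n in names)
            for i in range(height)]
```

For example, `code_overview({"a.c": [1, 400], "b.c": [100]})` yields two rows, one per line position, with the shorter file padded by a blank cell.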
Without knowing the use of this system, it is impossible to know how appropriate its representation really is. Still, we can infer from the display that the designer thought that the age of code was the most important attribute, presumably because of its value in debugging. However, such a tool would be even more useful if it could be configured to use color to highlight whichever features matter for the problem at hand; such features might include references to certain variables, certain uses of pointers, certain kinds of recursion, etc. (e.g., consider what might be needed to work on the Y2K problem).
Figure 2 illustrates FilmFinder, a system from Ben Shneiderman's group at the University of Maryland (see [11]) for displaying films, with the vertical axis indicating popularity, the horizontal axis indicating date, and the color indicating genre; the area on the right side is for controlling the system. We can see this display as the image under an appropriate semiotic morphism of a sign in a system of information about films, and we can infer what information the designer of this interface thought users would consider most important, namely the popularity, date, and genre of each film.
Treating this figure as a display of scientific data about the movie industry, we see that the density of films is significantly greater in the most recent years, except perhaps for those genres that are least popular; one can also notice other facts, such as that there has always been a higher percentage of dramas than of other genres, and that the percentages of action and horror films are increasing.
However, this representation is not as useful as it could be. The problem is that too much content and not enough structure has been preserved. For example, it would seem better to aggregate all films having approximately the same attributes of interest into one blob, and then display the number of films in a blob using a distinct visual attribute, such as size or brightness. Successive blobs of the same kind could then be connected by lines having the same color as the blobs. Users could click on a blob to see what's in it, preferably displayed in a new popup window. These revisions could facilitate search.
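The proposed aggregation could be prototyped roughly as follows. This sketch assumes films are (title, year, popularity, genre) tuples and that "approximately the same" means falling in the same fixed-width year and popularity bins. The sample records are hypothetical. The number of titles in a blob could drive its size or brightness, and the titles are what a click would reveal.

```python
from collections import defaultdict


def aggregate(films, year_bin=5, pop_bin=1.0):
    """Group films with approximately the same (year, popularity, genre) into blobs.

    films: iterable of (title, year, popularity, genre) tuples.
    Returns {(year_start, popularity_start, genre): [titles]}. Bin widths are assumptions.
    """
    blobs = defaultdict(list)
    for title, year, pop, genre in films:
        key = (year - year % year_bin, pop_bin * (pop // pop_bin), genre)
        blobs[key].append(title)
    return dict(blobs)


def genre_paths(blobs):
    """For each genre, its blob keys in year order, to be joined by lines of that genre's color."""
    paths = defaultdict(list)
    for key in sorted(blobs):
        paths[key[2]].append(key)
    return dict(paths)


# Hypothetical sample records, for illustration only.
films = [("A", 1991, 7.2, "drama"), ("B", 1993, 7.8, "drama"),
         ("C", 1993, 7.5, "horror"), ("D", 1997, 6.1, "drama")]
blobs = aggregate(films)
```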
Figure 3 depicts a later version of the same tool as in Figure 2, for the same domain of films (the SpotFire version of FilmFinder, from IVEE Development in Sweden); the main improvement is to give the user more control over what is displayed and how it is displayed. The particular display shown uses length and date for its two axes, and again uses color for genre, though the genre color coding scheme is not indicated; prize winning films are highlighted by having a larger size. Here we can observe a clustering at around 90 minutes length, and we can again observe that there are too many dots to be useful, even though this particular display cuts off at 1990! If the user is looking for a particular film or class of films, she will have to narrow the focus by imposing additional constraints, and this single display does not give us enough information to know how effectively that can be done. We may presume that the (possibly imaginary) user who created this display thought that these particular attributes were the most interesting at a certain point during a sequence of displays constituting a search; but in fact, they do not seem particularly useful.
We can also infer what the designer of this version thought would be most important, by examining the controls on the right of the display; we may hope that these were determined by polling an adequate pool of typical users, but the key issue should be how easy it is to use these controls in scenarios that have been found to be of particular importance. Presumably typical users are more likely to be looking for a good video to rent than to be analyzing trends in the movie industry. So once again, the controls should reflect the key features involved in typical searches, rather than just the most important attributes of films in general. It would take some experimental work to determine what these key search relevant attributes might be. But we can still criticize the design of the control console, because of its exclusive focus on simple attributes instead of structure. And we can criticize the fine grain control given to users over length and year, suggesting instead that soft constraints would be more appropriate; it also seems doubtful that length is a highly significant attribute for search. We can also criticize its design philosophy, advocating instead a more socially oriented approach that relates the profile of one user to the profiles of other users to select films that similar users have found interesting (there are numerous variations on this theme, such as listing films that a user's friends have liked). Finally, we can note that the design ideas proposed to improve the previous version of this system still apply to this version.
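To illustrate the suggestion of soft constraints, the sketch below ranks films by a penalty that grows smoothly with distance from a preferred year and length. A film just outside the preferred range is demoted rather than hidden, as it would be by hard range sliders. The (title, year, length) encoding, the quadratic penalty, and the scale parameters are assumptions chosen for illustration.

```python
def soft_score(film, target_year, target_length, year_scale=5.0, length_scale=20.0):
    """Penalty that grows smoothly with distance from the preferred year and length.

    film is a (title, year, length_in_minutes) tuple. Lower is better; scales are assumptions.
    """
    _title, year, length = film
    return (((year - target_year) / year_scale) ** 2
            + ((length - target_length) / length_scale) ** 2)


def rank(films, target_year, target_length):
    """Order all films by soft score instead of filtering them out."""
    return sorted(films, key=lambda f: soft_score(f, target_year, target_length))
```

With targets of 1990 and 100 minutes, a 1991 film of 100 minutes ranks ahead of a 1985 film of 95 minutes, and a 150-minute film from 1990 ranks last, but none of them disappears from the list.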
Figure 4 sketches a semiotic space for a file hierarchy, along with two semiotic morphisms for visualizing it in two different ways in the graphical user interface of Apple's Macintosh OS 8.6. The source space is a rational reconstruction of a specification for the file system; its structure is that of an ordered labeled finite tree. When Folder C is opened in the representation on the right, the location of file Document.txt is represented textually in the small area at the top of its window, whereas in the left representation, its location has a visual representation, based on position, including indentation. The left visualization is better, because it shows more of the source space structure in visual form, and also provides more browsing affordances in visual form. However, more could be done in this direction.
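The source space of Figure 4 and its two representations can be sketched in a few lines, assuming the ordered labeled finite tree is encoded as (label, children) pairs. One function shows location visually through indentation, like the left view. The other shows it only as a path string, like the text area of the right view. Apart from Folder C and Document.txt, the names are hypothetical, and the "/" separator is an arbitrary choice.

```python
def indented(tree, depth=0):
    """Left-style view: location is shown by position and indentation."""
    label, children = tree
    lines = ["    " * depth + label]
    for child in children:
        lines.extend(indented(child, depth + 1))
    return lines


def path_to(tree, target, prefix=()):
    """Right-style view: location is shown only as a textual path."""
    label, children = tree
    here = prefix + (label,)
    if label == target:
        return "/".join(here)
    for child in children:
        found = path_to(child, target, here)
        if found is not None:
            return found
    return None


# An ordered labeled finite tree as (label, children); files have no children.
tree = ("Disk", [("Folder A", []),
                 ("Folder B", [("Folder C", [("Document.txt", [])])])])
```

The indented view also shows the siblings and ancestors of each item, which is the extra visual structure that makes the left representation better. The path string conveys the ancestors only as text.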
Measuring quality by what is preserved and how it is preserved seems to be a novel idea, at least when formulated with the precision and generality suggested here. The principle that it is more important to preserve structure than content when a trade-off is forced has surprised even some design professionals, although it appears in the literature for many special cases, for example in the books of Edward Tufte [12]. Another non-obvious result is that, when a trade-off is necessary, preserving high level sorts is more important than preserving priorities. The need to take account of social issues in user interface design, as in our discussion of Figure 3, is also surprising to many people; for this reason, our version of semiotics is not just algebraic but also social. This insight is not unique to algebraic semiotics; for example, the importance of social factors in HCI is the focus of its CSCW subfield.