This section of the class notes considers the display of data collections, such as in databases, in relation to the social groups that use them, and the application of semiotic morphisms to improve them.
The old view of databases envisions a single user with a well-formed query about a well-understood and well-structured collection of data. Just as HCI has evolved from a technical ergonomic level through psychology to a social collaborative level, so databases are evolving towards taking better account of the communities in which they are embedded, including their shared goals and their potential conflicts. The new view emphasizes helping users to help each other in various ways, and more generally considers the social side of data collection, dissemination, and use; this implies that database system design today is far from being a purely technical activity. A particular topic is sometimes called collaborative filtering, in which prior use of data helps to determine how it will be presented to current users. Also, increasing competition means that systems can much more easily fail from a failure to understand the user community's structure and needs. Two good commercial examples of systems that make clever use of several forms of collaborative filtering are Amazon.com and Google, the latter much less visibly.
A similar expansion of horizons is happening in many other areas of computer science, as people come more and more to realize that systems exist and must function within a social context, and that they can draw on that context to improve system operation in various ways. Ackerman's reconceptualization of a help system as a collective memory system (described in Section 7.3) illustrates the kind of rethinking that is going on in many areas; in software engineering, for example, there are generic architectures, modularization, libraries of reusable code, plug-ins, software patterns, etc. This rethinking is also consistent with the evolution of HCI.
A rather dramatic example of a large distributed database system that raised significant social issues is Napster, which shared MP3 files over the internet. Over its short lifetime, this system caused severe disruption of several campus computer networks, drew huge lawsuits from the music industry, and was absorbed by a large media industry conglomerate. However, this is far from the end of the story, since many informal peer-to-peer databases are still operating, and the music industry is being very paranoid and aggressive about them. Meanwhile, Apple's iPod and other non-free services are gaining ground, and a new business model appears to be emerging for the entire music industry, though it is not yet clear what that will turn out to be.
It should also not be forgotten that any successful system must evolve, because its users' needs (and many other things) will evolve; therefore it should be designed from the beginning to support evolution. And of course, iterative prototyping, user scenarios (more generally "use cases"), usability testing, interviews, etc. should be employed.
One instructive and important class of applications for semiotic morphisms is the construction of overviews, summaries, or visualizations for (possibly large) collections of data. Here the source sign system consists of data structured in some particular way; examples include books, source code for programs, digital libraries, websites, scientific data, and databases of all kinds.
Such interfaces often have a direct manipulation flavor. The kind of visualization done in scientific and engineering applications, such as aerodynamic flow over a wing, is a hot topic today; indeed, scientific visualization tools are part of a revolution in how science is being done, to such an extent that the very notion of scientific model is changing (e.g., see the recent book by Stephen Wolfram). Communication among and federation of databases is also becoming important, e.g., with the semantic web. It may sound a bit far out now, but virtual reality interfaces to large databases could well become important in the future.
On p.523 of his popular text, Shneiderman gives the following "mantra":
Overview first, zoom and filter, then details on demand.
This might seem obvious, but as Shneiderman emphasizes, it is easy to forget; in fact, he repeats it 12 times in his book, once for each time he forgot it when he should have used it in some project. Although there are many situations where such a design can be used, there are also many situations where it does not apply. Please note that an overview is the image of a semiotic morphism from a source space of data, and that zooming, filtering, and selecting details are each manipulations of the semiotic morphism, modifying it to better approximate what the user wants; i.e., the slogan calls for designing not just a semiotic morphism, but a tool for defining semiotic morphisms; what this tool does is sometimes called filtering. Note that collaborative filtering can be considered the use of social processes to improve semiotic morphisms.
For the designer of a tool to support this kind of interactive construction of a visualization, the source space should be a theory of the semiotic morphisms that the tool supports, and morphisms from that source space will produce the sliders, menus, etc. with which users construct the visualization that that particular tool allows; thus there are two kinds of display, one for controlling the morphism, and one for displaying the result of that morphism on a particular dataset.
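The idea that filtering and asking for details are manipulations of the mapping itself, rather than of the data, can be illustrated with a minimal Python sketch. Everything named here (`View`, `filter_by`, `with_details`, and the sample `papers` records) is hypothetical and chosen only for illustration; it is not taken from Shneiderman or from any particular tool. A `View` holds the parameters of one data-to-display mapping, and each control returns a modified `View`.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]


def keep_all(record: Record) -> bool:
    """Default filter: every record is shown."""
    return True


@dataclass(frozen=True)
class View:
    """The parameters of one data-to-display mapping (hypothetical)."""
    keep: Callable[[Record], bool] = keep_all
    fields: Tuple[str, ...] = ("title",)

    def render(self, data: List[Record]) -> List[str]:
        """Apply the mapping: one display line per record that passes the filter."""
        return [
            " | ".join(str(record[name]) for name in self.fields)
            for record in data
            if self.keep(record)
        ]


def filter_by(view: View, predicate: Callable[[Record], bool]) -> View:
    """Control: narrow the mapping so that it shows fewer records."""
    return replace(view, keep=lambda r: view.keep(r) and predicate(r))


def with_details(view: View, *extra_fields: str) -> View:
    """Control: extend the mapping so that it shows more fields per record."""
    return replace(view, fields=view.fields + extra_fields)


# Made-up sample data, for illustration only.
papers = [
    {"title": "Paper A", "year": 1998, "topic": "HCI"},
    {"title": "Paper B", "year": 2003, "topic": "databases"},
    {"title": "Paper C", "year": 2004, "topic": "HCI"},
]

overview = View()                                        # overview first
recent = filter_by(overview, lambda r: r["year"] >= 2000)  # then filter
detailed = with_details(recent, "year", "topic")         # then details on demand

print(overview.render(papers))
print(recent.render(papers))
print(detailed.render(papers))
```

In this sketch the two kinds of display are kept apart: `filter_by` and `with_details` stand in for the sliders and menus that control the mapping, while `render` displays the result of the current mapping on a particular dataset.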
Some further related discussion is given in Information Visualization and Semiotic Morphisms, by Joseph Goguen and D. Fox Harrell, an informal introduction to semiotic morphisms applied to both analysis and design of information visualization; see also The Ethics of Databases, a naturalistic study of the values embedded in web search engines.
An important point about this paper is signified by the numeral "2" in its title: the system described here is the result of an iterative design process. An earlier system, named Answer Garden, was subjected to a careful evaluation based on the experience of actual users; the results of this evaluation were then carefully analyzed, pinpointing weaknesses in some underlying assumptions, such as a sharp distinction between experts and ordinary users, and this led to new assumptions and a new design based upon them. The result is a typical example of what ethnographers call situatedness, and what designers call site specificity: one learns important and interesting things by evaluating a system in the context of a particular community, but those things may not generalize to other communities. For example, distinguishing experts from users might be valid in a different context, and the particular escalation hierarchy that Ackerman designed for this community might well not work for a different community. On the other hand, software tools like those developed by Ackerman are valuable for building systems for other communities, and several of the more abstract ideas are more generally valid, especially that of collaborative filtering.
The paper addresses the important problem of integrating user communities with their computer systems. Such tasks are especially important for huge databases of badly understood, poorly structured data. Specific techniques used in AG2 include its answer escalation hierarchy, anonymization, an engine to find experts, a statistics collection service, and support for collaborative authoring. All these have potential applications in many other areas, though they are far from exhausting the range of useful modules; together they constitute a toolkit for "socializing" databases. These features were added by Ackerman in moving from his first version of Answer Garden to the second (AG2), which is based on a collection of modules that can be assembled in a variety of ways using Tcl/Tk.
The following semiotic method for synthesis is recommended:
Let's first consider books. If you look at the physical structure of a book, you will see that it has the most important information printed on its front cover and spine, and that inside it has pages; basic information on the cover and spine includes title, author, edition number, and publisher; this information also appears on the title page inside, along with the date of publication (or possibly this is on the back of the title page - publishers may want to make the date harder to find, with the hope that users may not realize that a book has become old). Looking at the contents, you will see that chapters are (usually) the main structuring device, and that their main selectors give a chapter number, a title, and a page number; chapters are (usually) divided into sections, and possibly subsections, each of which also has a number, title, and page. I think these are the things that anyone would eventually discover, even if they were not already familiar with books. Taking them as constructors and selectors, with target medium a small number of printed pages, yields exactly the very familiar form known as an outline. Notice that the entire content of the book has been lost (unless you count titles as content).
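The outline construction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Book`, `Chapter`, and `Section` classes, the `outline` function, and the example book are all hypothetical, not part of the notes. The constructors (book, chapter, section) and selectors (number, title, page) are preserved, while the `text` fields holding the content are simply never consulted.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Section:
    number: str
    title: str
    page: int
    text: str = ""  # content; ignored by the outline


@dataclass
class Chapter:
    number: int
    title: str
    page: int
    sections: List[Section] = field(default_factory=list)
    text: str = ""  # content; ignored by the outline


@dataclass
class Book:
    title: str
    author: str
    chapters: List[Chapter] = field(default_factory=list)


def outline(book: Book) -> List[str]:
    """Map a book to its outline: structure and selectors are kept, text is not."""
    lines = [f"{book.title}, by {book.author}"]
    for chapter in book.chapters:
        lines.append(f"{chapter.number}. {chapter.title} ... {chapter.page}")
        for section in chapter.sections:
            lines.append(f"    {section.number} {section.title} ... {section.page}")
    return lines


# A made-up book, for illustration only.
example_book = Book(
    title="An Example Book",
    author="A. N. Author",
    chapters=[
        Chapter(1, "Introduction", 1,
                [Section("1.1", "Motivation", 2, text="Long prose...")]),
        Chapter(2, "Methods", 15, text="More prose..."),
    ],
)

for line in outline(example_book):
    print(line)
```

The indentation of section lines under chapter lines carries the level structure into the target medium; nothing of the prose survives except the titles.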
Another example is source code. Surely the most important structuring device is the division into files (if there is more than one file). To see what attributes (i.e., selectors) are important, we can check what the Unix ls command displays under various options. For example, ls -lat will display the files in a directory, with their name, owner, size, and date of last modification. If we preserve these and display them in a natural way in a color graphics window, taking account of human perceptual capabilities, we can get something much like the code browser built at Bell Labs, which displays file structure in blocks, file age using color, and file size using block size. Notice that the entire content of the program, i.e., all of its code, has been lost (however, it can be viewed with the zoom feature that this system also provides).
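A rough Python sketch of such a mapping follows. It is not the Bell Labs browser: the `FileInfo` and `Block` classes, the three-step color scale in `age_color`, and the `max_height` scaling are all assumptions made for illustration, and the owner attribute is omitted for brevity. The point is only that selectors (name, size, modification time) are mapped to visual attributes (label, block height, color) while file contents are never read.

```python
import os
import time
from dataclasses import dataclass
from typing import List


@dataclass
class FileInfo:
    name: str
    size: int      # bytes
    mtime: float   # last modification, seconds since the epoch


@dataclass
class Block:
    name: str
    height: int    # block size encodes file size
    color: str     # color encodes file age


def age_color(age_days: float) -> str:
    """Hypothetical three-step scale: recent files are red, old ones blue."""
    if age_days < 30:
        return "red"
    if age_days < 365:
        return "yellow"
    return "blue"


def to_blocks(files: List[FileInfo], now: float, max_height: int = 40) -> List[Block]:
    """Map each file to a block, scaling heights relative to the largest file."""
    largest = max((f.size for f in files), default=0) or 1
    blocks = []
    for f in files:
        height = max(1, round(max_height * f.size / largest))
        age_days = (now - f.mtime) / 86400
        blocks.append(Block(f.name, height, age_color(age_days)))
    return blocks


def scan(directory: str) -> List[FileInfo]:
    """Collect name, size, and modification time for the regular files in a directory."""
    infos = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                st = entry.stat()
                infos.append(FileInfo(entry.name, st.st_size, st.st_mtime))
    return infos


if __name__ == "__main__":
    # Text stand-in for the graphical display: one line per block.
    for block in to_blocks(scan("."), now=time.time()):
        print(f"{block.name:30} height={block.height:3} color={block.color}")
```

Keeping `to_blocks` separate from `scan` reflects the distinction drawn in the text: `scan` extracts the source structure, and `to_blocks` is the mapping into the display.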
It is easy to find many other examples, such as the detailed analysis of scrollbars given in Semiotic Morphisms, Representations, and Blending for User Interface Design and briefly summarized in section 6 of the class notes. The conclusion is that algebraic semiotics is a powerful tool for designing and evaluating user interfaces in general, and interfaces to collections of data in particular (this phrasing is intended to include not just databases, but also file systems, digital libraries, etc.).
The following four principles summarize some significant contributions that algebraic semiotics can make to the design process:
Although Principle F/C is probably the most important, and at this time is certainly the most thoroughly studied and supported, there are three other principles that deserve attention, although the range of their applicability has not yet been carefully examined: Principle HL/LL says it is more important to preserve higher levels than lower levels; Principle HL/C says it is more important to preserve high levels than content; Principle P/C says it is more important to preserve priorities than content.
The following outlines a recommended semiotic method for analysis of an interface; it resembles what is done in contemporary semiotic analyses in the humanities, but is both more limited and much more precise. Note that essentially anything can be regarded as an "interface" for the purposes of this method.