CSE 271 Class Notes 10
CSE 271: User Interface Design: Social and Technical Issues
11. Multimedia, Blending and Humor

This section first gives general definitions for media, multimedia and related concepts, and then turns to blending and its application to humor.

11.1 Media and the Ubiquity of Interfaces

User interface issues are everywhere. A coffee cup is an interface between the coffee and the user; questions like how thick the cup should be, what its volume should be, and whether it should have a handle, are all user interface issues. A book can be considered a user interface to its content; note that a book is interactive, because users turn the pages, and can go to any page they want; they also use indices, glossaries, etc. in an interactive manner. Buildings can be seen as providing interfaces for getting to a certain room, e.g., by using a directory in the lobby, buttons outside and inside elevators, "EXIT" signs, doorknobs, stairways, and even corridors (you make choices with your body, not your mouse). Returning to the obvious, medical instruments have user interfaces (for doctors, nurses, and even patients) that can have extreme consequences if badly designed. By perhaps stretching your mind a bit, almost anything can be seen as a user interface, having its own issues of design and representation. Certainly this is how Andersen views his museum.

Of course, all this is quite parallel to what semiotics says about signs, and indeed such issues can be considered a part of semiotics, although the notion of semiotic morphism is often needed in making the translation. The basic idea is to consider an object, such as a cup or a building, as a composite, structured sign. The guidelines for semiotic analysis in Section 7.4 of the class notes apply perfectly well to these less computational examples: You can look at a number of different coffee (and tea) cups, and use their systematic differences and similarities to interpolate the source space and morphisms, and then use that to expose the values of various classes of cup users. You can do the same with buildings. In fact, my claim is that this method applies to anything that humans build and/or use, and that there is no firm distinction between web design and general design.

Here are some further concepts that are useful:

Andersen's museum is a multimedia interactive system (and so is any other museum, though in a more prosaic sense). I would emphasize that the notion of genre is social, whereas that of medium is technical. In discussing real examples, it is difficult (or impossible) to separate media from genres, i.e., to separate the technical from the social, in particular, because a medium without conventions for its use will be difficult or impossible to use.

I would also mention that every genre embodies values and ethics. For example, detective novels typically reinforce the values of truth and justice. More generally, by its very nature, a genre emphasizes certain things at the expense of others; that is, it expresses values.

Some potential problems with video include: slow session startup and exit; the difficulty of identifying speakers; difficulty of making eye contact; changed social status; small image size; and potential invasion of privacy. Having a good audio channel is important, because experimental studies show that oral narrative provides the context within which video is interpreted; you can also verify this by observing how sound tracks are used in movies. (One simple experiment is to put different sound tracks on the same scene, so that it is interpreted in completely different, even opposite, ways.)

Results about the structure of interaction from the area of ethnomethodology known as conversation analysis (as discussed in Techniques for Requirements Elicitation) have interesting applications to video conferencing and similar media. Concepts of particular interest include negotiation for turn taking, interrupts, repair, and discourse unit boundaries. One important point is that a long enough delay (perhaps as little as 0.1 second) can cause large disruptions, due to our expectations about the meaning of gaps in conversation. Another point is that separate video images of individuals or groups, especially when there are many of them, can frustrate our expectations about co-presence, such as our expectation that we and other participants have the ability to monitor attention, and to conduct effective recipient design. Please note that all these concepts from ethnomethodology concern the natural interaction of human beings; when we consider them in the context of human computer interaction, we are only using an analogy, since these sociological concepts do not apply to non-humans.

11.2 Notes on Andersen's Multimedia Phase-spaces

This paper discusses a very innovative approach to designing multimedia systems, based on concepts from the area called dynamical systems theory. The mathematics is more or less along the same lines found in physics and mechanical engineering, but some details are different, and the applications are completely, and intriguingly, different: concepts like phase space, potential field, gradient, attractor and chaos are being used to tell a story, and to convey values and information. In fact, dynamical systems concepts are on the cutting edge of science and technology in several important areas, one of which is sensors: it turns out that adding a little noise of the right kind can actually make a sensor more sensitive, by perching it on "the edge of chaos" (this is a technical term). Andersen's approach has several significant benefits, one of the most important of which is avoiding pre-programmed linear sequences, such as are found in nearly all current authored products.

I would not promise you that multimedia user interface designers of the future will be using dynamical systems theory, but I do feel confident that interactive multimedia systems, roughly along the lines of Andersen's Viking Museum, will be important in the future; I would guess that there will be home players, in the form of VR rooms, for "playing" interactive multimedia "texts", probably downloaded over the internet, where users can experience many different things, like today's "home theatres" but much more flexible and interesting, perhaps with smell, motion and haptic feedback, in addition to sound and sight. Perhaps some future designers of programs for such devices will be media superstars, like Michael Jackson and Madonna today.

More technically, we can distinguish four levels of description for Andersen's system. The hardware level is at the bottom, with lighting, slide projectors, speakers, amplifiers, and the large video interface (the "Eye of Wodan") with its input devices (which seem to be a mouse and maybe some buttons). Next there is a software level, basically an object oriented program, using C++, standard Apple multimedia applications, and custom code generators, or slightly more technically, an event oriented program with some slightly exotic device drivers. The third level is that of dynamical systems, where we see potential fields over the phase space changing over time, moving the point that describes the state of the room. The fourth level is the most abstract and most interesting, because it contains the most human elements, namely narratives, conflicts, values, and of course information about old Viking life.

The conflicts are important for making the experience interesting to users; as Aristotle said more than two thousand years ago, "drama is conflict." This is one of the most fundamental facets of Western culture; you can see it on TV (ads, sitcoms, "American Idol," even the news), in movies, newspapers, magazines, etc., etc. Not all cultures have this same value system; for example, classical Balinese narratives get their "kick" from a return to their starting point, as can be clearly heard in the cyclic nature of classical Balinese music, e.g., for classical shadow plays. Andersen's way of using phase space dynamics to bring out conflicts in interactive multimedia systems is (in my opinion) brilliant; see his paper for several interesting examples. Values are sometimes conveyed in an interestingly implicit manner. For example, the fact that the Vikings valued adventurousness is conveyed by rewarding users for being adventurous, e.g., giving them new output, which might be bird sounds, story fragments, bits of information, pictures of artifacts, etc.

The programming style is not especially innovative; in fact, it fits a familiar genre of object oriented programming called event oriented programming (or sometimes, event driven programming); but it seems that Andersen and his team were not familiar with this literature. There are also some interesting connections with semiotics that will be discussed later. What I would especially highlight about Andersen's approach is that the story lines are not preprogrammed, but arise from the activation of events when their potential energy gets high enough, through a combination of author programming and user interaction with the system. In fact, it is quite possible for entirely unexpected conjunctions and sequences to occur, some of which might be very interesting and appropriate, others less so. A very nice metaphor for talking about this is the satisfaction of elastic constraints, which can be "pushed against" with greater and greater effort as they become stronger, and which may eventually become strict constraints, but meanwhile allow various amounts of freedom of choice.
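To make the event mechanism concrete, here is a minimal Python sketch, assuming (hypothetically) that each event carries a single energy value that grows by a fixed author-set rate plus whatever the visitor contributes, and that it fires and resets when it reaches a threshold. The names `Event` and `tick` and all the numbers are illustrative, not taken from Andersen's system; the point is only that the order of events is not preprogrammed, but depends on interaction.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """A hypothetical story event whose 'potential energy' builds up over time."""
    name: str
    author_rate: float   # energy the author's design adds on every tick
    threshold: float     # the event fires once its energy reaches this level
    energy: float = 0.0


def tick(events, user_boosts):
    """Advance one time step and return the names of the events that fire.

    user_boosts maps an event name to extra energy contributed by the
    visitor during this step (for example, by touching a related object).
    """
    fired = []
    for ev in events:
        ev.energy += ev.author_rate + user_boosts.get(ev.name, 0.0)
        if ev.energy >= ev.threshold:
            fired.append(ev.name)
            ev.energy = 0.0  # discharge after firing, so the event can build up again
    return fired


# Usage: the visitor's attention to the saga pushes it ahead of the bird song.
events = [Event("bird_song", 0.3, 1.0), Event("saga_fragment", 0.1, 1.0)]
boosts = [{"saga_fragment": 0.5}, {"saga_fragment": 0.5}, {}, {}]
history = [tick(events, b) for b in boosts]
# history == [[], ["saga_fragment"], [], ["bird_song"]]
```

Left to the author's rates alone, the saga fragment would have fired long after the bird song; the visitor's interaction reorders the story.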

It would have helped a lot, I think, if Andersen had included the following equation in his paper:

       v(t+1) = v(t) + a + u

where v is the point (vector) in phase space, t is the time, a is the increment provided by the author, and u is the increment provided by the user. Note that both increments are also computed at time t, and that their values depend on the current state of the system.
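The equation can be tried out directly. The Python sketch below is one possible reading, under assumptions that are not in Andersen's paper: the author's increment a is taken to be a small step down the gradient of a hypothetical quadratic potential with a single attractor, and the user's increment u is any function of the time and the current state. The functions `author_increment`, `simulate`, `idle` and `pusher` are illustrative names.

```python
def author_increment(v, attractor, rate):
    """Hypothetical author term a = -rate * grad P(v) for the quadratic
    potential P(v) = 0.5 * |v - attractor|^2, which pulls the state
    toward the attractor."""
    return [-rate * (vi - ci) for vi, ci in zip(v, attractor)]


def simulate(v0, attractor, user, steps, rate=0.2):
    """Iterate v(t+1) = v(t) + a + u; a and u are both computed from v(t)."""
    v = list(v0)
    path = [tuple(v)]
    for t in range(steps):
        a = author_increment(v, attractor, rate)  # author's increment at time t
        u = user(t, v)                             # user's increment at time t
        v = [vi + ai + ui for vi, ai, ui in zip(v, a, u)]
        path.append(tuple(v))
    return path


def idle(t, v):
    """A visitor who does nothing: u = 0."""
    return [0.0, 0.0]


def pusher(t, v):
    """A visitor who keeps nudging the state along the first axis."""
    return [0.1, 0.0]


# With an idle visitor the state settles on the attractor (0, 0); a
# persistent nudge shifts the resting point to where a + u = 0, i.e. (0.5, 0).
idle_path = simulate((1.0, 0.0), (0.0, 0.0), idle, steps=100)
pushed_path = simulate((1.0, 0.0), (0.0, 0.0), pusher, steps=100)
```

Even this toy shows the interplay described above: the author's field determines where the state tends to go, while a persistent user shifts where it actually comes to rest.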

In the final section, there are what appear to be some excuses, from which one might conclude that the museum was not entirely a success from the point of view of those who paid for it and those who visit it. An "educated guess" says that some users may be confused when they walk in and see that nothing much is happening, and if they are (say) a bit shy about technology, they may not interact with the system enough to get it to do anything, and so will fail to learn anything about Vikings from their visit, and therefore be disappointed, perhaps even angry; the affordances are not perceived affordances. Exercise: Suggest a social solution to this (possible) problem.

It is interesting to notice that many video games employ similar techniques, though their designers do not use the same sophisticated terminology as Andersen.

11.3 Cognitive Linguistics

Nearly all work on linguistics is concerned with grammar, and insofar as meaning is considered at all, it is usually literal meaning that is treated. In fact, there has not been a lot of progress in grammar during the roughly 2,500 years since Panini's classical grammar for Sanskrit was written; Panini's grammatical formalism is very similar to Chomsky's. However, an important revolution is now occurring, in which many of the more human (and I would say more important) aspects of language are being explored, with fascinating new results and important new applications. Of course grammar is still important, but because researchers in other fields could only make rather limited use of grammar for their applications, they are eagerly adopting the new paradigms, even though the profession of linguistics has been rather slow to respond to this challenge.

Among these new topics, the following seem particularly relevant to this course: metaphors and blending (as discussed below) in the field called cognitive linguistics; the structure and analysis of multi-sentence units in the field called discourse analysis; speech act theory; conversation analysis (in the sense of ethnomethodology), which we have already discussed; and of course semiotics, which today is a dominant theoretical language in studies of film, literature, and media in many academic departments - indeed, semiotics has been called the "mathematics of the humanities" by Peter Bøgh Andersen.

Let's discuss metaphors first, following some brilliant work by George Lakoff, a linguist at UC Berkeley (and by way of full disclosure, I should also say that he is an old and close friend of mine). The usual idea of metaphor is that we speak of one thing in terms of another, often using the words "like" or "as". For example, someone might say

Word is like a maze. There are so many choices, and it is very easy to get lost. Also sometimes I can't figure out how to backtrack and undo a choice.
Once the basic scheme has been set up with the first sentence, new material can be added that will be interpreted in the same framework, thus enriching our understanding of the speaker's experience, as we constantly refer back to what we already know about mazes.

It is easy to see examples like this in terms of semiotic morphisms. Here the source sign system is for mazes and the target sign system is for Word. Of course, we do this in a way that is only semi-formal, since no one in their right mind would want to write a complete formal sign system for Word! On the other hand, it is easy to give a completely formal sign system for the structural aspect of mazes, as directed graphs with a given start and finish node; so there are sorts for nodes and edges, a constructor that attaches directed edges to nodes, and constants for the start and finish (i.e., goal) nodes. We then see that the start node of the maze maps nicely to the START icon (or some other way to invoke Word) in the lower left corner of the Windows display, and that choices of edges in the maze map to choices of menu items (or keys on the keyboard) in Word. It now follows that paths in a maze map to sequences of actions in Word. All this is completely natural, and readers of the above quote are able to make such connections in mere milliseconds, of course without doing any of the mathematics that we are sketching here; as a result, they can easily understand the use of maze language in further talk about Word. (But see below for discussion of connotation, etc.)
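The structural part of this morphism is simple enough to write down. The following Python sketch is a hypothetical encoding, not a standard one: a maze is a set of directed edges with start and finish constants, and the morphism is given by two dictionaries, `node_map` (nodes to Word screen states) and `edge_map` (edges to Word actions), whose labels are invented for illustration. Translating a path edge by edge yields a sequence of Word actions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Maze:
    """Formal sign system for the structure of mazes: a directed graph
    with a start node and a finish (goal) node."""
    edges: frozenset  # set of (from_node, to_node) pairs
    start: str
    finish: str

    def is_path(self, nodes):
        """A path begins at the start node and follows directed edges."""
        return (len(nodes) > 0 and nodes[0] == self.start
                and all((a, b) in self.edges for a, b in zip(nodes, nodes[1:])))


def translate(maze, nodes, edge_map):
    """Map a maze path to the sequence of Word actions given by edge_map."""
    if not maze.is_path(nodes):
        raise ValueError("not a path from the start node")
    return [edge_map[(a, b)] for a, b in zip(nodes, nodes[1:])]


# Hypothetical morphism: node names and action labels are illustrative only.
maze = Maze(
    edges=frozenset({("start", "n1"), ("n1", "n2"), ("n1", "dead_end"), ("n2", "goal")}),
    start="start",
    finish="goal",
)
node_map = {"start": "START icon", "n1": "Word window", "n2": "File menu",
            "dead_end": "INSERT menu", "goal": "Print dialog"}
edge_map = {("start", "n1"): "launch Word", ("n1", "n2"): "open File menu",
            ("n1", "dead_end"): "open INSERT menu", ("n2", "goal"): "choose Print"}

assert node_map[maze.start] == "START icon"  # the start constant is preserved
actions = translate(maze, ["start", "n1", "n2", "goal"], edge_map)
```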

The duality between sign systems, which provide languages for talking about signs of a certain kind, and their models, shows up in an interesting way in this example. A model for the Word sign system would include traces of particular tasks, such as writing a short business letter that has some boldface characters in it. The goal is then to print the letter, and this goal lies at the end of a long path through a maze of menu choices, mouse movements (including mouse button clicks), and keyboard strokes. Our semiotic morphism maps this path, which begins at the Windows START icon, to a much more abstract path through a graph of nodes and edges whose significance in terms of documents has been lost. That is, a semiotic morphism maps the language of its source sign system into the language of the target sign system, and as a result, maps models of the target sign system into models of the source sign system; it is typical that some information is lost under the mapping of models.
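The loss of information in the opposite direction can be sketched the same way. Assuming, purely for illustration, that a Word model is a trace of `WordStep` records pairing a screen state with a document-level detail, reading it as a maze model keeps only the sequence of nodes; `WordStep`, `reduct` and `state_to_node` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordStep:
    """One step of a hypothetical recorded Word session."""
    ui_state: str  # where the user is, e.g. "File menu"
    detail: str    # document-level detail, e.g. what was typed or printed


def reduct(trace, state_to_node):
    """Read a Word trace as a maze model: keep the path, forget the details."""
    return [state_to_node[step.ui_state] for step in trace]


# Illustrative correspondence between Word screen states and maze nodes.
state_to_node = {"START icon": "start", "Word window": "n1",
                 "File menu": "n2", "Print dialog": "goal"}

letter = [WordStep("START icon", "clicked START"),
          WordStep("Word window", "typed a business letter with some bold text"),
          WordStep("File menu", "opened File"),
          WordStep("Print dialog", "printed one copy")]
memo = [WordStep("START icon", "clicked START"),
        WordStep("Word window", "typed a memo in plain text"),
        WordStep("File menu", "opened File"),
        WordStep("Print dialog", "printed three copies")]

# Two different Word models map to the same maze path: the document is lost.
assert reduct(letter, state_to_node) == reduct(memo, state_to_node)
```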

It is also interesting and important to notice that there is much more going on here than these simple mathematical transformations. Mazes have connotations as well as an abstract mathematical description. Scholars will know that the original "maze" was an actual physical structure on the island of Crete in ancient times, with a dangerous beast in it, called the Minotaur; in this maze, if you got lost, you might also get killed! And today, even non-scholars know that mazes have an associated feeling-tone that is rather bad, unpleasant, and perhaps even dangerous. For this reason, the above quotation is also a rhetorical gesture, having the effect, which is not explicitly stated, of placing a negative connotation on Word. In fact, imparting connotations is often the real purpose of using a metaphor, and the word "rhetoric" refers to this aspect.

It should not be thought that such connotations lie outside of the semiotic framework that we have been developing. For the semiotic space (called a conceptual space in the cognitive linguistics literature) of mazes can be much richer than the simple graph sign system discussed above, and in particular, it can "recruit" the Minotaur, and anything else that is generally known about mazes in our culture. For example, the above quotation can easily be extended with the following sentences:

For me, the weird INSERT menu is the Minotaur lurking in the maze of Word. The whole thing has been a very painful experience. I thought I would die.
Since the negative emotional connotation is part of the conceptual space of mazes, it is therefore automatically available to be carried over into talk about Word. This is easily formalized by adding some simple relations to the source sign system.
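Here is a minimal sketch of that formalization, assuming that a conceptual space is just a set of (subject, relation, object) triples; the relation names and the mapping `to_word` are hypothetical. Translating every triple carries the negative connotation over to Word along with the structural facts.

```python
# Source conceptual space for mazes, as (subject, relation, object) triples,
# including an emotional connotation alongside the structural facts.
maze_space = {
    ("maze", "has_connotation", "danger"),
    ("Minotaur", "lurks_in", "maze"),
    ("maze", "lets_you", "get lost"),
}

# Hypothetical mapping of maze entities to Word entities; anything not
# listed (such as "danger") is carried over unchanged.
to_word = {"maze": "Word", "Minotaur": "INSERT menu"}


def carry_over(relations, mapping):
    """Translate every relation of the source space into the target."""
    return {(mapping.get(s, s), r, mapping.get(o, o)) for s, r, o in relations}


word_talk = carry_over(maze_space, to_word)
# word_talk contains ("Word", "has_connotation", "danger") and
# ("INSERT menu", "lurks_in", "Word")
```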

However, it is not really typical that an extended metaphorical discourse involves just one source sign system; very often there are two, or even more. For example, the word "weird" in the above quote hints at some kind of occult influence, and this hint could easily be expanded and incorporated into the story, for example, as follows:

Perhaps a voodoo doll of Bill Gates would have saved me, or at least given me some satisfaction.
To understand this kind of language, we need to include another metaphor and another space, for "occult" entities. In fact, this sentence, and even the original quotes, are better understood in terms of blending the space of mazes with that of Word. For example, the sentence "Also sometimes I can't figure out how to backtrack and undo a choice" in the first quotation uses the word "undo," which comes from the computer world, the word "choice," which comes from the maze world, and the word "backtrack," which could come from either. Moreover, the story has constructed several hybrid entities, including Word-as-maze, INSERT-as-Minotaur, and Gates-as-doll, which do not belong in either the maze space or the Word space. So where do they belong?

The blending theory of Gilles Fauconnier and Mark Turner provides an answer: there is a blend space that contains the hybrid entities mentioned above; the mathematical theory of graphs is included in a generic space, consisting of those things shared by the input spaces, which here are the spaces for mazes, Word, and the occult. Then the mapping from one input space to another in the Lakoff theory arises as a side effect of blending, just by seeing which entities from the input spaces get identified in the blend space. Note that these conceptual spaces are not all-inclusive "knowledge domains," but are considered to contain just the minimal information needed to understand the situation at hand; however, they are also dynamic, in that they can grow as new language recruits new conceptual content.
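The idea that the cross-space mapping is a side effect of blending can be illustrated with a deliberately simplified sketch. Here, assumed for illustration only, conceptual spaces are plain sets of entities, the generic space is a set of role names with maps into two input spaces (mazes and Word; the occult space is left out), and the blend is the set-level pushout: the disjoint union of the inputs with counterparts identified. The treatment in the Formal Notation for Conceptual Blending is considerably richer than this; the functions `blend` and `cross_space` and all entity names are hypothetical.

```python
def blend(inputs, generic, maps):
    """Set-level pushout sketch of a conceptual blend.

    inputs:  input-space name -> set of entities in that space
    generic: set of generic-space entities
    maps:    input-space name -> {generic entity: entity in that input}
    Returns the blend as a set of frozensets; each frozenset holds the
    (space, entity) pairs that are identified into one blended entity.
    """
    parent = {(s, e): (s, e) for s, ents in inputs.items() for e in ents}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for g in generic:
        images = [(s, m[g]) for s, m in maps.items() if g in m]
        for other in images[1:]:
            parent[find(other)] = find(images[0])  # identify counterparts

    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return {frozenset(group) for group in groups.values()}


def cross_space(blended, a, b):
    """Recover the mapping from input a to input b as a side effect of the blend."""
    mapping = {}
    for group in blended:
        for sa, ea in group:
            for sb, eb in group:
                if sa == a and sb == b:
                    mapping[ea] = eb
    return mapping


inputs = {
    "maze": {"maze", "entrance", "path", "Minotaur", "getting lost"},
    "word": {"Word", "START icon", "action sequence", "INSERT menu", "Undo"},
}
generic = {"region", "entry", "way through", "obstacle"}
maps = {
    "maze": {"region": "maze", "entry": "entrance",
             "way through": "path", "obstacle": "Minotaur"},
    "word": {"region": "Word", "entry": "START icon",
             "way through": "action sequence", "obstacle": "INSERT menu"},
}

b = blend(inputs, generic, maps)
mapping = cross_space(b, "maze", "word")
```

Entities with no generic counterpart, like "getting lost" and "Undo", pass into the blend unchanged. Changing `maps` changes the blend, which is one way to see why blends need not be unique.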

I hope that all this will encourage you to carefully read the material on blending on pages 18-22 of An Introduction to Algebraic Semiotics, with Applications to User Interface Design, where some other applications are discussed, including finding the meanings of compound words such as "boathouse"; see also the Formal Notation for Conceptual Blending.

11.4 Humor

An important preliminary observation is that blends are not unique, as is well illustrated by blended words like "boathouse" and "houseboat," as discussed on pages 18-22 of An Introduction to Algebraic Semiotics, with Applications to User Interface Design; see also the discussion in Formal Notation for Conceptual Blending; this means that the combination of the two words can be highly ambiguous. Oxymorons are discussed on the top of page 22 of An Introduction to Algebraic Semiotics, with Applications to User Interface Design; see also the oxymoron page. The ambiguity of blending plays an important role here, since an oxymoron has two different blends of two given words, one of which has a standard meaning, and the other of which has some kind of conflict or incongruity in it. Often the second meaning only arises because the word "oxymoron" has been introduced, and this deliberate creation of a surprising ambiguity is what makes these a form of humor. For example, in "military intelligence" the standard meaning is an agency that gathers intelligence (i.e., information, especially secret information) for military purposes, while the second, conflictual meaning is something like "stupid smartness," playing off the common (but incorrect) prejudice that the military are stupid, plus the more usual meaning of intelligence.

Many newspaper cartoons, consisting of 1 to 4 small scenes (i.e., panels), achieve their effect by setting up some situation, and then recontextualizing it, i.e., introducing new elements and relations into the conceptual space, which have the effect of forcing a new organization for some parts of the conceptual space that was originally set up. Let's call this reblending. The first space is in general itself some kind of blend, and its reconceptualization is also a blend, of the old space with the new material; this often has a humorous effect. An informal survey of cartoons in the local newspaper found that more than half of the intendedly humorous cartoons achieved their effect by reblending a given conceptual space with some new material, to give some parts of the old one surprising new meanings. A lot of humor seems to have a similar character. Here is a link to a simple visual humor example that involves reblending but is not an oxymoron. Oxymorons can also be seen as involving a "cross space" mapping which imports at least some of the (generally less conventional) contradictory meaning into the (generally more conventional) blend; but the "two blend" approach developed above is more fundamental.

Similar phenomena can be found in puns, as well as in music, poetry, and probably in every art form, where the effect is by no means always comic. It seems clear that evolution has provided us with positive feedback for improvements in understanding in the form of mental pleasure, since understanding has obvious survival value. One familiar example is the so-called "Eureka" experience, when we suddenly see the solution to some problem that we have been pondering for a long time.

In light of our success with oxymorons, cartoons, etc, it seems reasonable to believe that some fairly large areas of humor can be characterized in terms of reblending. Once we have this understanding, we are in a position to apply it to design. For example, we might want to make the use of certain difficult interfaces a lighter and more pleasant task by introducing some humor. But it is important to notice that, because the psychological impact of recontextualization depends on its novelty, repeating the same joke again and again will not be effective interface design, and in fact, will prove irritating to users; this implies that humor must be introduced very carefully and selectively. Many designs have had to be redone for this or closely related reasons. For example, overly cute icons or avatars can quickly become irritating (or slowly, if they are a bit less cute). Two instances from Microsoft include the barking dog in PowerPoint and the obsequious paper clip in Word (this has been made a non-default option in recent releases). Note that cuteness as a semiotic phenomenon is closely related to humor, in that it involves blending partially conflicting systems, such as child-like facial features, with a serious message.
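One way a designer might act on this advice is to ration humorous messages explicitly. The Python sketch below is a hypothetical helper, not drawn from any real product: `QuipPool` shows each quip at most once and only on every `every_n`-th opportunity, so that no joke is repeated and humor remains occasional. The quip texts and parameters are placeholders.

```python
import random


class QuipPool:
    """Hypothetical helper that rations humorous messages in an interface.

    Each quip is shown at most once (novelty is what makes it work), and
    only on every `every_n`-th opportunity, so humor stays occasional.
    """

    def __init__(self, quips, every_n=10, seed=None):
        self.unused = list(quips)
        self.every_n = every_n
        self.opportunities = 0
        self.rng = random.Random(seed)

    def maybe_quip(self):
        """Return a fresh quip, or None if it is not time or none are left."""
        self.opportunities += 1
        if not self.unused or self.opportunities % self.every_n != 0:
            return None
        return self.unused.pop(self.rng.randrange(len(self.unused)))


pool = QuipPool(["quip A", "quip B"], every_n=3, seed=0)
shown = [pool.maybe_quip() for _ in range(12)]
```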

This discussion of humor differs from the current cognitive linguistics literature in its emphasis on reblending. For example, Seana Coulson's Extemporaneous blending: Conceptual integration in humorous discourse from talk radio discusses humor in terms of blending material from "apparently unrelated" domains, which is called "double scope blending" in The Way We Think, by Fauconnier and Turner.

11.5 A Note on Andersen's Dynamic Logic

This is an extremely interesting piece, although parts are rather difficult, so if you read it, I suggest skipping anything that doesn't make sense to you. We may go over some of the material in class. There are many interesting ideas here that could be explored in a project for the class. Among other things, this paper discusses the logic of multiple contradictions in narrative (building on Greimas), the relation between static and dynamic analyses of narrative, modal logic, fuzzy logic, and more, all in the dynamical systems paradigm of the paper we read earlier, Multimedia Phase-spaces. If you get interested in this, you should also look at my Notes on Gradient Logic.

Maintained by Joseph Goguen
© 2000 - 2005 Joseph Goguen, all rights reserved.
Last modified: Sun Jun 12 12:48:55 PDT 2005