Roberta Mancini, Stefano Levialdi
At the opening Plenary of CHI'96, Herbert Clark challenged human-computer interface design to emulate some of the graceful repair found in face-to-face conversation. However, the dominant paradigm in recent user-interface design has been one of action, not communication - direct manipulation, not commands. In day-to-day life we find the transition between the worlds of action and communication problematic, so it is not surprising that we experience similar problems in the computer world. Nowhere is this transition more marked than when using undo - we are forced to think about what we have just done - breakdown.
Undo, history, direct manipulation, breakdown
At the opening Plenary of CHI'96, Herbert Clark challenged human-computer interface design to emulate some of the graceful repair found in face-to-face conversation. The running example was the interaction at a supermarket checkout. This is a very rich scenario requiring direct communication and a common shared understanding of the situation. It also highlights the importance of physical objects (the goods, money, credit card, receipt) as part of the grounding process, an emphasis shared by others, especially in the CSCW community.
As linguistic animals, we have developed many ways to recover from misunderstandings, misconceptions and slips of the tongue and establish a shared understanding. For example, we often implicitly restate our understanding of a previous utterance as part of our reply. These conversational repair mechanisms are usually unconscious and do not distract us from our primary object of communication.
In contrast, human-computer interaction seems clumsy. By way of example, Clark noted that the way to obtain help about an object in the middle of the screen typically involves clicking on a menu-bar item at the top of the screen, and then getting information back in a different part of the screen, quite likely obscuring the object you were first interested in! But is this a fair comparison?
Let's take a slightly different scenario. You are still at a supermarket, but this time imagine you are at the shelves, looking for a can of baked beans, but cannot find the size you normally buy. So, you look around and see someone who looks like a member of staff. First you check whether he is wearing a store name badge - yes. Then you approach him "excuse me", then a little louder, "excuse me, I'm after a small beans, not the very small cans, the sort of middle sized ones, but I don't want the expensive kind ..."
While standing at the shelves, you are in the world of action, you are doing things. Although you may have a shopping list, much of your interaction is not made explicit and is certainly not articulated. However, to ask for help from a third party means you have to move from action to communication. Instead of acting in the world, you are talking about it. This is difficult and very often clumsy.
If we look at computer systems, the dominant interface paradigm, direct manipulation, encourages an action-based interaction. It is no wonder that when we seek help in such systems the interaction we experience seems clumsy also.
Possibly while you stand bewildered at the shelves someone approaches: "can I help ...". Some agent-based systems offer similar advice, but like the human helper, they must tread a fine line between helpfulness and intrusiveness. Even when help is required, the agent and user must still articulate the situation to one another, again being forced to talk about what is being done, rather than doing it.
It is important to note that this is not a criticism of current interface design, but instead an intrinsic problem of interaction. In the real world things can be done to ease this transition: a store uniform to make it easy to see whom to ask for help, clear labelling of goods to make talking about them clear. One of the most important aspects is deixis, the ability to refer to and point to objects in the world of action during communication. Indeed, one of the authors has previously nominated bar-coding as the most successful CSCW technology because it provides a globally recognised deictic reference. In the computer world we have more flexibility: our helpers can be ever available and infinitely patient! However, despite all this, it is clear that thousands of years of communication have not made us as graceful at moving between action and communication as we are within either world.
Neither should we use these difficulties as an excuse for clumsy interaction within the communication paradigm, as highlighted by another of Clark's examples: help for a dialogue box. The dialogue box is firmly within the world of communication (as its name suggests), giving information linguistically and expecting words (albeit the restricted vocabulary of button labels) in reply. This is certainly a place where the lessons learnt from human conversational interaction should be directly applied.
Why is this transition so difficult? Partly because action and communication typically employ different senses and different temporal and spatial skills. Action is focused on the current state and goal state and is primarily based on visual and haptic senses. We see or feel the current state of something and think in a situated rather than pre-planned fashion. In contrast, communication is naturally aural (albeit now recorded on paper and screen) and is temporal, both in that it occurs in a sequential manner and in that it is used to describe sequences of events (narrative) and to pre-plan actions. The transition is difficult enough in the real world, where different senses and articulatory organs are used. Without care, computer systems can be more confusing, as the same devices and the same visual display are used for both action and communication.
| example | world | senses | reasoning |
| shelves | action | visual/haptic | situated |
| checkout | communication | aural | temporal |
Table 1. Comparing the worlds
In some ways undo is one of the more formally tractable interface areas, as the property we expect of undo (that it reverses the effect of previous actions) is easy to capture algebraically. It has been the subject of several different formal treatments, both in the single-user case and, more recently, focusing on multi-user and selective undo [6, 7].
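As a minimal sketch of what "easy to capture algebraically" might look like, assume a set of states S, a set of commands C containing a distinguished command undo, and a state transition function doit from S × C to S. These names are illustrative assumptions and are not notation taken from the cited treatments. The expected property can then be written as:

```latex
\[
  \forall s \in S,\ \forall c \in C \setminus \{\mathit{undo}\}:\quad
  \mathit{doit}\bigl(\mathit{doit}(s, c),\ \mathit{undo}\bigr) = s
\]
```

The quantifier deliberately excludes undo itself. What undo should do when applied to a previous undo is a separate design decision, and this sketch leaves it open.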
The authors have been revisiting the case of single-user undo, but formalising the general notion of an undo system rather than modelling a specific mechanism. This has led to some informal insights on the nature of undo, but has also posed some formal challenges. One of the principal complications on the formal side has been the reflexive nature of undo. That is, an undo system is looking in on the process of interaction, recording some sort of history of user actions or system states. However, undo mechanisms are themselves invoked by user actions and the history they store is itself part of the system state. These are interesting yet tractable formal problems, but the challenge they pose is not only one of formalisation.
Explicit undo is by its very nature a breakdown situation. You are in the middle of doing something when you notice something has not worked as intended. You may correct this by doing something else which either reverses the unintended effect (implicit undo) or moves closer to the intended goal (forward error recovery). However, if you instead reach for the undo button, you are saying "my last action was wrong" - that is, you are thinking about what you are doing, instead of doing it - breakdown. For more complex undo mechanisms this is explicit in the interface. Consider the case of Microsoft Word 6 with 100 levels of undo. You are typing, clicking icons, moving things around. Something is wrong, you try to correct it, make a mess and so reach for the undo menu - you are faced with a list: "typing", "paste" etc. Unlike an old command-line system you were not interacting with such words, but in order to talk about them you must move from the realm of direct manipulation and action into the world of words and communication.
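The multi-level undo list described above can be illustrated with a small sketch. This is a hypothetical toy model, not a description of how Microsoft Word implements undo: the class `UndoableEditor`, its methods and the stored snapshots are all assumptions made for illustration. It shows how actions performed directly on the text acquire linguistic labels only at the moment the user opens the undo list.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class UndoableEditor:
    """Toy text buffer with a labelled, bounded, linear undo history."""

    text: str = ""
    max_levels: int = 100  # illustrative bound on the number of undo levels
    # Each entry pairs the label shown in the undo list with the text
    # as it was *before* the action was applied.
    history: List[Tuple[str, str]] = field(default_factory=list)

    def do(self, label: str, action: Callable[[str], str]) -> None:
        """Apply an action to the text and record how to reverse it."""
        self.history.append((label, self.text))
        if len(self.history) > self.max_levels:
            self.history.pop(0)  # forget the oldest entry
        self.text = action(self.text)

    def undo_list(self) -> List[str]:
        """Labels the user would be shown, most recent action first."""
        return [label for label, _ in reversed(self.history)]

    def undo(self, levels: int = 1) -> None:
        """Reverse up to `levels` of the most recent actions."""
        for _ in range(min(levels, len(self.history))):
            _, previous_text = self.history.pop()
            self.text = previous_text


editor = UndoableEditor()
editor.do("typing", lambda t: t + "baked beans")
editor.do("paste", lambda t: t + " (small can)")
labels = editor.undo_list()  # the words the user must now reason about
editor.undo()                # reverses only the most recent action
```

Note that the history lives inside the editor object alongside the text, so in this sketch the record of past actions is itself part of the system state, which is the reflexivity mentioned earlier.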
Help and undo are not the only places where such transitions are found. Any form of history or recovery mechanism exhibits similar problems. Hypertext systems such as HyperCard or the Netscape browser have history lists which record where you have been. These are known to cause problems both for users and for formalisation (as last year's CHI'96 workshop on formal methods discovered!). One of the authors is involved in a project on version management and CSCW where again users must talk about documents. Indeed, any system to support audit information or to promote awareness of the reasons for changes requires a dual emphasis on current action and past history. Finally, many agent-based systems require that the user either formulates or assents to pre-planned behaviours of the agents. The results of these behaviours will be actions, but the formulation is typically linguistic.
We need to recognise the differences between communication and action which are often partly elided in computer based systems. We have different skills at each, but the transition between the two is difficult in real life and will inevitably be so in the computer world. Direct manipulation emphasises action, but linguistic interaction is also necessary both when we review our actions, as in the case of undo and history mechanisms, and when we externalise our plans, as in the case of some agent-based systems. This does not mean that there is nothing we can do to make things better, but does mean that we should expect problems and concentrate creative design effort on these points of transition.