Position paper for FADIVA workshop
At time of publication: HCI Group and Dept. of Computer Science, University of York
See also: Alan's pages about Time and about Visualisation
As I come to the FADIVA group from the outside, many of the things below may already be well known. Apologies for these, but I hope some of them suggest new perspectives.
After Einstein's theory of special relativity, time and space can never again be regarded as separate concepts. Instead we now see them as different views of a single time-space continuum. In fact, one could argue that this is not so much a revelation of the 20th century as a re-evaluation of the supposed objective nature of time, born out of the age of clocks and the ensuing mechanistic models of the universe. In day-to-day life we continually experience the coupling of time and space as we travel or send letters.
In preparing my talk for AVI this was brought home to me as I considered the different world views engendered by different senses, each of which takes a different 'cut' through the time-space continuum: vision - spatial, smell - temporal, and sonar - a mixture of the two.
In measuring the real world we often cannot get a 'snapshot' of time and space simultaneously (in fact the very word snapshot suggests measurement at only one time!). In my first job I worked on the mathematical modelling of agricultural sprays. One aspect of our work involved measuring the sizes of water droplets produced by different kinds of spray nozzle. The results obtained by our group were consistently different from those produced by another group. It turned out that their equipment obtained a spatial sample of the droplets in a given volume, whereas ours used a temporal sample of the droplets passing through a surface. Small droplets slow down and so linger in any given volume; hence the other group's spatial sample contained proportionally more small drops than our temporal sample did.
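The difference between the two kinds of sample can be sketched numerically. The following is only an illustration under assumptions of my own: a hypothetical nozzle emits equal numbers of small and large droplets per second, and the small droplets are moving more slowly at the measurement point. The rates and speeds are invented, not measured values. In steady flow the number of droplets present in a volume is proportional to flux divided by speed, whereas the number crossing a surface is proportional to flux alone.

```python
# Hypothetical droplet classes: (name, droplets emitted per second,
# speed in m/s at the measurement point). All numbers are invented.
DROPLETS = [
    ("small", 1000.0, 2.0),    # small droplets have slowed down
    ("large", 1000.0, 10.0),   # large droplets keep most of their speed
]


def temporal_fractions(droplets):
    """Fraction of each class among droplets crossing a surface per second."""
    total = sum(rate for _, rate, _ in droplets)
    return {name: rate / total for name, rate, _ in droplets}


def spatial_fractions(droplets):
    """Fraction of each class among droplets present in a volume at one instant.

    In steady flow, number density = flux / speed, so slow droplets
    linger in the volume and are over-represented.
    """
    density = {name: rate / speed for name, rate, speed in droplets}
    total = sum(density.values())
    return {name: d / total for name, d in density.items()}


temporal = temporal_fractions(DROPLETS)
spatial = spatial_fractions(DROPLETS)
print("temporal sample:", temporal)
print("spatial sample: ", spatial)
```

With these invented figures the temporal sample is half small droplets, while the spatial sample is five-sixths small droplets, even though both describe the same spray.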
Similar issues arise in measuring the statistical properties of air movements: one can obtain equivalent results by using simultaneous readings from several instruments at different locations or by looking at a temporal record of readings at a single location. Indeed, my desktop scanner works by moving a scan head under the document being scanned - it builds a two-dimensional image from a time series of one-dimensional scans.
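The equivalence of the two kinds of reading can be illustrated with a toy model. The sketch below assumes (my assumption, not something established above) that the pattern of air movement drifts past the instruments at a constant rate without changing shape; the pattern values and the `reading` function are hypothetical. Under that assumption a temporal record at one location visits exactly the same values as simultaneous readings at several locations, so the statistics agree.

```python
from statistics import mean, pvariance

# Hypothetical wind-speed pattern around a periodic track of 8 positions.
PATTERN = [3.0, 5.0, 4.0, 6.0, 2.0, 5.0, 7.0, 4.0]


def reading(position, time):
    """Wind speed at `position` at `time`, for a pattern that drifts one
    position per time step without changing shape (an assumed model)."""
    return PATTERN[(position - time) % len(PATTERN)]


n = len(PATTERN)
# Several instruments at different locations, all read at time 0.
spatial_sample = [reading(position, 0) for position in range(n)]
# A single instrument at location 0, read at successive times.
temporal_sample = [reading(0, time) for time in range(n)]

print("spatial: ", mean(spatial_sample), pvariance(spatial_sample))
print("temporal:", mean(temporal_sample), pvariance(temporal_sample))
```

Real air movements do not satisfy this assumption exactly, so the sketch shows only why the two records can be expected to agree, not when they do.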
As the traditional medium of communication (paper) is static and two-dimensional, we are used to seeing representations of time mapped into space. In comic books and also technical manuals we see sequences of images laid out to give an idea of temporal progression. Single comic book images may use various forms of blurring, streamlines or other ways of giving an impression of movement (even multiple images of the same object in the same frame). These visual cues to movement are increasingly being recognised by the computer graphics and HCI communities. In the scientific community the most prevalent example of embedding time into space is the graph, where time is mapped directly onto one of the spatial dimensions. It is important to note that although this representation is to some extent a technological artefact of the nature of paper, it also serves an important perceptual role: it is easier to perceive trends in a spatial representation of the data than if the same data were animated (with no graphical trace).
In a dynamic visualisation we can use time itself as part of the representation.
We'll return to interactive visualisation later.
Forgetting time for the moment (!), let's think about space. Ignoring superstring theory, we live in a three-dimensional, approximately Euclidean world. Many of the exciting visualisation techniques seen over recent years take advantage of this and use 3D visualisations to increase attractiveness and (debatably) utility. Of course, when we say 3D in this context we really mean (what is conventionally called) 2 1/2 D. Occlusion means that we can at best see one thing in any direction, and only the surface of things. Sight is literally a superficial sense.
Why can we only see in 2.5D? Let's unpack the answer. All we really see from an individual eye is 2D. Each eye gives us (in low-level terms) a mapping from positions in a 2D space to some attributes (colour, intensity, perhaps texture). That is, each eye gives us:
D x D -> A
With stereoscopic vision and other depth cues, we can do a bit better and get an estimate of the distance of any object in any direction. That is, we can see:
D x D -> (A x D)
Notice, however, that one of the 'D's sits on the 'wrong' side of the function arrow!
In the real world there is something (or perhaps nothing) at every point of space. That is, at every point in a 3D space there are some physical attributes (let's say P). A reductionist view of the world is therefore:
D x D x D -> P
The problem of vision is that this world must be mapped onto:
D x D -> (A x D)
They don't fit! The fact that one of the 'D's of vision is on the wrong side means that we can see at most one thing in each direction. In the physical world this is the closest object. In a computer visualisation this could be objects at a fixed distance, objects with certain attributes, or even the furthest objects. However, all will be 2.5D in one way or another. In fact the extra 1/2 D is so minimal it might be more accurate to regard vision as really being 2.000001D!
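The mismatch between D x D x D -> P and D x D -> (A x D) can be sketched as a projection. This is a minimal illustration, assuming a tiny hypothetical world stored as a nested list in which `None` means empty space and the third index is distance from the viewer; the names `make_world` and `view` are mine. For each direction the projection keeps only the closest non-empty cell and its depth, which is all that vision can deliver.

```python
def make_world(size):
    """An empty world, D x D x D -> P, indexed as world[x][y][z]."""
    return [[[None] * size for _ in range(size)] for _ in range(size)]


def view(world):
    """Project D x D x D -> P onto D x D -> (A x D).

    For each direction (x, y), return the attribute and depth of the
    closest non-empty cell, or None if the line of sight is empty.
    """
    image = []
    for column in world:
        row = []
        for line_of_sight in column:
            seen = None
            for depth, thing in enumerate(line_of_sight):
                if thing is not None:
                    seen = (thing, depth)
                    break  # everything further away is occluded
            row.append(seen)
        image.append(row)
    return image


world = make_world(3)
world[1][1][0] = "red"    # near object
world[1][1][2] = "blue"   # hidden directly behind the red one
world[0][2][1] = "green"

image = view(world)
print(image[1][1])  # ('red', 0): the blue cell never reaches the eye
print(image[0][2])  # ('green', 1)
```

Replacing the `break` rule with another selection (a fixed depth, a particular attribute, the furthest cell) changes which thing is seen in each direction, but the result is still at most one thing per direction.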
Some of the most successful 3D visualisation tools have been various forms of molecular models. These are composed of lots of point objects, so the chance of having more than one thing in the same direction is small and hence, for this case, 2.5D is effectively 3D. VR techniques can win us an extra bit of dimension by allowing us to look around objects, see what is behind them and even perhaps go inside buildings etc. However, even this only allows us to see the surfaces of objects, not full 3D fields, such as internal temperatures, fluid densities or flows. VR is perhaps 2.000002D. In fact, one way in which flows are shown is by using tracers which give you a sparse sample of the full 3D field. Because they are sparse, like molecules, they are 'open' enough to see inside.
Time can be used to give the full extra dimension. This is precisely what is happening in the videos of the digital human body mentioned previously. In this case the data is of the form:
L x L x L -> A'
(using L for the length dimension and T for time, as in traditional dimensional analysis in physics) and the screen is of the form:
L x L -> A
However, we view the screen through time, leading to a view of the form:
L x L x T -> A
which can map directly onto the dimensions of the data.
As well as half (or 0.000001) spatial dimensions, one can also get partial time dimensions in visual representations of time. This is precisely the case when we look at footsteps in the sand. Footsteps can occur anywhere on the sand, but there can only be one footstep at any place (further footsteps obliterate what is below). Furthermore, on a breezy day older footsteps are partly blown away, so by the sharpness of a footstep we can tell how old it is. The view we get is therefore of the form:
L x L -> (A x T)
The real history of the beach is that at various times people trod in different places. That is, the real footstep history is of the form:
L x L x T -> A
Just as with normal vision, this just doesn't fit. There is too little of the time dimension, and all we see is the most recent footstep, just as we normally see only the closest object. Footsteps in the sand are a 2L+1/2T visualisation!
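The loss in going from L x L x T -> A to L x L -> (A x T) can be sketched in the same spirit as occlusion in space. The example below is illustrative only: the event list, the coordinates and the function name `beach_view` are invented, and the age of a footstep stands in for the sharpness by which we would judge it on a real beach.

```python
# The real history of the beach, L x L x T -> A, as a list of
# footstep events (x, y, time). Coordinates and times are invented.
HISTORY = [
    (2, 3, 1),
    (5, 1, 2),
    (2, 3, 7),  # someone later treads on the first footstep
    (4, 4, 9),
]


def beach_view(history, now):
    """Collapse L x L x T -> A into L x L -> (A x T).

    Each place keeps only its most recent footstep; the value returned
    for a place is the age of that footstep at time `now`.
    """
    latest = {}
    for x, y, time in history:
        if time <= now:
            latest[(x, y)] = max(time, latest.get((x, y), time))
    return {place: now - time for place, time in latest.items()}


sand = beach_view(HISTORY, now=10)
print(sand)  # {(2, 3): 3, (5, 1): 8, (4, 4): 1}
```

Four events went in but only three footsteps can be read back: the footstep made at (2, 3) at time 1 has been obliterated, just as a nearer object hides a further one.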
The temporal dimension can be very important in bringing together different aspects of data - that is, data fusion.
It is probably (ii) which immediately springs to mind when one considers time, but it is not necessarily the most important or prevalent.
Note that (iii) is often used in existing audio-visual teaching material, especially where one part of the display represents an animation of a physical system and the other a graph of its temporal behaviour. Also (i) is used semi-statically with transparency/tracing paper overlays. Temporary ghosting in temporal displays may aid feature detection.
All of the techniques for using time can be used in delivered media such as television or videos, whether produced by computers or more traditional animation techniques. In such pseudo-static media (static in the sense that any dynamic aspects are fixed into the product) one may be able to interact with the media itself (turn the pages of a book, or operate the controls on a VCR), but not with the data being represented or the style of presentation. The real gain in using computer visualisation is the ability to interact through the media, acting on the data themselves and also on the parameters of the representation of the data.
In fact interaction is central to our visual system itself. Many of the hard cases for computer vision are those which are in some way boundary cases. If one is allowed to change the viewing angle only slightly the ambiguity is often resolved. Similarly, the strange camera angles used in the 'can you guess what this is' photographs are only confusing because we cannot move backwards and see the context of the photograph.
In fact, even our 2.5D vision is built only partly upon stereoscopic cues. We use some other static cues such as the colour and clarity of images (early Smalltalk systems half-toned the inactive windows, perhaps on colour displays inactive windows ought to have reduced contrast and be transformed to the blue end of the spectrum?). However, we also rely strongly on parallax effects from our own movement to determine distance. Furthermore, we don't simply look at things, but examine them, look behind them, open them, walk inside them. To the extent that we sense a 3D world it is not that we simply 'see' it, but that by interacting with it we experience it.
This is of course the case in the electronic world also. VR systems are only really immersive when they are interactive. Even fixed animations are often most useful if one has VCR-style controls to move backwards and forwards through the images. Also, compare a fixed video of the digital human body with an interface which allows you to select which direction and cross-section to view using sliders. It is like the difference between a medical student looking at cut-away pictures in an anatomy textbook and actually dissecting a cadaver, or like an archaeologist looking at a site plan compared to actually scraping away around the artefacts during the excavation. Of course the great thing about the electronic world is that one doesn't just get to do this once, but one can explore into an object, then 'put the bits back' and start again from a different viewpoint.
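The contrast between the fixed video and the slider interface can be sketched as follows. This is a hedged illustration rather than a description of any actual digital human body viewer: the volume is a small hypothetical data set whose cells simply record their own coordinates, and `axis` and `position` stand in for the two sliders.

```python
def make_volume(nx, ny, nz):
    """A hypothetical data set: volume[x][y][z] holds the tuple (x, y, z)."""
    return [[[(x, y, z) for z in range(nz)] for y in range(ny)]
            for x in range(nx)]


def cross_section(volume, axis, position):
    """Return the 2D slice perpendicular to `axis` (0, 1 or 2) at `position`.

    `axis` plays the role of the direction slider and `position` the
    role of the cross-section slider.
    """
    if axis == 0:
        return [list(row) for row in volume[position]]
    if axis == 1:
        return [list(plane[position]) for plane in volume]
    if axis == 2:
        return [[line[position] for line in plane] for plane in volume]
    raise ValueError("axis must be 0, 1 or 2")


def fixed_video(volume):
    """The non-interactive alternative: every slice along axis 0, in order."""
    return [cross_section(volume, 0, x) for x in range(len(volume))]


volume = make_volume(2, 3, 4)
frames = fixed_video(volume)               # one direction, fixed order
side = cross_section(volume, axis=2, position=1)  # chosen by the viewer
print(len(frames), "frames of", len(frames[0]), "x", len(frames[0][0]))
print(side[0])  # [(0, 0, 1), (0, 1, 1), (0, 2, 1)]
```

The fixed video does contain all of the data, since its frames together reproduce the volume, but only the interactive version lets the viewer choose the cut and return to start again from a different direction.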
In addition, interacting can allow us to trace a path in a multi-dimensional space by simultaneously varying several parameters, perhaps by using several limbs simultaneously as Bill Buxton has always advocated. I have often wondered how well we can comprehend such higher-dimensional spaces (some mathematicians can do four-dimensional geometry in their heads).
Using time can allow us to have radically different representations, including real 3D visualisation - the inside rather than just the surface of things. However, interaction adds a qualitatively different aspect. Traditionally we regard sensing as passive, but in the real world we sense dynamically - we don't just look at the world, but live in it. The power of computer visualisation is most truly grasped when we take advantage of this.