Since the early eighties, camera tracking (ego-motion) has been employed in robotics to help a machine locate its position in a room, in relation to static or even moving objects around it. For these systems the feedback time needs to be almost instant. At present, tracking can be done at rates of 10 Hz [i] and even 30 Hz [ii].
Around ten years later, camera tracking found its way into the film industry. A very early example of this work can be found in the film "Jurassic Park" [iii]. Up until this time the director was restricted to simple static camera systems, or to heavyweight motion control systems (such as the one developed by Industrial Light and Magic in the late seventies for the film "Star Wars"). In "Jurassic Park" the director was able to film hand-held shots, using small fluorescent markers on the floor to aid the computer and its user in replicating the camera motion within the computer's 3D space.
From this point onwards, camera tracking was used to seamlessly augment reality with two- or three-dimensional elements. It could also be used to extend traditional effects. Background mattes, for example, can be dynamically added to a scene without having to hand-paint changes on every frame [iv].
From film, this technique has found its way onto the small screen. With modern-day technology, real-time tracking can be used for adding slogans to billboards at live sports events and for placing live-action performers onto virtual sets [v][vi].
The essential building block for such systems is accurate feature tracking. Some methods involve pattern recognition algorithms based on combinations of colours within a selected area. Others use the detection of motion blur to get a sense of speed and direction [vii]. Many systems depend on edge detection, where changes in contrast or intensity are used as guides to the shape and position of features [viii][ix].
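To make the idea of edge detection concrete, the following is a minimal sketch of how changes in intensity can mark feature boundaries. It is not the method of [viii] or [ix]; it assumes NumPy is available, and the function name `edge_map`, its `threshold` parameter and the synthetic test image are all hypothetical, chosen only for illustration.

```python
import numpy as np

def edge_map(image, threshold):
    """Return a boolean mask of pixels whose intensity gradient
    magnitude exceeds `threshold` (hypothetical helper)."""
    img = np.asarray(image, dtype=float)
    # Finite differences along rows (y) and columns (x).
    gy, gx = np.gradient(img)
    magnitude = np.hypot(gx, gy)
    return magnitude > threshold

# Synthetic 6x8 image: dark left half, bright right half.
image = np.zeros((6, 8))
image[:, 4:] = 1.0

edges = edge_map(image, threshold=0.25)
# Columns in which at least one pixel was flagged as an edge.
edge_columns = np.flatnonzero(edges.any(axis=0))
print(edge_columns)
```

Only the columns on either side of the dark-to-bright step are flagged, which is the kind of cue a tracker could follow from frame to frame. A practical system would typically smooth the image first and use a more robust operator, but the principle of thresholding intensity change is the same.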
The future of ego-motion seems to lie in the interactive/games market. Systems have already been developed which can interpret object and even human motion from a single image source [x]. This paper discusses some simpler systems that can be applied to the recovery of camera motion and position, and concentrates on a system which differs from the rest in that it deals with a single frame at a time, rather than an image sequence.