Both processes produced results which, although not perfect, are better than the solutions I have seen implemented in current computer games. When I asked other people for their opinion on the animations, the vote was split, with some preferring the blendshapes and others leaning towards the motion capture. Personally, I slightly prefer the motion capture, although the blendshape animation could be improved greatly with more practice, whereas the motion capture is fairly limited in how much better it can get.

Implementing the techniques in games.

The blendshape animation is easier to implement initially, and produces good, expressive results. It is easy to produce caricatured expressions with it and, of course, it is the only option if the character is too far removed from human anatomy. It uses less memory than motion capture: the motion capture approach needs a set of co-ordinates stored for every frame, whereas the blendshape approach only has to store the initial phoneme shapes, and the data needed for the keyframes after that is fairly small. Processor usage is probably higher than for motion capture, as the game has to interpolate between the blendshapes rather than just read values from a file. The overhead should be negligible on most of today's systems, though.
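As a rough illustration of the interpolation work described above, here is a minimal Python sketch. It assumes each phoneme shape is stored as a full copy of the mesh and that keyframes hold one weight per shape; the names `weights_at` and `blend` and the data layout are hypothetical, not taken from the project.

```python
from bisect import bisect_right


def weights_at(keyframes, time):
    """Linearly interpolate blendshape weights at a given time.

    keyframes: list of (time, {shape_name: weight}) sorted by time.
    (Assumed layout, for illustration only.)
    """
    times = [k[0] for k in keyframes]
    if time <= times[0]:
        return dict(keyframes[0][1])
    if time >= times[-1]:
        return dict(keyframes[-1][1])
    i = bisect_right(times, time)          # first keyframe after `time`
    t0, w0 = keyframes[i - 1]
    t1, w1 = keyframes[i]
    t = (time - t0) / (t1 - t0)            # 0..1 between the two keyframes
    names = set(w0) | set(w1)
    return {n: w0.get(n, 0.0) + (w1.get(n, 0.0) - w0.get(n, 0.0)) * t
            for n in names}


def blend(base, targets, weights):
    """Offset each base vertex towards each target shape by its weight."""
    result = []
    for vi, (bx, by, bz) in enumerate(base):
        x, y, z = bx, by, bz
        for name, w in weights.items():
            tx, ty, tz = targets[name][vi]
            x += w * (tx - bx)
            y += w * (ty - by)
            z += w * (tz - bz)
        result.append((x, y, z))
    return result


# Two-vertex "mesh" with a single hypothetical phoneme shape "AH".
base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
targets = {"AH": [(0.0, -1.0, 0.0), (1.0, 0.0, 0.0)]}
keyframes = [(0.0, {"AH": 0.0}), (1.0, {"AH": 1.0})]

mesh = blend(base, targets, weights_at(keyframes, 0.5))
```

Only the shapes and a short list of keyframes are stored, which is where the memory saving comes from. The cost is the per-vertex loop in `blend`, which has to run every frame.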

Motion capture could be very useful for games which require a lot of speech. Once the initial setup is complete, the time taken to generate data is fairly low compared with blendshapes. Indeed, in a commercial environment the process could be streamlined further, with the program given a graphical interface to aid with setting keyframes. A higher quality digital camera should also yield better tracking results (the camera I used had quite a lot of chroma noise). It is also conceivable that you could make a head mounted camera, giving the actor the freedom to rotate their head while delivering their lines. As said before, though, the memory requirement for mocap is quite large, although if the game streams the speech data from hard disk or CD this is less of an issue. One disadvantage of motion capture is that the capture frame rate is fixed, so unless the frames are interpolated the animation will update at a lower rate than the game renders (this applies mainly to PCs; consoles are limited to TV refresh rates). Also, although the technique is simple to implement on this low polygon model, where the number of vertices around the mouth is similar to the number of points captured, it is harder to use on higher polygon models: either bones or other deformers must be used, or more tracking points added.
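The frame rate problem mentioned above can be worked around by interpolating between captured frames. The following Python sketch assumes each captured frame is a list of 2D tracking points and that the capture ran at a constant rate; `sample_mocap`, the 25 fps figure and the sample data are illustrative assumptions rather than details of the program used here.

```python
def sample_mocap(frames, capture_fps, time):
    """Return tracking points at an arbitrary game time (in seconds).

    frames: list of captured frames, each a list of (x, y) points.
    Times before the first or after the last frame clamp to that frame.
    """
    pos = time * capture_fps               # position measured in capture frames
    if pos <= 0:
        return list(frames[0])
    i = int(pos)                           # index of the frame before `time`
    if i >= len(frames) - 1:
        return list(frames[-1])
    t = pos - i                            # 0..1 between frame i and i + 1
    return [(x0 + (x1 - x0) * t, y0 + (y1 - y0) * t)
            for (x0, y0), (x1, y1) in zip(frames[i], frames[i + 1])]


# One tracked point captured at an assumed 25 frames per second.
captured = [[(0.0, 0.0)], [(2.0, 4.0)], [(4.0, 8.0)]]

# A game running at 60 fps asks for the pose at its own frame times.
game_poses = [sample_mocap(captured, 25.0, n / 60.0) for n in range(5)]
```

Linear interpolation is the simplest choice. It adds a small processing cost per tracked point but lets the mouth move at the game's frame rate instead of the camera's.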

Both of these techniques would suit current and next generation games and should run well, especially as processors get faster and the burden of intensive 3D processing is transferred to dedicated 3D cards and chips. I think blendshapes are better for games which have cartoony or non-human characters, or where the amount of speech is small or repeated often, as in a first person shooter. Motion capture is good for long speeches, and is ideally suited to something like an adventure game or RPG.
