Possible improvements:

Blendshapes:

To create fully accurate speech, approximately 13 distinct mouth shapes (visemes) are needed, but given the low polygon count of the model I was using, 8 proved sufficient. One addition that would improve realism is a tongue. This is one area where blendshapes would definitely improve on motion capture, as the tongue's movement is very difficult to capture. Indeed, it's quite difficult to see what the tongue is doing when we speak, but its absence is very noticeable. I didn't do much with facial expression other than lip sync, as that was the main component of the project, but a lot of emotion could be conveyed by animating the rest of the face.
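To make the reduction concrete, collapsing phonemes onto a small set of blendshape targets is little more than a lookup table. The sketch below assumes eight illustrative groups and ARPAbet-style phoneme labels; the actual groupings used in the project may well have differed.

    /* viseme_map.c -- collapse phonemes onto a small set of blendshape
       targets.  The eight groups are illustrative assumptions, not the
       project's actual set. */
    #include <stdio.h>
    #include <string.h>

    static const char *shapes[8] = {
        "AI", "E", "O", "U", "FV", "MBP", "LNTD", "rest"
    };

    /* Return the blendshape index for a phoneme, or 7 (rest) if unknown. */
    static int viseme_for(const char *ph)
    {
        static const struct { const char *ph; int shape; } map[] = {
            {"AA",0},{"AY",0},{"EH",1},{"IY",1},{"AO",2},{"OW",2},
            {"UW",3},{"W",3},{"F",4},{"V",4},{"M",5},{"B",5},{"P",5},
            {"L",6},{"N",6},{"T",6},{"D",6},
        };
        for (size_t i = 0; i < sizeof map / sizeof map[0]; i++)
            if (strcmp(map[i].ph, ph) == 0)
                return map[i].shape;
        return 7;
    }

    int main(void)
    {
        /* roughly "anything" */
        const char *word[] = { "EH", "N", "IY", "TH", "IH", "NG" };
        for (size_t i = 0; i < sizeof word / sizeof word[0]; i++)
            printf("%-3s -> %s\n", word[i], shapes[viseme_for(word[i])]);
        return 0;
    }

Note how the tongue phonemes L, N, T and D all collapse onto a single mouth shape: exactly the gap a dedicated tongue shape would fill.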

Motion capture:

Again, a tongue would be desirable, especially for the opening words "Anything I can help you with?", where the mouth itself barely moves and most of the movement comes from the tongue. As mentioned above, though, the tongue isn't really a suitable candidate for motion capture, so a mix of blendshapes and motion capture could be used.

I also didn't address the z axis in my animation. Realism could be improved by bringing the lips forward as the mouth becomes more pinched, so an "ooo" sound would look much better. This doesn't show in the example here, though, since the character is seen from the front, where movement in the z axis is invisible.

Better lighting would probably improve the quality of the tracking, as would a less reflective material for the markers. A digital camcorder would be desirable for capture, but is not essential.

A graphical interface would also be worthwhile: you could select a point, watch it track, and if it drifts off course simply click to add a corrective keyframe. This would increase the throughput of the system enormously. More points could be added too; some commercial facial tracking systems use hundreds, which allows facial expressions to be tracked as well as lip sync. (The search step such an interface would sit on top of is sketched below.)

One reason this could become important is the use of big-name actors in games. As the industry becomes more mass market and more valuable, people will expect the quality of acting to improve, and with recognisable stars it would be desirable for their acting and facial expressions, not just their likeness, to show up in their characters.
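The tracking step itself is small; the sketch below shows the kind of brute-force patch search meant by that. The patch size, search radius and 8-bit greyscale frame layout are all assumptions.

    /* track.c -- one step of a naive marker tracker: find where the patch
       around (px,py) in the previous frame moved to in the next frame, by
       minimising the sum of squared differences over a small search
       window.  Assumes (*px,*py) starts at least P pixels inside the frame. */
    #include <float.h>

    #define P 4   /* patch half-width        */
    #define R 8   /* search radius in pixels */

    void track_point(const unsigned char *prev, const unsigned char *next,
                     int w, int h, int *px, int *py)
    {
        double best = DBL_MAX;
        int bx = *px, by = *py;

        for (int dy = -R; dy <= R; dy++)
            for (int dx = -R; dx <= R; dx++) {
                int cx = *px + dx, cy = *py + dy;
                if (cx < P || cy < P || cx >= w - P || cy >= h - P)
                    continue;               /* candidate patch off-frame */
                double ssd = 0.0;
                for (int v = -P; v <= P; v++)
                    for (int u = -P; u <= P; u++) {
                        int d = prev[(*py + v) * w + (*px + u)]
                              - next[(cy  + v) * w + (cx  + u)];
                        ssd += (double)d * d;
                    }
                if (ssd < best) { best = ssd; bx = cx; by = cy; }
            }
        *px = bx;   /* updated marker position */
        *py = by;
    }

Better lighting and matte markers sharpen the minimum this search finds, and a corrective click in a GUI would simply overwrite (*px, *py) before the next frame.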

Other possible techniques:

Another technique which might be useful is a program that takes in text and converts it into a series of blendshape keyframes. You would still have to position the resulting keyframes by hand, but it would streamline the blendshape process.
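A first cut of such a converter could be as blunt as mapping letters straight to shapes and emitting evenly spaced keyframe commands; a real version would go through a pronunciation dictionary. The letter mapping, the 4-frame spacing and the blendShape1 node name below are all placeholder assumptions.

    /* text2keys.c -- crude text-to-keyframe converter.  Emits MEL
       setKeyframe commands on blendshape weights, to be sourced in Maya
       and then adjusted by hand on the timeline. */
    #include <stdio.h>
    #include <ctype.h>

    static int shape_for(char c)
    {
        switch (tolower((unsigned char)c)) {
        case 'a': case 'i':           return 0;  /* AI   */
        case 'e': case 'y':           return 1;  /* E    */
        case 'o':                     return 2;  /* O    */
        case 'u': case 'w': case 'q': return 3;  /* U    */
        case 'f': case 'v':           return 4;  /* FV   */
        case 'm': case 'b': case 'p': return 5;  /* MBP  */
        case 'l': case 'n': case 't': case 'd': return 6;  /* LNTD */
        default:                      return -1; /* no key */
        }
    }

    int main(void)
    {
        const char *text = "anything";
        int frame = 0;

        for (const char *c = text; *c; c++, frame += 4) {
            int s = shape_for(*c);
            if (s >= 0)
                printf("setKeyframe -t %d -v 1.0 blendShape1.w[%d];\n",
                       frame, s);
        }
        return 0;
    }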

Muscle-based facial animation is a possibility, and has numerous advantages: it's realistic, causes secondary deformation, and can create a wide variety of facial expressions. The drawbacks are that setting up muscles for a character takes a long time, and layers of abstraction (such as controls for "smile", "frown", etc.) must be added, or the interface becomes extremely unintuitive. It's overkill for games for a good while yet, though.
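For what it's worth, the core of a single linear muscle in the style of Waters' classic model is only a few lines; it's everything around it (anatomical placement, layering, the control abstraction) that takes the time. The sketch below is a radial-falloff simplification; the full model also weights vertices by their angle inside a cone of influence.

    /* muscle.c -- simplified linear muscle: vertices within range of the
       attachment point are pulled toward it with a cosine falloff.
       Compile with -lm.  A rough sketch, not a production deformer. */
    #include <math.h>

    #define PI 3.14159265f

    typedef struct { float x, y, z; } Vec3;

    typedef struct {
        Vec3  attach;      /* fixed end of the muscle             */
        float radius;      /* zone of influence                   */
        float contraction; /* 0 = relaxed .. 1 = fully contracted */
    } Muscle;

    void apply_muscle(const Muscle *m, Vec3 *verts, int n)
    {
        for (int i = 0; i < n; i++) {
            float dx = m->attach.x - verts[i].x;
            float dy = m->attach.y - verts[i].y;
            float dz = m->attach.z - verts[i].z;
            float d  = sqrtf(dx*dx + dy*dy + dz*dz);
            if (d >= m->radius)
                continue;
            /* full pull at the attachment, fading to zero at the edge */
            float k = m->contraction * 0.5f
                    * (1.0f + cosf(PI * d / m->radius));
            verts[i].x += k * dx;
            verts[i].y += k * dy;
            verts[i].z += k * dz;
        }
    }

The secondary deformation mentioned above falls out naturally: every vertex in range moves, not just the ones an animator thought to sculpt into a shape.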

As for the next generation, future massively multiplayer games may well need solutions for players speaking to each other. Since we cannot control what players say, this may involve speech recognition software, though instead of converting the output to text it would convert it to phonemes, which can then be synced with the speech.
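On the game side this might reduce to easing blendshape weights toward whichever viseme the recogniser currently reports, so shapes blend rather than pop. Everything in the sketch below (the shape count, the blend rate, the fake phoneme stream) is an assumption; the recogniser itself is out of scope.

    /* livesync.c -- per-frame viseme blending for live speech.  The
       active shape index would come from a speech recogniser; here a
       hard-coded stream stands in for it (-1 meaning silence). */
    #include <stdio.h>

    #define NUM_SHAPES 8

    /* Ease each weight toward 1 for the active shape and 0 for the rest. */
    static void update_weights(float w[NUM_SHAPES], int active, float dt)
    {
        const float rate = 12.0f;           /* assumed blend speed, 1/s */
        for (int s = 0; s < NUM_SHAPES; s++) {
            float target = (s == active) ? 1.0f : 0.0f;
            w[s] += (target - w[s]) * rate * dt;
        }
    }

    int main(void)
    {
        float w[NUM_SHAPES] = {0};
        int stream[] = { 0, 0, 5, 5, 1, 1, -1, -1 };   /* fake phonemes */

        for (int f = 0; f < 8; f++) {
            update_weights(w, stream[f], 1.0f / 30.0f);
            printf("frame %d: shape0=%.2f shape5=%.2f\n", f, w[0], w[5]);
        }
        return 0;
    }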

Summary:

I felt that this project was a good experiment in what might be possible with today's game technology. I fulfilled the project criteria by trying to innovate in this area, and I hope I'll get a chance to try out some of these ideas in the future. I also used MEL for the first time, which should prove an invaluable skill, and brushed up on my C programming skills.
