Possible improvements:
Blendshapes:
To animate speech fully accurately, around 13 distinct mouth shapes (phonemes) are needed, but given the low-polygon nature of the model I was using, 8 were sufficient. One thing that would improve the realism is the addition of a tongue. This is one area where blendshapes would definitely improve on motion capture, as it is very difficult to capture the movement of the tongue. Indeed, it is quite difficult to see what the tongue is doing when we speak, yet its absence is very noticeable. I didn't do much in the way of facial expression other than lip sync in this project, as lip sync was the main component, but a great deal of emotion can be conveyed by the rest of the face.
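To make the reduced mapping concrete, here is a minimal sketch in C of the idea. The shape names, phoneme labels and the flat 80 ms per phoneme are all hypothetical stand-ins, not values from the project; a real exporter would set keyframes inside Maya (for instance via MEL) rather than print them.

    /* Map each phoneme class onto one of 8 blendshape targets and emit a
       simple rise-hold-fall weight envelope for each one in a sequence. */
    #include <stdio.h>
    #include <string.h>

    typedef struct { const char *phoneme; const char *shape; } VisemeMap;

    /* A hypothetical reduced set of 8 shapes for a low-polygon head. */
    static const VisemeMap table[] = {
        { "AA",   "open"   }, { "IY",   "wide"   }, { "UW",  "pinch"  },
        { "EH",   "mid"    }, { "FV",   "bite"   }, { "MBP", "closed" },
        { "OW",   "round"  }, { "rest", "rest"   },
    };

    /* Key one viseme: weight 0 at the start, 1 shortly after, 0 at the end. */
    static void key_viseme(const char *shape, double start, double end)
    {
        printf("key %-6s t=%.2fs w=0.0\n", shape, start);
        printf("key %-6s t=%.2fs w=1.0\n", shape, start + 0.02);
        printf("key %-6s t=%.2fs w=0.0\n", shape, end);
    }

    int main(void)
    {
        const char *seq[] = { "MBP", "AA", "MBP" };  /* rough phonemes for "map" */
        double t = 0.0;
        for (size_t i = 0; i < sizeof seq / sizeof seq[0]; i++, t += 0.08)
            for (size_t j = 0; j < sizeof table / sizeof table[0]; j++)
                if (strcmp(table[j].phoneme, seq[i]) == 0)
                    key_viseme(table[j].shape, t, t + 0.08);
        return 0;
    }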
Motion capture:
Again, a tongue would be desirable, especially for the opening words "Anything I can help you with?". The mouth doesn't actually move that much; most of the movement comes from the tongue. As mentioned above, though, the tongue isn't really a suitable candidate for motion capture, so a mix of blendshapes and motion capture could be used.
I also didn't address the z-axis in my animation. The realism could be improved by making the lips come forward as the mouth becomes more pinched, so that an "ooo" sound would look a lot better (see the first sketch below). It doesn't really show in the example here, though, as the character is seen from the front, so you wouldn't notice any movement in the z-axis.
Better lighting would probably help improve the quality of the tracking, as would a less reflective material for the trackers. A digital camcorder would be desirable for tracking, but is not necessary. It would also be nice to add a graphical interface: you could simply select a point, watch it track, and if it goes off course, add a correcting keyframe with a click (see the second sketch below). This would greatly increase the throughput of the system. More points could be added as well; some commercial facial tracking systems use hundreds, which would allow the tracking of facial expressions as well as lip sync.
One reason this could become important is the use of big-name actors in games. As the industry becomes ever more mass-market and valuable, people will expect the quality of acting to improve, and with recognisable stars it would be desirable to have their acting and facial expressions show up in their characters as well as their likeness.
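On the z-axis point, one simple approach, sketched below rather than taken from the project, would be to derive lip protrusion from how pinched the tracked mouth is: the narrower the mouth relative to its rest width, the further forward the lips come. The constant k and the widths are illustrative values.

    #include <stdio.h>

    /* Infer lip protrusion (z) from how pinched the mouth is in x.
       A narrower mouth (an "ooo" shape) pushes the lips forward. */
    static double lip_z(double mouth_width, double rest_width, double k)
    {
        double pinch = (rest_width - mouth_width) / rest_width; /* 0 = relaxed */
        if (pinch < 0.0)
            pinch = 0.0;   /* a stretched mouth stays at z = 0 */
        return k * pinch;  /* k = protrusion at a fully pinched mouth */
    }

    int main(void)
    {
        /* Mouth narrowing from rest (4.0 units) toward an "ooo" (2.4 units). */
        for (double w = 4.0; w >= 2.4; w -= 0.4)
            printf("width %.1f -> z offset %.3f\n", w, lip_z(w, 4.0, 1.5));
        return 0;
    }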
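On the interface idea, the core loop might look like the following sketch: automatic tracking runs frame by frame until its match score drops, at which point a hand-placed keyframe, as if clicked in a GUI, re-seeds the tracker. The tracking step, the score threshold and the correction are all simulated stand-ins.

    #include <stdio.h>

    typedef struct { int frame; double x, y; } Keyframe;

    /* Stand-in for template matching: moves the point and returns a match
       score, with a deliberately bad match simulated on frame 5. */
    static double track_step(double *x, double *y, int frame)
    {
        *x += 0.5;
        *y += 0.1;
        return (frame == 5) ? 0.3 : 0.9;
    }

    int main(void)
    {
        Keyframe fix = { 5, 2.0, 0.5 };   /* one manual correction, as clicked */
        double x = 0.0, y = 0.0;

        for (int frame = 1; frame <= 8; frame++) {
            double score = track_step(&x, &y, frame);
            if (score < 0.5 && frame == fix.frame) {
                /* Tracking went off course: re-seed from the user's keyframe. */
                x = fix.x;
                y = fix.y;
                printf("frame %d: corrected to (%.1f, %.1f)\n", frame, x, y);
            } else {
                printf("frame %d: tracked to (%.1f, %.1f), score %.1f\n",
                       frame, x, y, score);
            }
        }
        return 0;
    }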
Other possible techniques:
Another technique that might be useful is a program that takes in text and converts it into a series of blendshape keyframes. You would still have to position the resulting keyframes by hand, but it would streamline the blendshape process.
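The heart of such a tool might look like this sketch, which assumes a small hand-made word-to-phoneme dictionary (a real version would use a full pronunciation dictionary such as CMUdict) and lays the phonemes out at a flat 80 ms spacing; the animator would still slide the resulting keyframes by hand to match the recording.

    #include <stdio.h>
    #include <string.h>

    typedef struct { const char *word; const char *phonemes; } Entry;

    /* A tiny illustrative dictionary covering the opening line. */
    static const Entry dict[] = {
        { "anything", "EH N IY TH IH NG" },
        { "i",        "AY" },
        { "can",      "K AE N" },
        { "help",     "HH EH L P" },
        { "you",      "Y UW" },
        { "with",     "W IH TH" },
    };

    int main(void)
    {
        const char *line[] = { "anything", "i", "can", "help", "you", "with" };
        double t = 0.0, step = 0.08;  /* assume a flat 80 ms per phoneme */

        for (size_t w = 0; w < sizeof line / sizeof line[0]; w++) {
            for (size_t d = 0; d < sizeof dict / sizeof dict[0]; d++) {
                if (strcmp(dict[d].word, line[w]) != 0)
                    continue;
                /* Emit one provisional keyframe time per phoneme. */
                char buf[64];
                strcpy(buf, dict[d].phonemes);
                for (char *p = strtok(buf, " "); p; p = strtok(NULL, " ")) {
                    printf("t=%.2fs  shape for %s\n", t, p);
                    t += step;
                }
            }
        }
        return 0;
    }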
Muscle-based facial animation is another possibility, and it has numerous advantages: it is realistic, causes secondary deformation, and can create a wide variety of facial expressions. The drawbacks are that it takes a long time to set up the muscles for a character, and layers of abstraction (such as controls for "smile", "frown", etc.) must be added, or you are left with an extremely unintuitive interface. It's overkill for games for a good while yet, though.
Looking further ahead, future massively multiplayer games may well need solutions for players speaking to each other. As we cannot control what they say, this may involve speech recognition software, but instead of converting speech to text, converting it to phonemes which can be synched with the audio.
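A sketch of that pipeline, with recognise_phoneme() as a stub standing in for real speech recognition middleware: the recogniser's phoneme stream is mapped straight onto blendshapes, with no text stage in between.

    #include <stdio.h>
    #include <string.h>

    /* Stub recogniser: pretend the audio stream decoded to "you" = Y UW.
       A real system would wrap speech-recognition middleware here. */
    static const char *recognise_phoneme(int chunk)
    {
        static const char *fake[] = { "Y", "UW", NULL };
        return (chunk < 2) ? fake[chunk] : NULL;
    }

    /* Collapse each phoneme to one of the hypothetical mouth shapes. */
    static const char *viseme_for(const char *ph)
    {
        if (strcmp(ph, "UW") == 0) return "pinch";
        if (strcmp(ph, "Y")  == 0) return "wide";
        return "rest";
    }

    int main(void)
    {
        /* Poll the recogniser per audio chunk and key the matching shape. */
        for (int chunk = 0; ; chunk++) {
            const char *ph = recognise_phoneme(chunk);
            if (!ph)
                break;
            printf("chunk %d: phoneme %s -> blendshape \"%s\"\n",
                   chunk, ph, viseme_for(ph));
        }
        return 0;
    }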
Summary:
I felt that this project was a good experiment in what might be possible with today's game technology. I fulfilled the project criteria by trying to innovate in this area, and I hope that I might get a chance to try out some of these ideas in the future. I also used MEL for the first time, which should prove an invaluable skill, and brushed up on my C programming skills.