We present a real-time method for photorealistic transfer of facial expressions from a source
actor here on the right to a target actor on the left.
The prototype system shown here only needs one consumer-grade PC and a consumer depth
camera for each actor.
The real-time capability of our approach paves the way for a variety of new applications
that were previously impossible.
Imagine multilingual video conferencing where the face video of a participant could be altered
to match the audio of a real-time translator.
One could also impersonate the facial expressions of someone in more fitting business attire
while actually dressed in casual clothing.
Our approach produces photorealistic and highly stable reenacted face videos, even when head
poses and expressions of source and target actors differ strongly.
Expression transfer is also very convincing when the physiques of source and target actors
differ or when people wear glasses.
Our method uses a new model-based tracking approach based on a parametric face model.
This model is tracked in real-time from the RGBD input.
Face model calibration for each actor is needed before tracking commences.
Here, a real-time optimization first finds the personalized face parameters.
With these, we can begin real-time GPU-based tracking.
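As a rough illustration of what such a parametric model might look like, the following Python sketch assumes a linear identity, expression, and albedo basis; all names and dimensions are hypothetical and are not taken from the paper. Calibration versus per-frame tracking then differs only in which of these parameters are estimated.

```python
class FaceModel:
    """Hypothetical linear parametric face model (dimensions are made up)."""

    def __init__(self, mean_shape, id_basis, expr_basis, mean_albedo, albedo_basis):
        self.mean_shape = mean_shape        # (3N,) stacked vertex positions
        self.id_basis = id_basis            # (3N, n_id) identity (shape) basis
        self.expr_basis = expr_basis        # (3N, n_expr) expression basis
        self.mean_albedo = mean_albedo      # (3N,) stacked per-vertex RGB albedo
        self.albedo_basis = albedo_basis    # (3N, n_alb) reflectance basis

    def geometry(self, alpha, delta):
        """Vertices for identity weights alpha and expression weights delta."""
        return self.mean_shape + self.id_basis @ alpha + self.expr_basis @ delta

    def albedo(self, beta):
        """Per-vertex skin reflectance for albedo weights beta."""
        return self.mean_albedo + self.albedo_basis @ beta

# Calibration (once per actor): estimate alpha and beta from a few RGB-D frames.
# Tracking (every frame): keep alpha and beta fixed and estimate the expression
# weights delta, the rigid head pose, and the scene lighting.
```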
The overlay of the rendered model on the video, shown on the right, demonstrates our highly
stable and photorealistic results.
For reenactment, source and target actors are tracked, and the source expression parameters
are mapped to the target actor.
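Continuing the hypothetical FaceModel sketch above, this mapping can be pictured as evaluating the target's personalized geometry with the source's tracked expression weights; this is a conceptual simplification, not the exact transfer used in the paper.

```python
def transfer_expression(source_delta, target_model, target_alpha):
    """Drive the calibrated target face with the source actor's expression weights.

    Because both actors are fit with the same expression basis, the expression
    weights tracked on the source (source_delta) can be evaluated directly on the
    target's personalized identity (target_alpha). The resulting geometry is then
    rendered with the target's own head pose, albedo, and estimated lighting.
    """
    return target_model.geometry(target_alpha, source_delta)
```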
Since the mouth changes shape in the target, we synthesize a new mouth interior using a
teeth proxy and a mouth-interior texture.
Rendered face and mouth interior are composited to produce the target video.
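A simplified picture of this compositing step, assuming per-pixel masks for the rendered face and the visible mouth opening (all names hypothetical):

```python
import numpy as np

def composite_frame(background, rendered_face, mouth_interior, face_mask, mouth_mask):
    """Layer the synthesized mouth interior and the rendered face over the target frame.

    All images are float arrays in [0, 1] of shape (H, W, 3); the masks are per-pixel
    alphas of shape (H, W). The mouth interior is blended first so that the rendered
    face (lips, teeth proxy) can partially cover it at the mouth boundary.
    """
    out = np.asarray(background, dtype=float).copy()
    out = mouth_mask[..., None] * mouth_interior + (1.0 - mouth_mask[..., None]) * out
    out = face_mask[..., None] * rendered_face + (1.0 - face_mask[..., None]) * out
    return out
```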
Here we show more real-time facial expression transfer results between different actors.
Due to the estimation of reflectance and lighting in the scene, transfer results are of very
high quality even if the source and target lighting differ.
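One plausible way to model this, sketched here as an assumption rather than the paper's exact formulation, is per-vertex Lambertian shading under a low-order spherical-harmonics illumination whose coefficients are estimated alongside the albedo:

```python
import numpy as np

def sh_basis(normal):
    """Unnormalized first nine real spherical-harmonics basis values for a unit normal."""
    x, y, z = normal
    return np.array([1.0, y, z, x, x * y, y * z, 3.0 * z * z - 1.0, x * z, x * x - y * y])

def shade(albedo_rgb, normal, sh_coeffs):
    """Lambertian per-vertex color under 9-coefficient SH lighting, per color channel.

    sh_coeffs has shape (3, 9); normalization constants are folded into the
    coefficients. Estimating these coefficients together with the albedo is what
    separates reflectance from illumination, so a transferred expression can be
    re-shaded under the target's own lighting.
    """
    irradiance = sh_coeffs @ sh_basis(normal)            # (3,) per-channel irradiance
    return albedo_rgb * np.clip(irradiance, 0.0, None)   # no negative light
```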
Expression transfer is very stable even when source expressions are strongly pronounced, and
when the head pose and shape of source and target actors vary greatly.
By applying parts of the reenactment pipeline only to the source actor, one can also implement
a virtual mirror and render a modified albedo texture to simulate virtual makeup, or render
how the face would look under a different simulated lighting condition.
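Under the same assumed shading model, these virtual-mirror effects amount to swapping out the estimated albedo or lighting before re-rendering; a hypothetical sketch reusing shade() from above:

```python
def virtual_mirror_color(albedo_rgb, normal, estimated_sh, makeup_tint=None, new_sh=None):
    """Re-shade the tracked source face with a modified albedo and/or lighting.

    makeup_tint, if given, multiplies the estimated albedo (a crude stand-in for
    virtual makeup); new_sh, if given, replaces the estimated lighting coefficients
    to simulate a different illumination condition.
    """
    albedo = albedo_rgb if makeup_tint is None else albedo_rgb * makeup_tint
    lighting = estimated_sh if new_sh is None else new_sh
    return shade(albedo, normal, lighting)
```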
We now change the real-world lighting. As we can see, our method is still able to produce
robust results.
We compare our tracking results against ground-truth video and depth data, as well as against previous work.
Here we compare our method against FaceShift.
While the geometric alignment of both approaches is similar, our method achieves significantly
better photometric alignment.
A comparison against recent image-based tracking shows that our new RGBD approach reconstructs
models that match shape and expression details of the actor more closely.
Tracking is only slightly disturbed by glasses and quickly recovers.
Here, we analyze the influence of the terms in our objective function. Note that the combination
of dense RGB and depth data yields both a small geometric and a small photometric error.
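As a rough, assumed stand-in for that objective (the actual energy involves further terms, such as sparse facial landmarks, and more robust dense formulations), the trade-off can be sketched as:

```python
import numpy as np

def tracking_energy(rendered_color, rendered_depth, observed_color, observed_depth,
                    coeffs, prior_std, w_col=1.0, w_geo=1.0, w_reg=0.01):
    """Assumed per-frame objective combining dense color, dense depth, and a prior.

    rendered_* come from rasterizing the current model estimate, observed_* from the
    RGB-D camera; invalid depth pixels are NaN. Dropping the color term hurts the
    photometric fit, dropping the depth term hurts the geometric fit, and the
    regularizer keeps the model coefficients statistically plausible.
    """
    e_col = np.nanmean((rendered_color - observed_color) ** 2)  # dense photometric term
    e_geo = np.nanmean((rendered_depth - observed_depth) ** 2)  # dense geometric term
    e_reg = np.sum((coeffs / prior_std) ** 2)                   # statistical prior
    return w_col * e_col + w_geo * e_geo + w_reg * e_reg
```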
In extreme cases, our tracking may fail, for instance, if a hand completely occludes the
face. However, the tracker quickly recovers once the occluder is gone.
Even many of these extreme poses, in which the face turns almost completely away from the
camera, are still tracked. If occlusions or poses are too extreme, the tracker fails,
but again recovers quickly.
Presenters
Justus Thies
Michael Zollhöfer
Matthias Nießner
Accessible via
Open Access
Duration
00:06:49 min
Recording date
2015-10-12
Uploaded on
2015-10-12 08:54:40
Language
de-DE
Researchers at FAU's Lehrstuhl für Graphische Datenverarbeitung (Chair of Computer Graphics) have developed a technique that merges the voice and facial expressions of an interpreter with the face of a speaker in real time. Video conferences could thereby become much easier to understand (video in English).