State of the Art on Monocular 3D Face Reconstruction, Tracking, and Applications

TLDR

Computer graphics and vision researchers have long pursued tools for reconstructing, tracking, and analyzing human faces from visual input, and recent advances yield impressive results even from a single RGB or RGB‑D camera. This state‑of‑the‑art report surveys recent trends in monocular facial performance capture and its applications, which range from performance‑based animation to real‑time facial reenactment. It focuses on optimization‑based methods that recover and track 3D face models, covering image‑formation fundamentals, common simplifying assumptions, priors that constrain the under‑constrained monocular problem, and optimization techniques for dense photo‑geometric reconstruction from monocular 2D data, and it closes with use cases in motion capture, facial animation, and image and video editing.

Abstract

The computer graphics and vision communities have long dedicated effort to building computerized tools for reconstructing, tracking, and analyzing human faces based on visual input. Rapid progress over the past years has led to novel and powerful algorithms that obtain impressive results even in the very challenging case of reconstruction from a single RGB or RGB‐D camera. The range of applications is vast and steadily growing as these technologies continue to improve in speed, accuracy, and ease of use. Motivated by this rapid progress, this state‐of‐the‐art report summarizes recent trends in monocular facial performance capture and discusses its applications, which range from performance‐based animation to real‐time facial reenactment. We focus our discussion on methods where the central task is to recover and track a three‐dimensional model of the human face using optimization‐based reconstruction algorithms. We provide an in‐depth overview of the underlying concepts of real‐world image formation, and we discuss common assumptions and simplifications that make these algorithms practical. In addition, we extensively cover the priors that are used to better constrain the under‐constrained monocular reconstruction problem, and discuss the optimization techniques that are employed to recover dense, photo‐geometric 3D face models from monocular 2D data. Finally, we discuss a variety of use cases for the reviewed algorithms in the context of motion capture, facial animation, and image and video editing.
