Carnegie Mellon Researchers Create Accurate Facial 3D Models From Standard Smartphone Video

Processed in around half an hour from a few seconds of slow-motion camera footage, the 3D models offer sub-millimetre accuracy.

The models produced from simple smartphone footage are high-accuracy. (📷: Agrawal et al.)

Researchers at Carnegie Mellon University have developed a system for creating a high-accuracy 3D model of a subject's facial features, using nothing more than footage captured on a standard smartphone and an artificial intelligence algorithm which can run on the same device.

"Building a 3D reconstruction of the face has been an open problem in computer vision and graphics because people are very sensitive to the look of facial features," says Simon Lucey, an associate research professor in the Robotics Institute department of Carnegie Mellon University. "Even slight anomalies in the reconstructions can make the end result look unrealistic."

Typically, the solution is very hardware-driven: studio setups with specific lighting, multiple cameras, and even laser-based scanning to pull in as much detail as possible. The approach taken by Lucey and team, however, is different: it uses artificial intelligence to work out where in space a subject's features sit, to sub-millimetre accuracy, from footage shot on a standard smartphone camera.

The system works by capturing around 15-20 seconds of footage, covering the front and side of the subject's head, using the smartphone's slow-motion high-framerate mode. "The high frame rate of slow motion is one of the key things for our method," Lucey explains, "because it generates a dense point cloud."
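
A rough back-of-the-envelope sketch shows why the frame rate matters. The clip lengths come from the 15-20 second figure above, but the frame rates are assumptions chosen for illustration, as the slow-motion mode used is not stated here; `frame_count` is a hypothetical helper, not part of the researchers' software.

```python
def frame_count(duration_s: float, fps: int) -> int:
    """Number of frames captured in a clip of the given length."""
    return round(duration_s * fps)


# Assumed frame rates: 30 fps for ordinary video, 120 and 240 fps
# for typical smartphone slow-motion modes (illustrative values only).
for fps in (30, 120, 240):
    low = frame_count(15, fps)
    high = frame_count(20, fps)
    print(f"{fps} fps: {low} to {high} frames")
```

Every extra frame is another view of the face from a slightly different angle, which is what gives the later mapping stage more points to work with.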

This footage is then processed through a visual simultaneous localisation and mapping (SLAM) system to create an initial 3D model, but one with gaps where data are missing. These gaps are then filled in a two-stage process: a deep-learning algorithm identifies gross features including the eyes, ears, and nose, and more traditional computer vision techniques then recover the finer detail.
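
The two-stage idea can be illustrated with a toy example. This is a one-dimensional sketch under heavy assumptions, not the researchers' method: the depth profile, the `landmarks` mapping (standing in for the output of a deep-learning landmark detector), and the `fill_gaps` helper are all hypothetical, and simple linear interpolation stands in for the classical computer vision stage.

```python
from typing import Optional


def fill_gaps(depths: list[Optional[float]],
              landmarks: dict[int, float]) -> list[float]:
    """Fill missing samples (None) in a 1D depth profile."""
    # Stage 1: seed the gaps with coarse landmark depths
    # (hypothetical stand-in for a learned landmark detector).
    seeded = list(depths)
    for idx, depth in landmarks.items():
        if seeded[idx] is None:
            seeded[idx] = depth

    known = [i for i, d in enumerate(seeded) if d is not None]
    if not known:
        raise ValueError("no known samples to interpolate from")

    # Stage 2: classical fill, here plain linear interpolation
    # between the nearest known samples on either side.
    filled = []
    for i, d in enumerate(seeded):
        if d is not None:
            filled.append(d)
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(seeded[right])
        elif right is None:
            filled.append(seeded[left])
        else:
            t = (i - left) / (right - left)
            filled.append(seeded[left] + t * (seeded[right] - seeded[left]))
    return filled


# A profile with a three-sample gap and one landmark in the middle of it.
profile = [10.0, None, None, None, 14.0]
print(fill_gaps(profile, {2: 20.0}))
```

Without the landmark, the gap would be filled with a straight line from 10.0 to 14.0 and the raised feature in the middle would be lost. Pinning it first lets the simple interpolation preserve the shape.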

"Deep learning is a powerful tool that we use every day," says Lucey. "But deep learning has a tendency to memorise solutions," a trait which actively hampers its ability to pull in the specific and unique features of an individual's face. "If you use these algorithms just to find the landmarks, you can use classical methods to fill in the gaps much more easily."

The process takes around 30-40 minutes, and can be performed on the smartphone itself. While the team's work focused purely on facial modelling, the same technique could be extended to 3D scanning of almost any object, and even to duplicating objects through 3D printing.

The team's work has been published under open access terms on arXiv.org.

Gareth Halfacree
Freelance journalist, technical author, hacker, tinkerer, erstwhile sysadmin. For hire: freelance@halfacree.co.uk.