3D Hand Shape and Pose Estimation From a Single RGB Image

TLDR

Most methods for 3D hand analysis from monocular RGB images estimate only keypoint locations, which cannot capture the full hand shape. This work estimates the full 3D hand shape and pose from a single RGB image by reconstructing a complete hand mesh. We employ a Graph Convolutional Neural Network trained on a large synthetic dataset of 3D meshes and poses, and fine-tune it on real images using weak supervision from depth maps. Evaluations on new and public datasets show that the method produces accurate hand meshes and outperforms state-of-the-art methods in 3D hand pose estimation.
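A Graph CNN treats the hand mesh as a graph whose nodes are mesh vertices and whose edges are mesh edges, so each layer mixes features only between nearby vertices. The summary above does not say which graph convolution variant is used. The sketch below assumes one common choice, Chebyshev polynomial spectral filtering, and is purely illustrative. The functions `scaled_laplacian` and `cheb_conv`, the toy 4-vertex graph, and the random weights are hypothetical stand-ins, not the authors' code.

```python
import numpy as np


def scaled_laplacian(adj):
    """Return L~ = 2 L / lambda_max - I, where L = I - D^-1/2 A D^-1/2."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg, dtype=float)
    nz = deg > 0
    d_inv_sqrt[nz] = deg[nz] ** -0.5
    n = adj.shape[0]
    lap = np.eye(n) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    lambda_max = np.linalg.eigvalsh(lap).max()  # lap is symmetric
    return 2.0 * lap / lambda_max - np.eye(n)


def cheb_conv(x, l_tilde, weights):
    """Chebyshev graph convolution (assumed variant, for illustration).

    x:       (N, F_in) per-vertex features
    l_tilde: (N, N) scaled Laplacian
    weights: (K, F_in, F_out); a K-term filter reaches (K - 1)-hop neighbors
    """
    k = weights.shape[0]
    t_prev = x                       # T_0(L~) x = x
    out = t_prev @ weights[0]
    if k > 1:
        t_curr = l_tilde @ x         # T_1(L~) x = L~ x
        out = out + t_curr @ weights[1]
        for i in range(2, k):
            # Recurrence: T_i = 2 L~ T_{i-1} - T_{i-2}
            t_next = 2.0 * l_tilde @ t_curr - t_prev
            out = out + t_next @ weights[i]
            t_prev, t_curr = t_curr, t_next
    return out


# Toy example: a 4-vertex cycle standing in for a mesh's vertex adjacency.
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))          # 3 input features per vertex
w = rng.normal(size=(3, 3, 2))       # K=3 filter taps, 3 -> 2 features
y = cheb_conv(x, scaled_laplacian(adj), w)
print(y.shape)  # (4, 2)
```

Stacking such layers, with the last one outputting three values per vertex, is one way a network could regress 3D vertex coordinates of a mesh. The polynomial form keeps each layer local and avoids an explicit eigendecomposition of the Laplacian beyond estimating its largest eigenvalue.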

Abstract

This work addresses the novel and challenging problem of estimating the full 3D hand shape and pose from a single RGB image. Most current methods for 3D hand analysis from monocular RGB images focus only on estimating the 3D locations of hand keypoints, which cannot fully express the 3D shape of the hand. In contrast, we propose a method based on a Graph Convolutional Neural Network (Graph CNN) that reconstructs a full 3D mesh of the hand surface, which carries richer information about both 3D hand shape and pose. To train the networks with full supervision, we create a large-scale synthetic dataset containing both ground-truth 3D meshes and 3D poses. To fine-tune the networks on real-world datasets without 3D ground truth, we propose a weakly supervised approach that uses the depth map as weak supervision during training. Through extensive evaluations on our newly proposed datasets and two public datasets, we show that our method produces accurate and plausible 3D hand meshes and achieves higher 3D hand pose estimation accuracy than state-of-the-art methods.
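The abstract says the depth map serves as weak supervision but does not specify the loss. One plausible reading is to render a depth image from the predicted mesh and penalize its disagreement with the captured depth map on hand pixels. The sketch below assumes a smooth-L1 penalty averaged over a foreground mask; `masked_smooth_l1_depth_loss`, the `beta` parameter, and the mask convention are hypothetical. The rendered depth is assumed to come from a differentiable renderer (not shown), and NumPy is used only to show the loss value; actual training would need an autodiff framework.

```python
import numpy as np


def masked_smooth_l1_depth_loss(rendered, target, mask, beta=1.0):
    """Mean smooth-L1 error between rendered and reference depth.

    rendered, target: (H, W) depth images in the same units
    mask:             (H, W) True where the pixel is hand foreground with
                      valid reference depth (an assumed convention)
    beta:             switch point between quadratic and linear regions
    """
    diff = np.abs(rendered - target)
    per_pixel = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    valid = mask.astype(bool)
    if not valid.any():
        return 0.0  # no supervised pixels in this image
    return float(per_pixel[valid].mean())


# Toy example: two supervised pixels with errors 0.5 and 2.0.
target = np.zeros((2, 2))
rendered = np.array([[0.5, 2.0], [0.0, 0.0]])
mask = np.array([[True, True], [False, False]])
print(masked_smooth_l1_depth_loss(rendered, target, mask))  # 0.8125
```

A robust penalty such as smooth L1 limits the influence of sensor noise and occasional large depth errors, and masking keeps background pixels from pulling the mesh toward the scene behind the hand.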
