Real-time human pose recognition in parts from single depth images

TLDR

The study proposes a fast, accurate method for estimating 3D human pose from a single depth image, building on existing object recognition techniques. It recasts pose estimation as per-pixel body-part classification, trains a classifier on a large synthetic depth dataset, and generates confidence-scored 3D joint proposals by reprojecting the classification results and locating local modes. The system runs in real time (under 5 ms on the Xbox 360) and achieves state-of-the-art accuracy on synthetic and real data, generalizing better than exact whole-skeleton nearest-neighbor matching.

Abstract

We propose a new method to quickly and accurately predict human pose (the 3D positions of body joints) from a single depth image, without depending on information from preceding frames. Our approach is strongly rooted in current object recognition strategies. By designing an intermediate representation in terms of body parts, the difficult pose estimation problem is transformed into a simpler per-pixel classification problem, for which efficient machine learning techniques exist. By using computer graphics to synthesize a very large dataset of training image pairs, one can train a classifier that estimates body part labels from test images invariant to pose, body shape, clothing, and other irrelevances. Finally, we generate confidence-scored 3D proposals of several body joints by reprojecting the classification result and finding local modes. The system runs in under 5 ms on the Xbox 360. Our evaluation shows high accuracy on both synthetic and real test sets, and investigates the effect of several training parameters. We achieve state-of-the-art accuracy in our comparison with related work and demonstrate improved generalization over exact whole-skeleton nearest neighbor matching.
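
The final step of the abstract (reprojecting the per-pixel classification result and finding local modes) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, takes the per-pixel probability map for one body part as given, and uses a weighted Gaussian mean shift as one common way to locate a mode. The function names and the `bandwidth` value are illustrative choices.

```python
import numpy as np

def reproject(depth, fx, fy, cx, cy):
    """Back-project each pixel of a depth image (metres) to 3D camera coordinates.

    Assumes a pinhole camera; fx, fy, cx, cy are hypothetical intrinsics.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]          # v = row index, u = column index
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)   # shape (h, w, 3)

def propose_joint(depth, prob, fx, fy, cx, cy, bandwidth=0.06, iters=30):
    """Return (mode, confidence) for one body part.

    prob is the per-pixel probability of that part (same shape as depth).
    The mode is found by weighted Gaussian mean shift, started from the
    weighted mean of the reprojected pixels. Returns (None, 0.0) if no
    pixel has non-zero probability.
    """
    mask = prob > 0
    if not mask.any():
        return None, 0.0
    points = reproject(depth, fx, fy, cx, cy)[mask]   # (n, 3)
    weights = prob[mask]                               # (n,)

    def kernel(center):
        d2 = np.sum((points - center) ** 2, axis=1)
        return weights * np.exp(-d2 / (2.0 * bandwidth ** 2))

    mode = np.average(points, axis=0, weights=weights)
    for _ in range(iters):
        k = kernel(mode)
        total = k.sum()
        if total == 0.0:               # all points too far away; stop shifting
            break
        new_mode = (k[:, None] * points).sum(axis=0) / total
        converged = np.linalg.norm(new_mode - mode) < 1e-6
        mode = new_mode
        if converged:
            break
    # Confidence here is the kernel-weighted mass at the mode (an assumption).
    return mode, float(kernel(mode).sum())

# Toy usage: a flat surface 2 m away with one small blob labelled as the part.
depth = np.full((20, 20), 2.0)
prob = np.zeros((20, 20))
prob[8:12, 8:12] = 1.0
mode, confidence = propose_joint(depth, prob, fx=100.0, fy=100.0, cx=9.5, cy=9.5)
```

In the toy usage the labelled blob is centred on the optical axis, so the proposal lands at approximately (0, 0, 2) in camera coordinates. A full system would repeat this per body part and could start the mean shift from several seeds to obtain multiple proposals.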
