NVGaze - Concepedia

TLDR

Quality, diversity, and size of training data are critical for learning-based gaze estimators. We built two infrared near-eye datasets, a synthetic one of 2 million images and a real-world one of 2.5 million images from 35 subjects, then trained neural networks that run with sub-millisecond latency. The resulting gaze network attains a mean error of 2.06° over a 30° × 40° field of view on subjects excluded from training, and 0.5° in the best case when trained explicitly for one subject, while the pupil-localization network shows higher robustness than previous methods.

Abstract

Quality, diversity, and size of training data are critical factors for learning-based gaze estimators. We create two datasets satisfying these criteria for near-eye gaze estimation under infrared illumination: a synthetic dataset using anatomically-informed eye and face models with variations in face shape, gaze direction, pupil and iris, skin tone, and external conditions (2M images at 1280x960), and a real-world dataset collected with 35 subjects (2.5M images at 640x480). Using these datasets we train neural networks performing with sub-millisecond latency. Our gaze estimation network achieves 2.06(±0.44)° of accuracy across a wide 30°×40° field of view on real subjects excluded from training and 0.5° best-case accuracy (across the same FOV) when explicitly trained for one real subject. We also train a pupil localization network which achieves higher robustness than previous methods.
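
To make the accuracy figures above concrete, the following is a minimal sketch of how an angular gaze error and its spread might be computed. It is not the authors' evaluation code: the abstract does not state how 2.06(±0.44)° was derived, so the yaw/pitch parameterization, the function names (`gaze_vector`, `angular_error_deg`, `mean_and_std`), and the use of a population standard deviation are all assumptions made for illustration.

```python
import math

def gaze_vector(yaw_deg, pitch_deg):
    """Convert yaw/pitch angles in degrees to a unit 3D gaze direction.

    Assumed convention (hypothetical): yaw rotates about the vertical axis,
    pitch about the horizontal axis, and (0, 0) looks along +z.
    """
    yaw = math.radians(yaw_deg)
    pitch = math.radians(pitch_deg)
    return (
        math.cos(pitch) * math.sin(yaw),
        math.sin(pitch),
        math.cos(pitch) * math.cos(yaw),
    )

def angular_error_deg(pred, truth):
    """Angle in degrees between two gaze directions given as (yaw, pitch)."""
    a = gaze_vector(*pred)
    b = gaze_vector(*truth)
    dot = sum(x * y for x, y in zip(a, b))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    dot = max(-1.0, min(1.0, dot))
    return math.degrees(math.acos(dot))

def mean_and_std(values):
    """Mean and population standard deviation of a non-empty list."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(var)
```

Under these assumptions, one would compute `angular_error_deg` for every test sample, then summarize the errors with `mean_and_std`, either over all samples or over per-subject means.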
