Semi-Supervised Multitask Learning Using Gaze Focus for Gaze Estimation

Abstract

Gaze estimation seeks to infer human visual attention from camera images and can be applied in a wide range of scenarios. Contemporary research predominantly employs deep learning to predict gaze directly from facial or ocular images. However, most methods concentrate solely on estimating gaze direction and overlook the gaze point. We propose two multitask learning frameworks for estimating both gaze point and gaze direction, with the objective of achieving unsupervised learning of the gaze point and supervised gaze estimation via gaze intersection. Because the gaze point is unlabeled, two attention layers are proposed to guide the generation of facial features. The first, the focus attention layer, uses the eyes to guide facial features: it connects eye and face features and uses their similarity to enhance eye information. The second uses only the full-face image and applies self-attention to enhance pertinent information. Four loss functions constrain the networks in 2D and 3D space. The combination of eye position constraints and attention layers ensures the accuracy of gaze point prediction. Gaze intersection can also be used to obtain gaze depth, thereby solving the depth-overlapping problem. Comprehensive experiments verify the advantages of the proposed method in gaze tracking.
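
The abstract does not specify how gaze intersection is computed. As a minimal sketch of one common geometric reading, assume each eye contributes a 3D ray (an eye position plus a gaze direction) in a shared camera coordinate frame, and take the gaze point to be the midpoint of the shortest segment between the two gaze lines. The function name `gaze_intersection`, its parameters, and the tolerance `eps` are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of two 3D vectors."""
    return sum(a * b for a, b in zip(u, v))


def gaze_intersection(
    origin_l: Sequence[float],
    dir_l: Sequence[float],
    origin_r: Sequence[float],
    dir_r: Sequence[float],
    eps: float = 1e-9,
) -> Optional[Vec3]:
    """Approximate the 3D gaze point from two gaze lines (assumed setup).

    Each line is origin + t * direction. Real gaze lines rarely meet
    exactly, so this returns the midpoint of the shortest segment
    between them, or None when the lines are (near) parallel.
    The lines are treated as infinite, so t may be negative.
    """
    w = tuple(p - q for p, q in zip(origin_l, origin_r))
    a = _dot(dir_l, dir_l)
    b = _dot(dir_l, dir_r)
    c = _dot(dir_r, dir_r)
    d = _dot(dir_l, w)
    e = _dot(dir_r, w)

    denom = a * c - b * b  # zero when the two directions are parallel
    if abs(denom) < eps:
        return None

    # Parameters of the closest points on the left and right lines.
    t_l = (b * e - c * d) / denom
    t_r = (a * e - b * d) / denom

    closest_l = tuple(p + t_l * v for p, v in zip(origin_l, dir_l))
    closest_r = tuple(p + t_r * v for p, v in zip(origin_r, dir_r))
    x, y, z = ((p + q) / 2.0 for p, q in zip(closest_l, closest_r))
    return (x, y, z)
```

Under these assumptions, the z component of the returned point serves as the gaze depth: two targets that lie along nearly the same direction but at different distances yield different intersection points, which is how an intersection-based estimate can separate them.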
