Multimodal recognition of reading activity in transit using body-worn sensors

TLDR

Reading is a well-studied visual activity, and vision research has traditionally focused on the perceptual and cognitive processes involved. This study recognizes reading activity by jointly analyzing the eye and head movements of people in everyday environments. Eye movements were recorded with an electrooculography (EOG) system and body movements with body-worn inertial measurement units. Two approaches to continuous recognition were compared: string matching (STR) of horizontal saccades and a support vector machine (SVM) using 90 eye-movement features. Both were evaluated in a study with eight participants reading while sitting, standing, walking, and riding a tram. A segmentation method exploiting eye-head coordination was also introduced. Using person-independent training, STR and SVM achieved average precisions of 88.9% (recall 72.3%) and 87.7% (recall 87.9%), respectively, and the proposed segmentation scheme improved the recognition of reading events by more than 24%.

Abstract

Reading is one of the best-studied visual activities. Vision research traditionally focuses on understanding the perceptual and cognitive processes involved in reading. In this work we recognize reading activity by jointly analyzing eye and head movements of people in an everyday environment. Eye movements are recorded using an electrooculography (EOG) system; body movements are recorded using body-worn inertial measurement units. We compare two approaches for continuous recognition of reading: string matching (STR), which explicitly models the characteristic horizontal saccades during reading, and a support vector machine (SVM), which relies on 90 features extracted from the eye movement data. We evaluate both methods in a study performed with eight participants reading while sitting at a desk, standing, walking indoors and outdoors, and riding a tram. We introduce a method to segment reading activity by exploiting the sensorimotor coordination of eye and head movements during reading. Using person-independent training, we obtain an average precision for recognizing reading of 88.9% (recall 72.3%) using STR and of 87.7% (recall 87.9%) using SVM over all participants. We show that the proposed segmentation scheme improves the performance of recognizing reading events by more than 24%. Our work demonstrates that the joint analysis of eye and body movements is beneficial for reading recognition and opens up discussion on the wider applicability of a multimodal recognition approach to other visual and physical activities.
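
To make the string-matching idea concrete, the following is a minimal Python sketch of how horizontal saccades might be encoded as characters and compared against a reading template. It is an illustration under stated assumptions, not the method evaluated in this work: the four-letter alphabet, the `large` amplitude threshold, the template `"rrrrL"` (several small rightward saccades followed by a large leftward return sweep), and the `max_dist` tolerance are all hypothetical choices that the abstract does not specify.

```python
from typing import List


def encode_saccades(amplitudes: List[float], large: float = 5.0) -> str:
    """Map signed horizontal saccade amplitudes to characters.

    Assumed (hypothetical) alphabet: 'r'/'R' = small/large rightward,
    'l'/'L' = small/large leftward. Zero amplitudes are dropped.
    """
    chars = []
    for a in amplitudes:
        if a >= large:
            chars.append("R")
        elif a > 0:
            chars.append("r")
        elif a <= -large:
            chars.append("L")
        elif a < 0:
            chars.append("l")
    return "".join(chars)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def detect_reading(encoded: str, template: str = "rrrrL",
                   max_dist: int = 1) -> List[bool]:
    """Flag each sliding window whose distance to the template is small.

    The template and tolerance are illustrative assumptions.
    """
    n = len(template)
    return [edit_distance(encoded[i:i + n], template) <= max_dist
            for i in range(max(0, len(encoded) - n + 1))]


# Example: two lines of text read left to right, each ending in a return sweep.
flags = detect_reading(encode_saccades([1, 1, 1, 1, -8, 1, 1, 1, 1, -8]))
```

In this sketch, windows aligned with a full line of text match the template exactly, while windows that straddle two lines fall outside the tolerance. A practical system would tune the thresholds and template per recording setup.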
