Publication | Open Access
Machine Learning and End-to-End Deep Learning for Monitoring Driver Distractions From Physiological and Visual Signals
Citations: 80 | References: 67 | Year: 2020
Topics: Engineering · Machine Learning · Biometrics · Advanced Driver-assistance System · Intelligent Systems · Attention · Psychology · Social Sciences · Data Science · Driver Behavior · Affective Computing · Cognitive Science · Behavioral Sciences · End-to-end Deep Learning · Computer Science · Deep Learning · Driver Performance · Label Jitter · Facial Expression Recognition · Eye Tracking · Visual Signals · Extreme Gradient Boosting · Classical Machine Learning · Emotion Recognition
It is only a matter of time until autonomous vehicles become ubiquitous; however, human driving supervision will remain a necessity for decades. To assess the driver's ability to take control of the vehicle in critical scenarios, driver distractions can be monitored using wearable sensors or sensors embedded in the vehicle, such as video cameras. Which types of driving distraction can be sensed with which sensors is an open research question that this study attempts to answer. The study compared data from physiological sensors (palm electrodermal activity (pEDA), heart rate and breathing rate) and visual sensors (eye tracking, pupil diameter, nasal EDA (nEDA), emotional activation and facial action units (AUs)) for the detection of four types of distraction. The dataset was collected in a previous driving-simulation study. Statistical tests showed that the most informative feature/modality for detecting driver distraction depends on the type of distraction, with emotional activation and AUs being the most promising. An experimental comparison of seven classical machine learning (ML) and seven end-to-end deep learning (DL) methods, evaluated on a separate test set of 10 subjects, showed that when classifying windows as distracted or not distracted, the highest F1-score of 79% was achieved by the extreme gradient boosting (XGB) classifier using 60-second windows of AUs as input. When classifying complete driving sessions, XGB's F1-score was 94%. The best-performing DL model was a spectro-temporal ResNet, which achieved an F1-score of 75% when classifying segments and 87% when classifying complete driving sessions. Finally, the study identifies and discusses problems, such as label jitter, scenario overfitting and unsatisfactory generalization performance, that may adversely affect related ML approaches.
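The windowed-feature setup the abstract describes for the XGB classifier — slicing per-frame facial action-unit (AU) intensities into fixed 60-second windows and summarizing each window before classification — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, sampling rate, AU count, and the mean/std feature summary are all assumptions for the example.

```python
import numpy as np

def windowize(au_series, fps=30, window_s=60):
    """Split a (frames, n_aus) array of AU intensities into
    non-overlapping fixed-length windows and summarize each window
    with simple per-AU statistics (mean and standard deviation).
    Note: fps, window length, and the chosen statistics are
    illustrative assumptions, not values from the paper."""
    win = fps * window_s                       # frames per window
    n = (len(au_series) // win) * win          # drop the incomplete tail
    windows = au_series[:n].reshape(-1, win, au_series.shape[1])
    # One feature vector per window: [mean of each AU, std of each AU]
    return np.concatenate([windows.mean(axis=1), windows.std(axis=1)], axis=1)

# Toy example: 5 minutes of 17 AU intensities sampled at 30 fps
rng = np.random.default_rng(0)
aus = rng.random((30 * 60 * 5, 17))
X = windowize(aus)
print(X.shape)  # → (5, 34): five 60-second windows, mean+std per AU
```

Each row of `X` would then be fed, together with a distracted/not-distracted label, to a gradient-boosting classifier; the session-level prediction can be obtained by aggregating the per-window predictions over a full drive.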