Adaptive Short-Temporal Induced Aware Fusion Network for Predicting Attention Regions Like a Driver

Abstract

Driver attention prediction addresses the question ‘Where should the driver pay attention?’. Most previous methods predict region-level attention maps that include redundant regions. Furthermore, popular spatial-temporal feature extraction networks such as ConvLSTM and 3D-CNN struggle to run in real time. To overcome these difficulties, we propose an Adaptive Short-temporal Induced Aware Fusion Network (ASIAF-Net) for region-level and object-level driver attention prediction. In ASIAF-Net, we design an Attention Related Spatial Feature Encoder (AF-Encoder) and an Induced Aware Fusion Network (IAF-Net) as the main network; with an Association Analysis Cell (AAC), the AF-Encoder can effectively capture the relationships among different objects. Because most vital visual cues come from moving objects, we propose a Self-adaptive Short-temporal Feature Extraction Module (SSFE-Module) to obtain inter-frame motion features. In IAF-Net, a Multi-scale Driver Attention Region Prediction Branch predicts the regional attention, and an Object Saliency Estimation Branch fuses the perception results with the regional attention map to estimate object-level attention. Experiments on three datasets show that the proposed ASIAF-Net predicts driver attention on regions and objects more robustly and precisely than state-of-the-art methods, and that it runs in real time on our ADAS platform.
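
To make the object-level step concrete, the following is a minimal sketch of how a regional attention map and a set of detected boxes could be combined into per-object scores. It is not the Object Saliency Estimation Branch itself: the function name `object_saliency`, the box format, and the use of mean pooling inside each box are all assumptions chosen for illustration, since the abstract does not specify the fusion rule.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max), max edges exclusive.
Box = Tuple[int, int, int, int]


def object_saliency(
    attention: Sequence[Sequence[float]], boxes: Sequence[Box]
) -> List[float]:
    """Score each detected box by the mean attention value it covers.

    Assumption: mean pooling is an illustrative stand-in, not the
    fusion rule used by the paper's Object Saliency Estimation Branch.
    """
    height = len(attention)
    width = len(attention[0]) if height else 0
    scores: List[float] = []
    for x0, y0, x1, y1 in boxes:
        # Clip the box to the bounds of the attention map.
        x0, x1 = max(0, x0), min(width, x1)
        y0, y1 = max(0, y0), min(height, y1)
        area = max(0, x1 - x0) * max(0, y1 - y0)
        if area == 0:
            # A box entirely outside the map receives no attention.
            scores.append(0.0)
            continue
        total = sum(attention[y][x] for y in range(y0, y1) for x in range(x0, x1))
        scores.append(total / area)
    return scores


# Toy 4x4 regional attention map with a bright 2x2 centre.
attention_map = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
detections = [(1, 1, 3, 3), (0, 0, 2, 2), (3, 3, 4, 4)]
print(object_saliency(attention_map, detections))  # [1.0, 0.25, 0.0]
```

In this toy case the box aligned with the attended centre scores highest, the box that only partly overlaps it scores lower, and the box in an unattended corner scores zero, which is the kind of ranking an object-level prediction is meant to provide.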
