Crossmodal Transformer Based Generative Framework for Pedestrian Trajectory Prediction

Abstract

Pedestrian trajectory prediction is an important task for autonomous driving because it provides guidance for collision avoidance. In this paper, to produce plausible trajectory predictions in the first-person view setting, we propose a crossmodal transformer-based generative framework that leverages sequences of cues from multiple modalities as well as pedestrian attributes. In the encoder, crossmodal transformers are applied over the observed past sequence to explore the cross-relation features of four modality-modality pairs, which are then fused by a branch assigning operation and a modality attention module. In the decoder, we employ a Bézier curve interpolation-based method to project the encoder features into trajectory predictions. Our training process not only takes into account the pedestrian's intention to cross the road but also optimizes the model for more accurate predictions at the terminal time steps. Experimental results demonstrate that our framework outperforms state-of-the-art methods on both the JAAD and PIE datasets. In particular, compared with the best baseline, our method reduces the box center final displacement error by 15.1%/14.3% on JAAD and 14.3%/22.2% on PIE for deterministic/multimodal prediction, respectively.
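The abstract states only that the decoder uses Bézier curve interpolation; it does not give the curve degree, the output coordinates, or how control points are produced. As a minimal sketch, assuming the decoder outputs n + 1 control points per pedestrian (for example, bounding-box coordinates) and the future trajectory is read off the curve at evenly spaced parameter values, the evaluation step could look like this. The function name `bezier_curve` and the sampling scheme are hypothetical and are not taken from the paper.

```python
import numpy as np
from math import comb


def bezier_curve(control_points: np.ndarray, num_steps: int) -> np.ndarray:
    """Evaluate a Bezier curve at num_steps evenly spaced parameters t in (0, 1].

    control_points: shape (n + 1, d), the control points of a degree-n curve
    in d dimensions (e.g. d = 2 for a box center, d = 4 for a full box).
    Returns an array of shape (num_steps, d), one row per future time step.
    """
    n = control_points.shape[0] - 1
    # Hypothetical sampling choice: future step k maps to t = k / T, k = 1..T,
    # so the last sampled point coincides with the last control point.
    t = np.arange(1, num_steps + 1) / num_steps  # shape (T,)
    # Bernstein basis B_{i,n}(t) = C(n, i) * t^i * (1 - t)^(n - i), shape (T, n + 1)
    basis = np.stack(
        [comb(n, i) * t**i * (1.0 - t) ** (n - i) for i in range(n + 1)], axis=1
    )
    # Each output point is a weighted combination of the control points.
    return basis @ control_points


# Hypothetical usage: 3 control points for a 2-D box center, 4 future steps.
ctrl = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
print(bezier_curve(ctrl, 4))
```

Sampling t in (0, 1] rather than [0, 1] is one possible design: if the first control point were fixed to the last observed position, the curve would start there and every sampled point would be a genuine future step. Because the Bernstein weights are non-negative and sum to one on [0, 1], each predicted point is a convex combination of the control points, which keeps the decoded trajectory smooth with only a few parameters to regress.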
