Publication | Closed Access
DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving
Citations: 1.7K
References: 21
Year: 2015
Venue: Unknown
Geometric Learning, Source Code, Machine Vision, Machine Learning, Engineering, Scene Interpretation, Behavior Reflex, Scene Understanding, Mediated Perception, Computer Science, Video Understanding, Robot Learning, Autonomous Driving, Deep Learning, Scene Modeling, Computer Vision, Direct Perception
Vision-based autonomous driving systems have traditionally followed one of two paradigms: mediated perception, which parses the entire scene before making a driving decision, and behavior reflex, which maps an input image directly to a driving action. This work introduces a third paradigm, direct perception, which estimates the affordance for driving from images. The authors train a deep convolutional neural network to map an image to a small set of affordance indicators, using recordings of 12 hours of human driving in a video game, and train a separate model for car distance estimation on the KITTI dataset. The results indicate that the approach generalizes to real driving images, and the authors argue that it offers the right level of abstraction between mediated perception and behavior reflex. Source code and data are available on the project website.
Today, there are two major paradigms for vision-based autonomous driving systems: mediated perception approaches that parse an entire scene to make a driving decision, and behavior reflex approaches that directly map an input image to a driving action by a regressor. In this paper, we propose a third paradigm: a direct perception approach to estimate the affordance for driving. We propose to map an input image to a small number of key perception indicators that directly relate to the affordance of a road/traffic state for driving. Our representation provides a set of compact yet complete descriptions of the scene to enable a simple controller to drive autonomously. Falling in between the two extremes of mediated perception and behavior reflex, we argue that our direct perception representation provides the right level of abstraction. To demonstrate this, we train a deep Convolutional Neural Network using recordings from 12 hours of human driving in a video game and show that our model can drive a car well in a very diverse set of virtual environments. We also train a model for car distance estimation on the KITTI dataset. Results show that our direct perception approach can generalize well to real driving images. Source code and data are available on our project website.
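To make the idea of "a simple controller" driven by a few perception indicators more concrete, the following is a minimal sketch, not the authors' implementation. The indicator names (`angle`, `lateral_offset`, `lane_width`, `lead_distance`), the sign conventions, the gain, and the speed rule are all assumptions chosen for illustration; the abstract does not specify them. In a direct perception pipeline, a network would predict values like these from each image, and a controller of roughly this shape would turn them into commands.

```python
from dataclasses import dataclass


@dataclass
class Affordances:
    """Hypothetical compact scene description; field names are illustrative."""
    angle: float           # heading relative to the road direction, radians
    lateral_offset: float  # signed distance from the lane centre, metres
    lane_width: float      # width of the current lane, metres
    lead_distance: float   # distance to the preceding car, metres


def steering_command(a: Affordances, gain: float = 1.0) -> float:
    """Combine heading error and normalised lateral offset, clamped to [-1, 1].

    The sign convention and gain are assumptions, not taken from the paper.
    """
    raw = gain * (a.angle - a.lateral_offset / a.lane_width)
    return max(-1.0, min(1.0, raw))


def target_speed(a: Affordances, cruise: float = 20.0, safe_gap: float = 30.0) -> float:
    """Scale the cruise speed linearly towards zero as the gap to the lead car closes."""
    ratio = max(0.0, min(1.0, a.lead_distance / safe_gap))
    return cruise * ratio


if __name__ == "__main__":
    state = Affordances(angle=0.0, lateral_offset=2.0, lane_width=4.0, lead_distance=15.0)
    print(steering_command(state), target_speed(state))
```

The point of the sketch is the interface: the controller never sees pixels or a full scene parse, only a handful of scalar indicators, which is what places direct perception between mediated perception and behavior reflex.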