Publication | Closed Access
ERFNet: Efficient Residual Factorized ConvNet for Real-Time Semantic Segmentation
Citations: 1.5K
References: 35
Year: 2017
Convolutional Neural Network, Scene Analysis, Engineering, Machine Learning, Accurate Semantic Segmentation, Real-time Semantic Segmentation, Image Analysis, Data Science, Semantic Segmentation, Machine Vision, Object Detection, Computer Engineering, Computer Science, Deep Learning, Computer Vision, Intelligent Vehicles, Scene Interpretation, Scene Understanding, Scene Modeling
Semantic segmentation is essential for intelligent vehicle perception, yet existing deep neural networks struggle to balance high accuracy with the computational limits of real‑time deployment. This paper introduces a deep architecture that delivers accurate semantic segmentation while operating in real time. The core of the design is a novel residual‑factorized convolutional layer that fuses residual connections with factorized convolutions to preserve efficiency and precision. On Cityscapes, the model runs at over 83 FPS on a Titan X and 7 FPS on a Jetson TX1, matching state‑of‑the‑art accuracy while being orders of magnitude faster than competing high‑precision networks. Code is publicly available at https://github.com/Eromera/erfnet.
Semantic segmentation is a challenging task that addresses most of the perception needs of intelligent vehicles (IVs) in a unified way. Deep neural networks excel at this task, as they can be trained end-to-end to accurately classify multiple object categories in an image at the pixel level. However, state-of-the-art semantic segmentation approaches do not yet offer a good tradeoff between segmentation quality and computational cost, which limits their application in real vehicles. In this paper, we propose a deep architecture that is able to run in real time while providing accurate semantic segmentation. The core of our architecture is a novel layer that uses residual connections and factorized convolutions in order to remain efficient while retaining remarkable accuracy. Our approach is able to run at over 83 FPS on a single Titan X and at 7 FPS on a Jetson TX1 (embedded device). A comprehensive set of experiments on the publicly available Cityscapes data set demonstrates that our system achieves an accuracy that is similar to the state of the art, while being orders of magnitude faster to compute than other architectures that achieve top precision. The resulting tradeoff makes our model an ideal approach for scene understanding in IV applications. The code is publicly available at: https://github.com/Eromera/erfnet.
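As a rough illustration of the idea named in the abstract (a residual connection around factorized convolutions), the following pure-Python sketch applies a 3x1 convolution followed by a 1x3 convolution and adds the input back. It is not the ERFNet implementation. It assumes a single channel, zero padding, integer inputs, and no bias, nonlinearity, or normalization. All function names are hypothetical and chosen for this example only. It also counts weights to show why factorizing helps: a 3x3 convolution over C input and C output channels has 9·C² weights, while a 3x1 plus 1x3 pair has 6·C².

```python
# Illustrative sketch only; NOT the ERFNet reference implementation.
# Assumptions: single channel, zero padding, no bias, no nonlinearity.


def conv3x3_weights(channels: int) -> int:
    """Weights in one 3x3 conv mapping `channels` -> `channels` (bias ignored)."""
    return 3 * 3 * channels * channels


def factorized_weights(channels: int) -> int:
    """Weights in a 3x1 conv followed by a 1x3 conv (bias ignored)."""
    return (3 * 1 + 1 * 3) * channels * channels


def conv2d_same(image, kernel):
    """2D cross-correlation with zero padding; output has the input's size."""
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    h, w = len(image), len(image[0])

    def px(i, j):
        # Pixels outside the image are treated as zero.
        return image[i][j] if 0 <= i < h and 0 <= j < w else 0

    return [
        [
            sum(
                px(i + a - ph, j + b - pw) * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            for j in range(w)
        ]
        for i in range(h)
    ]


def outer(col, row):
    """Rank-1 2D kernel formed as the outer product of two 1D kernels."""
    return [[c * r for r in row] for c in col]


def factorized_branch(image, col, row):
    """Apply a 3x1 (vertical) conv, then a 1x3 (horizontal) conv."""
    vertical = conv2d_same(image, [[c] for c in col])
    return conv2d_same(vertical, [row])


def residual_factorized(image, col, row):
    """Residual connection: output = input + factorized branch."""
    branch = factorized_branch(image, col, row)
    return [[x + b for x, b in zip(xr, br)] for xr, br in zip(image, branch)]
```

Under these assumptions, `factorized_branch(image, col, row)` gives the same result as `conv2d_same(image, outer(col, row))`, so the factorized pair reproduces any rank-1 3x3 kernel with fewer weights. For 64 channels, `conv3x3_weights(64)` is 36864 and `factorized_weights(64)` is 24576, a one-third reduction.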