Publication | Closed Access
GA-Net: Guided Aggregation Net for End-To-End Stereo Matching
Citations: 786
References: 28
Year: 2019
Unknown Venue
Convolutional Neural Network, Engineering, Machine Learning, Stereo Imaging, Cost Aggregation, Depth Map, Whole-image Cost Dependencies, Image Analysis, Stereo Vision, Pattern Recognition, Computational Geometry, Aggregation Layer, Geometric Modeling, Machine Vision, Computer Science, Deep Learning, Computer Vision, 3D Vision, Natural Sciences, Computer Stereo Vision, Aggregation Net, Scene Understanding, Stereoscopic Processing
Matching cost aggregation is essential for accurate disparity estimation in stereo matching, both in traditional methods and deep neural networks. The authors propose two novel neural network layers designed to capture local and whole‑image cost dependencies. The first layer is a differentiable semi‑global aggregation approximating semi‑global matching, while the second is a local guided aggregation that refines thin structures; together they replace costly 3D convolutions, reducing computational and memory demands. Experiments show that the two‑layer guided aggregation block outperforms GC‑Net, and the full GA‑Net achieves superior accuracy on Scene Flow and KITTI benchmarks.
In the stereo matching task, matching cost aggregation is crucial for accurately estimating disparities, in both traditional methods and deep neural network models. We propose two novel neural net layers, aimed at capturing local and whole-image cost dependencies, respectively. The first is a semi-global aggregation layer, which is a differentiable approximation of semi-global matching; the second is a local guided aggregation layer, which follows a traditional cost filtering strategy to refine thin structures. These two layers can be used to replace the widely used 3D convolutional layer, which is computationally costly and memory-consuming because it has cubic computational and memory complexity. In the experiments, we show that nets with a two-layer guided aggregation block easily outperform the state-of-the-art GC-Net, which has nineteen 3D convolutional layers. We also train a deep guided aggregation network (GA-Net), which achieves better accuracy than state-of-the-art methods on both the Scene Flow dataset and the KITTI benchmarks.
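For orientation, the classical semi-global matching recurrence that the first layer is described as approximating can be sketched for a single scanline and a single direction. This is the traditional hand-crafted algorithm with fixed penalties, not the authors' learned layer; the function name `aggregate_left_to_right`, the penalty values `p1` and `p2`, and the toy cost table are all illustrative assumptions.

```python
def aggregate_left_to_right(cost, p1, p2):
    """Classical SGM aggregation along one scanline, left to right.

    cost[x][d] is the matching cost of pixel x at disparity d.
    p1 penalizes a disparity change of one, p2 any larger change.
    """
    num_disp = len(cost[0])
    aggregated = [list(cost[0])]  # the first pixel has no predecessor
    for x in range(1, len(cost)):
        prev = aggregated[-1]
        prev_min = min(prev)
        row = []
        for d in range(num_disp):
            # Options: same disparity, neighbouring disparity, or a jump.
            candidates = [prev[d], prev_min + p2]
            if d > 0:
                candidates.append(prev[d - 1] + p1)
            if d + 1 < num_disp:
                candidates.append(prev[d + 1] + p1)
            # Subtracting prev_min keeps values bounded along the path.
            row.append(cost[x][d] + min(candidates) - prev_min)
        aggregated.append(row)
    return aggregated


# Toy cost table (hypothetical): 3 pixels, 3 disparity candidates each.
cost = [
    [0, 5, 5],
    [4, 0, 5],
    [5, 4, 0],
]
aggregated = aggregate_left_to_right(cost, p1=1, p2=3)
# Winner-takes-all disparity per pixel after aggregation.
disparities = [row.index(min(row)) for row in aggregated]
```

The hard minimum and the fixed penalties in this sketch are the parts a differentiable, learned layer would have to replace; the abstract does not spell out how GA-Net does so.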