Publication | Closed Access
Relational Attention Network for Crowd Counting
Citations: 201 | References: 38 | Year: 2019 | Venue: Unknown
Keywords: Artificial Intelligence, Crowd Simulation, Scene Analysis, Engineering, Machine Learning, Relational Attention Network, Image Analysis, Data Science, Pattern Recognition, Video Transformer, Density Estimation, Machine Vision, Crowd Behavior, Object Detection, Crowd Counting, Vision Language Model, Computer Science, Deep Learning, Computer Vision, Crowd Computing, Scene Understanding
Crowd counting is receiving rapidly growing research interest due to its potential application value in numerous real-world scenarios. However, owing to challenges such as occlusion, insufficient resolution, and dynamic backgrounds, crowd counting remains an unsolved problem in computer vision. Density estimation is a popular strategy for crowd counting, but conventional density estimation methods perform pixel-wise regression without explicitly accounting for the interdependence of pixels. As a result, independent pixel-wise predictions can be noisy and inconsistent. To address this issue, we propose a Relational Attention Network (RANet) with a self-attention mechanism for capturing the interdependence of pixels. RANet enhances the self-attention mechanism by accounting for both short-range and long-range interdependence of pixels; we denote these implementations as local self-attention (LSA) and global self-attention (GSA), respectively. We further introduce a relation module that fuses LSA and GSA to obtain more informative aggregated feature representations. We conduct extensive experiments on four public datasets: ShanghaiTech A, ShanghaiTech B, UCF-CC-50, and UCF-QNRF. Experimental results on all datasets show that RANet consistently reduces estimation errors and surpasses state-of-the-art approaches by large margins.
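The abstract describes the idea only at a high level. As a rough illustration of the two attention scopes it names, the following NumPy sketch contrasts global self-attention (every pixel attends to all spatial positions) with local self-attention (each pixel attends only to a small neighbourhood), and fuses the two with a simple average; the averaging step is a toy stand-in for the paper's learned relation module, and the window size `k` is an assumption, not a detail from the source.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_self_attention(feat):
    """Self-attention over all spatial positions (long-range dependence).
    feat: (C, H, W) feature map -> (C, H, W) attended feature map."""
    C, H, W = feat.shape
    x = feat.reshape(C, H * W)          # flatten spatial dims: (C, N)
    attn = softmax(x.T @ x, axis=-1)    # (N, N) pairwise affinities
    out = x @ attn.T                    # each position aggregates all positions
    return out.reshape(C, H, W)

def local_self_attention(feat, k=3):
    """Self-attention restricted to a k x k neighbourhood of each pixel
    (short-range dependence). Edge padding keeps border windows valid."""
    C, H, W = feat.shape
    pad = k // 2
    padded = np.pad(feat, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    out = np.zeros_like(feat)
    for i in range(H):
        for j in range(W):
            q = feat[:, i, j]                                  # query pixel (C,)
            keys = padded[:, i:i + k, j:j + k].reshape(C, -1)  # (C, k*k) neighbours
            w = softmax(q @ keys)                              # affinity to neighbours
            out[:, i, j] = keys @ w                            # weighted aggregation
    return out

def fuse(lsa, gsa):
    """Toy fusion of the two attended maps (stand-in for the relation module)."""
    return 0.5 * (lsa + gsa)
```

On a constant feature map both attention variants reduce to uniform averaging and return the input unchanged, which is a quick sanity check that the aggregation weights are properly normalized.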