DADNet - Concepedia

Abstract

Most existing CNN-based methods for crowd counting always suffer from large scale variation in objects of interest, leading to density maps of low quality. In this paper, we propose a novel deep model called Dilated-Attention-Deformable ConvNet (DADNet), which consists of two schemes: multi-scale dilated attention and deformable convolutional DME (Density Map Estimation). The proposed model explores a scale-aware attention fusion with various dilation rates to capture different visual granularities of crowd regions of interest, and utilizes deformable convolutions to generate a high-quality density map. There are two merits as follows: (1) varying dilation rates can effectively identify discriminative regions by enlarging the receptive fields of convolutional kernels upon surrounding region cues, and (2) deformable CNN operations promote the accuracy of object localization in the density map by augmenting the spatial object location sampling with adaptive offsets and scalars. DADNet not only excels at capturing rich spatial context of salient and tiny regions of interest simultaneously, but also keeps a robustness to background noises, such as partially occluded objects. Extensive experiments on benchmark datasets verify that DADNet achieves the state-of-the-art performance. Visualization results of the multi-scale attention maps further validate the remarkable interpretability achieved by our solution.

References

Page 1

	Year	Citations

Page 1