When Deep Learning Meets Metric Learning: Remote Sensing Image Scene Classification via Learning Discriminative CNNs

TLDR

Remote sensing image scene classification is a challenging task that has seen significant performance gains from deep learning, yet within‑class diversity and between‑class similarity remain major obstacles. This study introduces a simple yet effective approach to learn discriminative CNNs (D‑CNNs) aimed at improving classification accuracy. The D‑CNNs are trained with a combined loss that adds a metric‑learning regularization term to the standard cross‑entropy, encouraging features of the same class to cluster together while pushing different classes apart, and the method is evaluated on three benchmark datasets using three off‑the‑shelf CNN backbones. Experimental results show that the proposed D‑CNNs outperform existing baselines and achieve state‑of‑the‑art performance on all three datasets.

Abstract

Remote sensing image scene classification is an active and challenging task driven by many applications. More recently, with the advances of deep learning models especially convolutional neural networks (CNNs), the performance of remote sensing image scene classification has been significantly improved due to the powerful feature representations learnt through CNNs. Although great success has been obtained so far, the problems of within-class diversity and between-class similarity are still two big challenges. To address these problems, in this paper, we propose a simple but effective method to learn discriminative CNNs (D-CNNs) to boost the performance of remote sensing image scene classification. Different from the traditional CNN models that minimize only the cross entropy loss, our proposed D-CNN models are trained by optimizing a new discriminative objective function. To this end, apart from minimizing the classification error, we also explicitly impose a metric learning regularization term on the CNN features. The metric learning regularization enforces the D-CNN models to be more discriminative so that, in the new D-CNN feature spaces, the images from the same scene class are mapped closely to each other and the images of different classes are mapped as farther apart as possible. In the experiments, we comprehensively evaluate the proposed method on three publicly available benchmark data sets using three off-the-shelf CNN models. Experimental results demonstrate that our proposed D-CNN methods outperform the existing baseline methods and achieve state-of-the-art results on all three data sets.

References

Page 1

	Year	Citations

Page 1