Unsupervised Deep Feature Extraction for Remote Sensing Image Classification

TLDR

Applying supervised convolutional networks directly to multi- and hyper-spectral imagery is difficult because of high input dimensionality and scarce labeled data. The paper introduces single-layer and deep convolutional networks for remote sensing analysis, trained with greedy layer-wise unsupervised pre-training and an efficient sparse-feature learning algorithm that enforces both population and lifetime sparsity. The approach outperforms PCA, kPCA, and state-of-the-art aerial classification algorithms in aerial scene, VHR land-use, and multi-/hyper-spectral land-cover classification. Deep architectures capture higher levels of abstraction and perform better than single-layer networks.

Abstract

This paper introduces the use of single-layer and deep convolutional networks for remote sensing data analysis. Direct application of supervised (shallow or deep) convolutional networks to multi- and hyper-spectral imagery is very challenging given the high input data dimensionality and the relatively small amount of available labeled data. Therefore, we propose the use of greedy layer-wise unsupervised pre-training coupled with a highly efficient algorithm for unsupervised learning of sparse features. The algorithm is rooted in sparse representations and simultaneously enforces both population and lifetime sparsity of the extracted features. We successfully illustrate the expressive power of the extracted representations in several scenarios: classification of aerial scenes, land-use classification in very high resolution (VHR) images, and land-cover classification from multi- and hyper-spectral images. The proposed algorithm clearly outperforms standard Principal Component Analysis (PCA) and its kernel counterpart (kPCA), as well as current state-of-the-art algorithms for aerial classification, while being extremely computationally efficient at learning representations of data. Results show that single-layer convolutional networks can extract powerful discriminative features only when the receptive field accounts for neighboring pixels, and that they are preferred when the classification requires high resolution and detailed results. However, deep architectures significantly outperform single-layer variants, capturing increasing levels of abstraction and complexity throughout the feature hierarchy.
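The two sparsity notions named in the abstract can be made concrete with a small sketch. It is not the paper's learning algorithm. It only measures, for a given activation matrix, how population-sparse (few features active per sample) and lifetime-sparse (each feature active for few samples) a code is. The function name `sparsity_profile`, the `threshold` parameter, and the toy matrices are assumptions introduced here for illustration.

```python
import numpy as np


def sparsity_profile(activations, threshold=0.0):
    """Return per-sample and per-feature activity fractions.

    activations: array of shape (n_samples, n_features).
    A unit counts as active when its absolute value exceeds `threshold`
    (an illustrative choice, not taken from the paper).
    """
    active = np.abs(activations) > threshold
    # Population sparsity: fraction of features active for each sample.
    population = active.mean(axis=1)
    # Lifetime sparsity: fraction of samples on which each feature is active.
    lifetime = active.mean(axis=0)
    return population, lifetime


# Toy code where every sample uses one feature and every feature is used once:
# sparse in both senses.
codes = np.eye(4)
population, lifetime = sparsity_profile(codes)

# Toy code where every sample uses only feature 0: population-sparse,
# but feature 0 is always on and the other features are never used.
codes_bad = np.zeros((4, 4))
codes_bad[:, 0] = 1.0
population_bad, lifetime_bad = sparsity_profile(codes_bad)
```

The second matrix shows why both constraints matter together: each sample still activates a single feature, yet one feature fires for every sample and the remaining ones are dead, so the code carries no discriminative information.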
