StfNet: A Two-Stream Convolutional Neural Network for Spatiotemporal Image Fusion

TLDR

Spatiotemporal image fusion aims to deliver high‑resolution Earth observations with frequent coverage, yet learning‑based methods that treat it as a single‑image super‑resolution task often lose spatial detail due to large upscaling factors. This work proposes StfNet, a two‑stream convolutional neural network that leverages temporal information from fine image sequences to perform spatiotemporal fusion. StfNet first incorporates a neighboring fine image to super‑resolve the coarse image at the prediction date, then applies a temporal constraint across the time‑series to enforce uniqueness and temporal consistency in the predictions. Experiments on Landsat‑MODIS datasets show that StfNet achieves state‑of‑the‑art visual and quantitative performance.

Abstract

Spatiotemporal image fusion is considered a promising way to provide Earth observations with both high spatial resolution and frequent coverage, and learning-based solutions have recently received broad attention. However, these algorithms, which treat spatiotemporal fusion as a single-image super-resolution problem, generally suffer from significant spatial information loss in the coarse images because of the large upscaling factors in real applications. To address this issue, we exploit temporal information in fine image sequences and solve the spatiotemporal fusion problem with a two-stream convolutional neural network called StfNet. The novelty of this paper is twofold. First, considering the temporal dependence among image sequences, we incorporate the fine image acquired at the neighboring date to super-resolve the coarse image at the prediction date. In this way, our network predicts a fine image not only from the structural similarity between coarse and fine image pairs but also by exploiting the abundant texture information in the available neighboring fine images. Second, instead of estimating each output fine image independently, we consider the temporal relations among time-series images and formulate a temporal constraint. This temporal constraint aims to guarantee the uniqueness of the fusion result and encourages temporally consistent predictions in learning, thus leading to more realistic final results. We evaluate the performance of StfNet using two actual data sets of Landsat-Moderate Resolution Imaging Spectroradiometer (MODIS) acquisitions, and both visual and quantitative evaluations demonstrate that our algorithm achieves state-of-the-art performance.
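
The abstract does not give the form of the temporal constraint, so the following is only an illustrative sketch of one plausible reading. It assumes that each of the two streams predicts a fine-resolution change image (date 1 to date 2, and date 2 to date 3), and that the two predicted changes should add up to the known change between the two observed fine images. It also assumes a training triplet in which all three fine images are available. All function and variable names (`fusion_loss`, `pred_12`, `pred_23`, `weight`) are hypothetical and do not come from the paper, and tiny nested lists stand in for image tensors.

```python
# Toy sketch of a temporal-consistency loss; NOT the authors' formulation.
# Assumption: pred_12 and pred_23 are change images predicted by two streams,
# and their sum should match the known fine-image change from date 1 to date 3.

def sub(a, b):
    """Element-wise a - b for two equally sized 2-D lists."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def add(a, b):
    """Element-wise a + b for two equally sized 2-D lists."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mse(a, b):
    """Mean squared error between two equally sized 2-D lists."""
    n = sum(len(row) for row in a)
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def fusion_loss(pred_12, pred_23, fine_1, fine_2, fine_3, weight=1.0):
    """Per-stream reconstruction error plus a weighted temporal penalty."""
    true_12 = sub(fine_2, fine_1)  # change from date 1 to date 2
    true_23 = sub(fine_3, fine_2)  # change from date 2 to date 3
    true_13 = sub(fine_3, fine_1)  # change between the two observed fine images
    data_term = mse(pred_12, true_12) + mse(pred_23, true_23)
    # Temporal term: the two predicted changes should sum to the known change.
    temporal_term = mse(add(pred_12, pred_23), true_13)
    return data_term + weight * temporal_term


fine_1 = [[1, 2], [3, 4]]
fine_2 = [[2, 2], [3, 5]]
fine_3 = [[4, 3], [3, 5]]

# Predictions that match the true changes exactly incur no loss.
perfect = fusion_loss(sub(fine_2, fine_1), sub(fine_3, fine_2), fine_1, fine_2, fine_3)

# A stream that is off by one everywhere is penalised by both terms.
biased_12 = [[2, 1], [1, 2]]
biased = fusion_loss(biased_12, sub(fine_3, fine_2), fine_1, fine_2, fine_3)
```

In this toy setting the temporal term couples the two streams: an error in either predicted change also shows up in their sum, which is one way a constraint of this kind could discourage inconsistent predictions.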
