ISNet: Shape Matters for Infrared Small Target Detection

TLDR

Infrared small target detection extracts tiny, dim targets from noisy, cluttered backgrounds, yet low signal‑to‑noise ratios and low contrast make precise shape identification difficult. This work introduces ISNet, a shape‑aware network employing a Taylor finite difference–inspired edge block and a two‑orientation attention aggregation block to better capture target shapes. The edge block aggregates multi‑level edge cues to enhance contrast, while the attention block fuses low‑level row/column attention with high‑level features to isolate shape characteristics and suppress noise. The authors also present IRSTD‑1k, a benchmark of 1,000 images with pixel‑level annotations. On public datasets and IRSTD‑1k, ISNet outperforms representative state‑of‑the‑art infrared small target detectors. The IRSTD‑1k dataset and ISNet code are released at github.com/RuiZhang97/ISNet.

Abstract

Infrared small target detection (IRSTD) refers to extracting small and dim targets from blurred backgrounds, and has a wide range of applications such as traffic management and marine rescue. Due to the low signal-to-noise ratio and low contrast, infrared targets are easily submerged in backgrounds of heavy noise and clutter. Detecting the precise shape information of infrared targets remains challenging. In this paper, we propose a novel infrared shape network (ISNet), in which a Taylor finite difference (TFD)-inspired edge block and a two-orientation attention aggregation (TOAA) block are devised to address this problem. Specifically, the TFD-inspired edge block aggregates and enhances comprehensive edge information from different levels, in order to improve the contrast between target and background and to lay a mathematically interpretable foundation for extracting shape information. The TOAA block computes attention over the low-level information in both the row and column directions and fuses it with the high-level information to capture the shape characteristics of targets and suppress noise. In addition, we construct a new benchmark, called IRSTD-1k, consisting of 1,000 realistic images with various target shapes, different target sizes, and rich clutter backgrounds, with accurate pixel-level annotations. Experiments on public datasets and IRSTD-1k demonstrate the superiority of our approach over representative state-of-the-art IRSTD methods. The dataset and code are available at github.com/RuiZhang97/ISNet.
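
As a rough illustration of the finite-difference idea behind the edge block, the sketch below computes a gradient-magnitude edge map with second-order central differences, which follow from subtracting the Taylor expansions of f(x+h) and f(x-h). It is not the TFD-inspired block itself, which is a learned network component. The function name `central_difference_edges`, the clamped border handling, and the toy 7x7 frame are all assumptions made for this example.

```python
import math


def central_difference_edges(image):
    """Gradient magnitude via central differences (illustrative only)."""
    rows, cols = len(image), len(image[0])

    def at(r, c):
        # Clamp indices so border pixels reuse their nearest neighbour.
        return image[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    edges = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # f'(x) ~ (f(x + h) - f(x - h)) / (2h), with h = 1 pixel.
            dx = (at(r, c + 1) - at(r, c - 1)) / 2.0
            dy = (at(r + 1, c) - at(r - 1, c)) / 2.0
            row.append(math.hypot(dx, dy))
        edges.append(row)
    return edges


# Hypothetical toy frame: flat background with a 2x2 bright target.
frame = [[0.0] * 7 for _ in range(7)]
for r in (3, 4):
    for c in (3, 4):
        frame[r][c] = 1.0

edges = central_difference_edges(frame)
```

On this toy frame the response is zero over the flat background and nonzero only around the target boundary, which is the kind of target/background contrast cue the abstract describes the edge block as enhancing.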
