Spiking Transformers for Event-based Single Object Tracking

TLDR

Event‑based cameras offer high temporal resolution and dynamic range, capturing asynchronous events rich in temporal and spatial information, yet extracting this information remains an open challenge. This work proposes STNet, a spiking transformer network for single‑object tracking. STNet dynamically extracts and fuses temporal and spatial features using a transformer for global spatial context, an SNN for temporal cues with threshold adaptation based on spatial statistics, and a novel cross‑domain attention fusion algorithm. Experiments on FE240hz, EED, and VisEvent show that STNet surpasses state‑of‑the‑art methods in both tracking accuracy and speed by a significant margin. Code and pretrained models are available at https://github.com/Jee-King/CVPR2022_STNet.

Abstract

Event-based cameras bring a unique capability to tracking, being able to function in challenging real-world conditions as a direct result of their high temporal resolution and high dynamic range. These imagers asynchronously capture events that encode rich temporal and spatial information. However, effectively extracting this information from events remains an open challenge. In this work, we propose a spiking transformer network, STNet, for single object tracking. STNet dynamically extracts and fuses information from both temporal and spatial domains. In particular, the proposed architecture features a transformer module to provide global spatial information and a spiking neural network (SNN) module for extracting temporal cues. The spiking threshold of the SNN module is dynamically adjusted based on the statistical cues of the spatial information, which we find essential in providing robust SNN features. We fuse both feature branches dynamically with a novel cross-domain attention fusion algorithm. Extensive experiments on three event-based datasets, FE240hz, EED, and VisEvent, validate that the proposed STNet outperforms existing state-of-the-art methods in both tracking accuracy and speed by a significant margin. The code and pretrained models are available at https://github.com/Jee-King/CVPR2022_STNet.
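
To make the idea of a dynamically adjusted spiking threshold more concrete, the following is a minimal sketch of a leaky integrate-and-fire (LIF) neuron whose threshold is derived from statistics of a spatial feature vector. It is not the STNet implementation: the function names, the mean-plus-standard-deviation rule, and the `base`, `scale`, and `decay` parameters are all assumptions chosen for illustration only.

```python
from statistics import mean, pstdev


def adaptive_threshold(spatial_features, base=1.0, scale=0.5):
    """Hypothetical rule (an assumption, not taken from the paper):
    raise the firing threshold as the spatial features get stronger
    or more variable."""
    return base + scale * (mean(spatial_features) + pstdev(spatial_features))


def lif_spikes(inputs, threshold, decay=0.9):
    """Leaky integrate-and-fire over a sequence of input currents.

    The membrane potential leaks by `decay` each step, accumulates the
    input, and is reset to zero whenever it reaches `threshold`.
    Returns a list of 0/1 spikes, one per input step.
    """
    v = 0.0
    spikes = []
    for x in inputs:
        v = decay * v + x
        if v >= threshold:
            spikes.append(1)
            v = 0.0  # hard reset after a spike
        else:
            spikes.append(0)
    return spikes


# Same temporal input, two different spatial contexts (toy values).
events = [0.6, 0.6, 0.6, 0.6]
weak_context = [0.0, 0.0]    # yields the base threshold
strong_context = [2.0, 2.0]  # yields a higher threshold

spikes_weak = lif_spikes(events, adaptive_threshold(weak_context))
spikes_strong = lif_spikes(events, adaptive_threshold(strong_context))
```

In this toy setting, the same event stream produces fewer spikes under the stronger spatial context, which illustrates how spatial statistics could modulate the temporal branch. How STNet actually computes the threshold and fuses the two branches is described in the paper and the linked repository.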
