Visual Tracking with Fully Convolutional Networks

TLDR

The paper proposes a general object tracking approach built on fully convolutional neural networks. It analyzes features from a CNN pre-trained on ImageNet, uses a top layer and a lower layer jointly through a switch mechanism, and selects relevant feature maps to reduce redundant computation and improve tracking accuracy. The study finds that different CNN layers capture distinct aspects of the target and that only a subset of neurons is relevant to a given target; the resulting tracker significantly outperforms state-of-the-art trackers on a widely used tracking benchmark.

Abstract

We propose a new approach for general object tracking with a fully convolutional neural network. Instead of treating a convolutional neural network (CNN) as a black-box feature extractor, we conduct an in-depth study of the properties of CNN features pre-trained offline on massive image data for the ImageNet classification task. The discoveries motivate the design of our tracking system. We find that convolutional layers at different levels characterize the target from different perspectives. A top layer encodes more semantic features and serves as a category detector, while a lower layer carries more discriminative information and can better separate the target from distracters with similar appearance. Both layers are used jointly with a switch mechanism during tracking. We also find that, for a given tracking target, only a subset of neurons is relevant. A feature map selection method is developed to remove noisy and irrelevant feature maps, which reduces computational redundancy and improves tracking accuracy. Extensive evaluation on the widely used tracking benchmark [36] shows that the proposed tracker significantly outperforms the state of the art.
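
As a rough illustration of the feature map selection idea, the sketch below ranks feature maps by how strongly they respond inside the target box compared with the background and keeps the top k. The scoring rule (mean activation inside the box minus mean activation outside) and the names `relevance`, `select_feature_maps`, `box`, and `k` are assumptions made for this example; the abstract does not specify the selection criterion, so this should not be read as the paper's method.

```python
from typing import List, Sequence, Tuple

FeatureMap = List[List[float]]   # one 2-D activation map, stored as rows of floats
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def relevance(fmap: FeatureMap, box: Box) -> float:
    """Hypothetical score: mean activation inside the box minus mean outside."""
    top, left, bottom, right = box
    inside, outside = [], []
    for r, row in enumerate(fmap):
        for c, value in enumerate(row):
            if top <= r < bottom and left <= c < right:
                inside.append(value)
            else:
                outside.append(value)
    mean_in = sum(inside) / len(inside) if inside else 0.0
    mean_out = sum(outside) / len(outside) if outside else 0.0
    return mean_in - mean_out


def select_feature_maps(fmaps: Sequence[FeatureMap], box: Box, k: int) -> List[int]:
    """Return the indices of the k highest-scoring maps, in ascending order."""
    scores = [relevance(f, box) for f in fmaps]
    ranked = sorted(range(len(fmaps)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Toy example: three 3x3 maps and a 1x1 target box at the centre.
target_map = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]   # fires on target
flat_map = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]     # uninformative
background_map = [[1.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]]  # fires on background
kept = select_feature_maps([target_map, flat_map, background_map], (1, 1, 2, 2), k=2)
```

In this toy case the map that fires on the background scores lowest and is dropped. A real tracker would operate on the CNN's convolutional outputs and would likely use a learned criterion instead of this hand-written score.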
