Publication | Closed Access

High Performance Visual Tracking with Siamese Region Proposal Network

Year: 2018 | Citations: 2.9K | References: 33

TLDR

Deep learning trackers reach state-of-the-art accuracy on many visual object tracking benchmarks, yet most cannot do so at real-time speed. The authors propose the Siamese-RPN, a network trained end-to-end off-line on large-scale image pairs. It pairs a Siamese subnetwork for feature extraction with a region proposal subnetwork that has a classification branch and a regression branch. At inference, tracking is cast as a local one-shot detection task: the template branch is pre-computed and the correlation layers are treated as plain convolution layers, which enables fast online tracking. Because the proposals are refined by regression, multi-scale testing and online fine-tuning are no longer needed. The tracker runs at 160 FPS while achieving leading performance in the VOT2015, VOT2016, and VOT2017 real-time challenges.

Abstract

Visual object tracking has been a fundamental topic in recent years, and many deep learning based trackers have achieved state-of-the-art performance on multiple benchmarks. However, most of these trackers can hardly reach top performance at real-time speed. In this paper, we propose the Siamese region proposal network (Siamese-RPN), which is trained end-to-end off-line with large-scale image pairs. Specifically, it consists of a Siamese subnetwork for feature extraction and a region proposal subnetwork that includes a classification branch and a regression branch. In the inference phase, the proposed framework is formulated as a local one-shot detection task. We can pre-compute the template branch of the Siamese subnetwork and formulate the correlation layers as trivial convolution layers to perform online tracking. Benefiting from the proposal refinement, the traditional multi-scale test and online fine-tuning can be discarded. The Siamese-RPN runs at 160 FPS while achieving leading performance in the VOT2015, VOT2016 and VOT2017 real-time challenges.
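
The inference idea in the abstract (pre-compute the template branch once, then treat correlation as an ordinary convolution over each new search region) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the anchor count `k`, the channel count `c`, and all tensor sizes are assumptions chosen for illustration, and random arrays stand in for real network features.

```python
import numpy as np


def xcorr(search_feat, kernels):
    """Valid cross-correlation of search_feat (C, H, W) with kernels (K, C, h, w).

    Returns a response map of shape (K, H - h + 1, W - w + 1). This is the same
    operation a convolution layer performs when `kernels` are its weights.
    """
    C, H, W = search_feat.shape
    K, Ck, h, w = kernels.shape
    assert C == Ck, "channel mismatch between search features and kernels"
    out = np.empty((K, H - h + 1, W - w + 1))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            window = search_feat[:, i:i + h, j:j + w]
            # Dot product of every kernel with the current window.
            out[:, i, j] = np.tensordot(kernels, window, axes=([1, 2, 3], [0, 1, 2]))
    return out


def precompute_kernels(template_cls, template_reg, num_anchors):
    """Reshape template-branch outputs into correlation kernels (done once).

    Assumed shapes (illustrative): template_cls is (2*k*C, h, w) and
    template_reg is (4*k*C, h, w), where k = num_anchors.
    """
    c = template_cls.shape[0] // (2 * num_anchors)
    h, w = template_cls.shape[1:]
    cls_kernels = template_cls.reshape(2 * num_anchors, c, h, w)
    reg_kernels = template_reg.reshape(4 * num_anchors, c, h, w)
    return cls_kernels, reg_kernels


def track_frame(search_feat, cls_kernels, reg_kernels):
    """Per-frame work: two correlations using the cached template kernels."""
    return xcorr(search_feat, cls_kernels), xcorr(search_feat, reg_kernels)


# Hypothetical sizes: k anchors, c feature channels, 4x4 template, 20x20 search.
rng = np.random.default_rng(0)
k, c = 5, 4
template_cls = rng.standard_normal((2 * k * c, 4, 4))  # stand-in for template features
template_reg = rng.standard_normal((4 * k * c, 4, 4))
cls_kernels, reg_kernels = precompute_kernels(template_cls, template_reg, k)

search_feat = rng.standard_normal((c, 20, 20))  # stand-in for one search region
cls_map, reg_map = track_frame(search_feat, cls_kernels, reg_kernels)
print(cls_map.shape, reg_map.shape)  # (10, 17, 17) (20, 17, 17)
```

In this sketch the template kernels are computed once and reused for every frame, so the per-frame cost is only the search-branch features plus two correlations. A real tracker would replace the Python loops with a framework convolution call and decode `cls_map` and `reg_map` against anchor boxes to obtain the final bounding box.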
