Publication | Closed Access
Crossover Learning for Fast Online Video Instance Segmentation
Citations: 83
References: 45
Year: 2021
Multiple Instance Learning, Engineering, Machine Learning, Video Summarization, Video Retrieval, Video Interpretation, Natural Language Processing, Image Analysis, Data Science, Pattern Recognition, Video Content Analysis, Video Transformer, Machine Vision, Computer Science, Video Understanding, Deep Learning, Computer Vision, Temporal Information Modeling, Temporal Visual Context, Video Instance Segmentation, Crossover Learning
Modeling temporal visual context across frames is critical for video instance segmentation (VIS) and other video understanding tasks. In this paper, we propose a fast online VIS model termed CrossVIS. For temporal information modeling in VIS, we present a novel crossover learning scheme that uses the instance feature in the current frame to localize the same instance pixel-wise in other frames. Unlike previous schemes, crossover learning does not require any additional network parameters for feature enhancement. Integrated with the instance segmentation loss, crossover learning enables efficient cross-frame instance-to-pixel relation learning and brings an improvement at no additional inference cost. In addition, we propose a global balanced instance embedding branch for better and more stable online instance association. We conduct extensive experiments on three challenging VIS benchmarks, i.e., YouTube-VIS-2019, OVIS, and YouTube-VIS-2021, to evaluate our method. CrossVIS achieves state-of-the-art online VIS performance and shows a decent trade-off between latency and accuracy. Code is available at https://github.com/hustvl/CrossVIS.
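The crossover idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the CrossVIS implementation: it assumes the instance feature is a single vector applied as a 1x1 filter over a per-pixel feature map, and it assumes a Dice loss. The names `localize`, `dice_loss`, `crossover_loss`, and `toy_frame` are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def localize(instance_feat, frame_feats):
    """Score every pixel of frame_feats (C, H, W) against instance_feat (C,).

    Assumption: the instance feature acts as a 1x1 filter, so the mask
    logit at each pixel is a dot product over the channel axis.
    """
    logits = np.einsum("c,chw->hw", instance_feat, frame_feats)
    return sigmoid(logits)

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss between a predicted mask and a binary target mask."""
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def crossover_loss(feat_t, feat_d, inst_t, inst_d, mask_t, mask_d):
    """Same-frame terms plus crossover terms for one instance in two frames."""
    # Standard terms: each frame's instance feature segments its own frame.
    same = (dice_loss(localize(inst_t, feat_t), mask_t)
            + dice_loss(localize(inst_d, feat_d), mask_d))
    # Crossover terms: each frame's instance feature must also localize
    # the same instance in the other frame.
    cross = (dice_loss(localize(inst_t, feat_d), mask_d)
             + dice_loss(localize(inst_d, feat_t), mask_t))
    return same + cross

def toy_frame(top, left, size=4):
    """Toy frame: channel 0 is +5 inside a 2x2 instance and -5 elsewhere."""
    feats = np.zeros((2, size, size))
    feats[0] = -5.0
    feats[0, top:top + 2, left:left + 2] = 5.0
    mask = np.zeros((size, size))
    mask[top:top + 2, left:left + 2] = 1.0
    return feats, mask

# The instance moves between the two frames.
feat_t, mask_t = toy_frame(0, 0)
feat_d, mask_d = toy_frame(2, 1)
inst = np.array([1.0, 0.0])  # hypothetical instance feature
loss = crossover_loss(feat_t, feat_d, inst, inst, mask_t, mask_d)
```

In this sketch the crossover terms appear only in the training loss and reuse the existing features, which is consistent with the abstract's claim that the scheme needs no additional network parameters and adds no inference cost.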