Publication | Closed Access
Semantic-Aware Video Text Detection
Citations: 30
References: 50
Year: 2021
Unknown Venue
Natural Language Processing, Appearance Features, ICDAR 2013, Image Analysis, Information Retrieval, Machine Vision, Video Text, Pattern Recognition, Engineering, Text Recognition, Arts, Vision Language Model, Video Summarization, Video Content Analysis, Video Understanding, Deep Learning, Video Retrieval, Computer Vision
Most existing video text detection methods track texts with appearance features, which are easily influenced by the change of perspective and illumination. Compared with appearance features, semantic features are more robust cues for matching text instances. In this paper, we propose an end-to-end trainable video text detector that tracks texts based on semantic features. First, we introduce a new character center segmentation branch to extract semantic features, which encode the category and position of characters. Then we propose a novel appearance-semantic-geometry descriptor to track text instances, in which semantic features can improve the robustness against appearance changes. To overcome the lack of character-level annotations, we propose a novel weakly-supervised character center detection module, which only uses word-level annotated real images to generate character-level labels. The proposed method achieves state-of-the-art performance on three video text benchmarks ICDAR 2013 Video, Minetto and RT-1K, and two Chinese scene text benchmarks CASIA10K and MSRA-TD500.
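The idea of matching text instances across frames with a combined appearance, semantic and geometry descriptor can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the dictionary layout of a text instance, the use of cosine similarity and box IoU as the three terms, the weights `w_app`, `w_sem`, `w_geo`, the `threshold`, and the greedy matching strategy. The abstract does not state how the descriptor is computed or compared.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def similarity(p, q, w_app=0.2, w_sem=0.5, w_geo=0.3):
    """Weighted sum of the three cues; the weights are hypothetical."""
    return (w_app * cosine(p["appearance"], q["appearance"])
            + w_sem * cosine(p["semantic"], q["semantic"])
            + w_geo * iou(p["box"], q["box"]))


def match(prev, curr, threshold=0.5):
    """Greedy one-to-one matching; returns {index in prev: index in curr}."""
    pairs = sorted(
        ((similarity(p, q), i, j)
         for i, p in enumerate(prev)
         for j, q in enumerate(curr)),
        reverse=True,
    )
    matches, used_prev, used_curr = {}, set(), set()
    for score, i, j in pairs:
        if score < threshold:
            break  # remaining pairs are even less similar
        if i in used_prev or j in used_curr:
            continue
        matches[i] = j
        used_prev.add(i)
        used_curr.add(j)
    return matches


# Two text instances whose appearance vectors become identical in the next
# frame (for example under an illumination change), while the semantic
# vectors and box positions still tell them apart.
prev = [
    {"appearance": [1.0, 0.0], "semantic": [1.0, 0.0, 0.0], "box": (0, 0, 10, 10)},
    {"appearance": [0.0, 1.0], "semantic": [0.0, 1.0, 0.0], "box": (20, 0, 30, 10)},
]
curr = [
    {"appearance": [0.5, 0.5], "semantic": [0.0, 1.0, 0.0], "box": (21, 0, 31, 10)},
    {"appearance": [0.5, 0.5], "semantic": [1.0, 0.0, 0.0], "box": (1, 0, 11, 10)},
]
print(match(prev, curr))
```

In this toy input the appearance term cannot distinguish the two candidates, so the pairing is decided by the semantic and geometry terms. Greedy matching is used only to keep the sketch dependency-free; an optimal assignment solver such as `scipy.optimize.linear_sum_assignment` could be substituted.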