Publication | Closed Access
Faster Video Moment Retrieval with Point-Level Supervision
Citations: 19
References: 15
Year: 2023
Unknown Venue
Engineering, Machine Learning, Video Summarization, Video Retrieval, Video Moment Retrieval, Video Interpretation, Natural Language Processing, Multimodal LLM, Image Analysis, Information Retrieval, Data Science, Faster Moment Retrieval, Pattern Recognition, Machine Vision, Annotation Cost, Computer Science, Video Understanding, Deep Learning, Computer Vision, Point-level Supervision, Arts
Video Moment Retrieval (VMR) aims to retrieve the most relevant events from an untrimmed video given natural language queries. Existing VMR methods suffer from two defects: (1) they require large amounts of expensive temporal annotation to reach satisfactory performance; (2) they deploy complicated cross-modal interaction modules, which lead to high computational cost and low retrieval efficiency. To address these issues, we propose a novel method termed Cheaper and Faster Moment Retrieval (CFMR), which balances retrieval accuracy, efficiency, and annotation cost for VMR. Specifically, CFMR learns from point-level supervision, where each annotation is a single frame randomly located within the target moment. Such a labeling strategy is 6 times cheaper than the conventional annotation of event boundaries. Furthermore, we design a concept-based multimodal alignment mechanism that bypasses cross-modal interaction modules during inference, remarkably improving retrieval efficiency. Experimental results on three widely used VMR benchmarks demonstrate that CFMR achieves better overall performance than current state-of-the-art methods. Moreover, it significantly accelerates retrieval, requiring more than 100 times fewer FLOPs than existing approaches with point-level supervision. Our open-source implementation is available at https://github.com/CFM-MSG/Code_CFMR.
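The two ideas in the abstract can be illustrated with a minimal sketch. This is not the CFMR implementation. The function names (`sample_point_annotation`, `rank_moments`), the uniform sampling, and the use of a plain dot product over precomputed embeddings are all assumptions made for illustration. The first function shows what a point-level label looks like relative to a full boundary annotation. The second shows why retrieval becomes cheap when query and moment representations can be compared without running a cross-modal interaction module for every pair.

```python
import random


def sample_point_annotation(start, end, rng=None):
    """Simulate a point-level label: one timestamp inside [start, end].

    Assumption: the point is drawn uniformly at random from the moment.
    """
    if end < start:
        raise ValueError("end must not be earlier than start")
    rng = rng or random.Random()
    return rng.uniform(start, end)


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_moments(query_vec, moment_vecs):
    """Rank candidate moments by similarity to the query, best first.

    Assumption: moment embeddings are computed once, offline, so each
    query only costs one dot product per candidate at inference time.
    """
    scores = [dot(query_vec, m) for m in moment_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical example: a moment spanning 12.0 s to 20.5 s.
point = sample_point_annotation(12.0, 20.5, random.Random(0))

# Hypothetical 2-dimensional embeddings for three candidate moments.
order = rank_moments([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])
```

Here `point` always falls inside the moment, so the annotator supplies one timestamp rather than two boundaries, and `order` lists candidate indices from most to least similar to the query.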