Publication | Closed Access
Summarization-based Video Caption via Deep Neural Networks
Citations: 44
References: 17
Year: 2015
Unknown Venue
Natural Language Processing, Visual Content Draws, Deep Neural Networks, Engineering, Machine Learning, Visual Grounding, Vision Language Model, Video Summarization, Visual Question Answering, Appropriate Descriptions, Deep Learning, Corpus Linguistics, Computer Vision, Machine Translation, Multi-modal Summarization
Generating appropriate descriptions for visual content has recently drawn increasing attention, with promising progress owing to breakthroughs in deep neural networks. Unlike traditional SVO (subject, verb, object) based methods, this paper proposes a novel framework for video captioning via deep neural networks. For each frame, we extract visual features with a fine-tuned deep Convolutional Neural Network (CNN) and feed them into a Recurrent Neural Network (RNN) to generate novel sentence descriptions for that frame. To obtain the most representative, high-quality descriptions for the target video, we incorporate an automatic summarization process that reduces noise by ranking on the sentence-sequence graph. Moreover, our framework can describe out-of-sample videos by transferring knowledge from pre-captioned images. Experiments on benchmark datasets demonstrate that our method outperforms state-of-the-art video captioning methods on language generation metrics as well as SVO accuracy.
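The summarization step named in the abstract, ranking per-frame sentences on a graph, can be illustrated with a minimal sketch. The abstract does not specify how the sentence-sequence graph is built or scored, so everything below is an assumption made for illustration: the graph is a fully connected similarity graph over captions, edge weights are Jaccard word overlap, and scores come from a PageRank-style power iteration in the spirit of TextRank or LexRank. The function names `jaccard`, `rank_captions`, and `summarize` and the sample captions are hypothetical and do not come from the paper.

```python
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Word-overlap similarity between two token sets (assumed edge weight)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_captions(captions: List[str], damping: float = 0.85,
                  iters: int = 50) -> List[float]:
    """Score captions by PageRank-style power iteration on a similarity graph.

    This is an illustrative stand-in, not the ranking used in the paper.
    """
    n = len(captions)
    if n == 0:
        return []
    tokens = [set(c.lower().split()) for c in captions]
    # Weighted adjacency matrix over distinct caption pairs.
    w = [[jaccard(tokens[i], tokens[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0 / n] * n
    for _ in range(iters):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to w[j][i].
            incoming = sum(scores[j] * w[j][i] / out_sum[j]
                           for j in range(n) if out_sum[j] > 0)
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def summarize(captions: List[str], top_k: int = 1) -> List[str]:
    """Return the top_k highest-ranked captions as the video description."""
    scores = rank_captions(captions)
    order = sorted(range(len(captions)), key=lambda i: scores[i], reverse=True)
    return [captions[i] for i in order[:top_k]]


if __name__ == "__main__":
    # Hypothetical per-frame captions; the last one plays the role of noise.
    frame_captions = [
        "a man is playing a guitar",
        "a man is playing guitar on stage",
        "a man plays a guitar",
        "a dog runs on the grass",
    ]
    print(summarize(frame_captions, top_k=1))
```

In this sketch a caption that shares few words with the others receives little score from its neighbours, which is one plausible way a graph ranking could suppress noisy per-frame descriptions. A real system would likely use a stronger similarity measure than word overlap.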