Publication | Closed Access
Visual attention based on long-short term memory model for image caption generation
Citations: 22
References: 16
Year: 2017
Unknown Venue
Natural Language Processing, Artificial Intelligence, Multimodal LLM, Convolutional Neural Network, Image Analysis, Machine Learning, Machine Vision, Visual Attention, Engineering, Text-to-image Retrieval, Visual Grounding, Image Caption Generation, Vision Language Model, Visual Question Answering, Deep Learning, Computer Vision, Machine Translation
Image caption generation has become a rising topic in computer vision and artificial intelligence. To address the problem of stiff descriptions, we extract richer image features using a convolutional neural network (CNN). We propose a neural, probabilistic framework that combines the CNN with a special form of recurrent neural network (RNN), the long short-term memory (LSTM) model, to generate image captions end to end. The model uses a word-to-vector embedding to encode variable-length input into a fixed-dimensional vector. Because the description of an object in an image is often not specific enough, we introduce an attention mechanism and visualize it to show how the model fixes its gaze on salient objects. We validate the model on three benchmark datasets and report strong performance on standard evaluation metrics.
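The abstract does not say which form of attention the model uses, so the following is only a minimal sketch of generic soft attention, assuming dot-product scoring between image-region features and the decoder's hidden state. The names `soft_attention`, `region_features`, and `hidden_state` are hypothetical and do not come from the paper; in a real system the region features would come from a CNN feature map and the hidden state from the LSTM decoder.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def soft_attention(region_features, hidden_state):
    """Weight image regions by relevance to the current decoder state.

    region_features: list of vectors, one per image region (assumed CNN output).
    hidden_state: decoder hidden vector of the same dimension (assumed LSTM state).
    Returns (weights, context): weights sum to 1, and context is the
    weighted average of the region vectors.
    """
    # Assumption: dot-product scoring; the paper may use a learned scorer.
    scores = [dot(f, hidden_state) for f in region_features]
    weights = softmax(scores)
    dim = len(hidden_state)
    context = [
        sum(w * f[i] for w, f in zip(weights, region_features))
        for i in range(dim)
    ]
    return weights, context


# Toy example: three regions with two-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
hidden = [2.0, 0.0]
weights, context = soft_attention(regions, hidden)
```

In this toy input the first region aligns best with the hidden state, so it receives the largest weight. Plotting such weights over the image grid at each decoding step is one common way to visualize where a captioning model "looks" when it emits a word.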