Fast RF-UIC: A fast unsupervised image captioning model

Abstract

For visually impaired individuals, image captioning is a crucial task: a deep learning model recognizes the content of an image and generates a descriptive sentence, allowing them to understand the image through words. However, existing image captioning models require large amounts of manually annotated data. Unsupervised methods offer a way to avoid this cost. Our proposed model, Fast RF-UIC, achieves unsupervised captioning through a purpose-designed Pre-trainer, which has a shorter training cycle than existing pre-trained models. The encoder, R2-Inception-V4, fuses the Res2Net structure with Inception-V4 to extract more image features. The decoder, Bi-FGRU, uses the FReLU activation function to improve character representation ability from a two-dimensional space. Furthermore, we expanded the corpus used in existing unsupervised image captioning work with additional captions for common objects, which enhances the model's generalization ability. In experiments, Fast RF-UIC achieved higher scores than existing unsupervised image captioning methods on several text evaluation metrics, including BLEU, ROUGE, and CIDEr.
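
To illustrate what "two-dimensional" means for FReLU, the following is a minimal sketch of the funnel activation in its generally published form, y = max(x, T(x)), where T(x) is a spatial condition computed over a small window around each position. It is not the Bi-FGRU decoder itself: how FReLU is wired into the recurrent unit is not specified here. The function name `frelu_2d`, the 3x3 averaging kernel, and the sample feature map are assumptions made for illustration, and the batch normalization that usually follows the convolution is omitted.

```python
def frelu_2d(x, kernel, bias=0.0):
    """Funnel activation on a single-channel 2D map: max(x, T(x)).

    T(x) is a 3x3 weighted sum around each position, with zero padding at
    the borders. Batch normalization after the convolution is omitted in
    this sketch.
    """
    rows, cols = len(x), len(x[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            t = bias
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    # Positions outside the map contribute zero (zero padding).
                    if 0 <= ii < rows and 0 <= jj < cols:
                        t += kernel[di + 1][dj + 1] * x[ii][jj]
            # Unlike ReLU's fixed threshold of 0, the threshold here depends
            # on the spatial neighbourhood of the current position.
            out[i][j] = max(x[i][j], t)
    return out


# Hypothetical 3x3 averaging kernel; in a trained model these weights are learned.
kernel = [[1.0 / 9.0] * 3 for _ in range(3)]
feature_map = [
    [-1.0, 2.0, -3.0],
    [4.0, -5.0, 6.0],
    [-7.0, 8.0, -9.0],
]
activated = frelu_2d(feature_map, kernel)
```

With an all-zero kernel and zero bias, the sketch reduces to ordinary ReLU, which makes the difference clear: ReLU compares each value with a constant, while FReLU compares it with a learned function of its spatial neighbours.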
