Publication | Closed Access
Learning Contextual Transformer Network for Image Inpainting
Citations: 29 | References: 43 | Year: 2021 | Venue: Unknown
Long Range Dependencies, Convolutional Neural Network, Contextual Transformer Network, Machine Vision, Image Analysis, Machine Learning, Engineering, Pattern Recognition, Attention Modules, Vision Language Model, Inpainting, Human Image Synthesis, Image Hallucination, Deep Learning, Video Transformer, Convolutional Network, Computer Vision, Synthetic Image Generation
Fully convolutional networks with attention modules have proven effective for learning-based image inpainting. Although many existing approaches produce visually plausible results, the generated images often show blurry textures or distorted structures around corrupted areas. This is mainly because convolutional neural networks have limited capacity to model contextual information with long-range dependencies. The attention mechanism can alleviate this problem to some extent, but existing attention modules tend to emphasize similarities between the corrupted and uncorrupted regions while ignoring the dependencies within each of them. This paper therefore proposes the Contextual Transformer Network (CTN), which not only learns the relationships between the corrupted and uncorrupted regions but also exploits the internal closeness within each region. In addition, rather than using a fully convolutional network, CTN stacks several transformer blocks in place of convolution layers to better model long-range dependencies. Finally, by dividing the image into patches of different sizes, we propose a multi-scale multi-head attention module that better models the affinity among various image regions. Experiments on several benchmark datasets demonstrate the superior performance of the proposed approach.
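To make the multi-scale multi-head attention idea more concrete, here is a minimal, hypothetical sketch in plain Python. It is not the authors' CTN implementation. It assumes a single-channel image whose height and width are divisible by every patch size, assigns one attention head per patch size, omits the learned query/key/value projections a real transformer block would use, and averages the head outputs; the abstract specifies none of these details. All function names (`to_patches`, `from_patches`, `patch_attention`, `multi_scale_attention`) are illustrative only.

```python
import math
from typing import List

# Hypothetical simplification: an H x W single-channel image as nested lists.
Image = List[List[float]]


def to_patches(img: Image, p: int) -> List[List[float]]:
    """Split an H x W image into non-overlapping p x p patches, each flattened to a token."""
    h, w = len(img), len(img[0])
    assert h % p == 0 and w % p == 0, "image size must be divisible by the patch size"
    patches = []
    for top in range(0, h, p):          # patches are ordered row by row
        for left in range(0, w, p):
            patches.append([img[top + i][left + j] for i in range(p) for j in range(p)])
    return patches


def from_patches(patches: List[List[float]], h: int, w: int, p: int) -> Image:
    """Inverse of to_patches: fold flattened p x p tokens back into an H x W image."""
    img = [[0.0] * w for _ in range(h)]
    cols = w // p
    for idx, vec in enumerate(patches):
        top, left = (idx // cols) * p, (idx % cols) * p
        for i in range(p):
            for j in range(p):
                img[top + i][left + j] = vec[i * p + j]
    return img


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def patch_attention(tokens: List[List[float]]) -> List[List[float]]:
    """Scaled dot-product self-attention with Q = K = V = tokens (assumption: no projections).

    Every patch attends to every patch, so corrupted patches can draw on uncorrupted ones,
    and patches inside the same region also attend to each other.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(wt * v[c] for wt, v in zip(weights, tokens)) for c in range(d)])
    return out


def multi_scale_attention(img: Image, patch_sizes: List[int]) -> Image:
    """One head per patch size; head outputs are averaged (assumed fusion rule)."""
    h, w = len(img), len(img[0])
    acc = [[0.0] * w for _ in range(h)]
    for p in patch_sizes:
        head = from_patches(patch_attention(to_patches(img, p)), h, w, p)
        for i in range(h):
            for j in range(w):
                acc[i][j] += head[i][j] / len(patch_sizes)
    return acc
```

For example, `multi_scale_attention(img, [1, 2])` on a 4×4 image combines pixel-level attention with attention over 2×2 patches. Because every token attends to every other token, this loosely mirrors the abstract's point about modeling dependencies both between and within the corrupted and uncorrupted regions; a faithful implementation would add learned projections, masking, and the surrounding transformer block.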