Publication | Closed Access
Temporal Group Fusion Network for Deep Video Inpainting
Citations: 18
References: 33
Year: 2021
Deep Video Inpainting, Machine Vision, Image Analysis, Machine Learning, Engineering, Pattern Recognition, Inpainting, Video Summarization, Video Inpainting, Video Hallucination, Video Transformer, Video Understanding, Temporal Information, Different Groups, Deep Learning, Video Restoration, Video Interpretation, Computer Vision
Video inpainting, the task of synthesizing spatio-temporally coherent content in the missing regions of a given video sequence, has recently drawn increasing attention. To exploit temporal information across frames, most recent deep learning-based methods first align reference frames to the target frame using explicit or implicit motion estimation, and then integrate the information from the aligned frames. However, their performance relies heavily on the accuracy of frame-to-frame alignment. To alleviate this problem, this paper proposes a novel Temporal Group Fusion Network (TGF-Net) that integrates temporal information through a two-stage fusion strategy. Specifically, the input frames are reorganized into different groups, and each group is followed by an intra-group fusion module that integrates information within the group. Different groups provide complementary information for the missing region. A temporal attention model is further designed to adaptively integrate the information across groups. This fusion strategy removes the dependence on alignment operations, greatly improving the visual quality and temporal consistency of the inpainted results. In addition, a coarse alignment model is introduced at the beginning of the network to handle videos with large motion. Extensive experiments on the DAVIS and YouTube-VOS datasets demonstrate the superiority of the proposed method in terms of PSNR/SSIM values, visual quality, and temporal consistency.
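The two-stage fusion idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: TGF-Net uses learned fusion and attention modules on image features, while the code below substitutes a mask-weighted mean for intra-group fusion and a softmax over a simple confidence score for the cross-group attention. The grouping rule (round-robin over temporal distance), the function names, and the `temperature` parameter are all assumptions made for illustration, and the coarse alignment stage is omitted. Frames are flat lists of pixel values, and a mask value of 1 marks a known pixel.

```python
import math

def make_groups(num_frames, target, num_groups):
    # Assumed grouping rule: sort reference frames by temporal distance to
    # the target, then deal them round-robin into groups.
    refs = sorted((i for i in range(num_frames) if i != target),
                  key=lambda i: (abs(i - target), i))
    return [refs[g::num_groups] for g in range(num_groups)]

def intra_group_fuse(frames, masks, group):
    # Stand-in for a learned intra-group fusion module: per-pixel mean over
    # the frames of the group in which the pixel is known (mask == 1).
    fused, conf = [], []
    for p in range(len(frames[0])):
        vals = [frames[i][p] for i in group if masks[i][p] == 1]
        fused.append(sum(vals) / len(vals) if vals else 0.0)
        # Confidence: fraction of frames in the group that see this pixel.
        conf.append(len(vals) / len(group) if group else 0.0)
    return fused, conf

def cross_group_attention(fused, conf, temperature=4.0):
    # Stand-in for a temporal attention model: per-pixel softmax over groups,
    # scored by confidence. Groups with no valid pixel get zero weight.
    out = []
    for p in range(len(fused[0])):
        scores = [temperature * c[p] if c[p] > 0 else -math.inf for c in conf]
        if all(s == -math.inf for s in scores):
            out.append(0.0)  # no group has seen this pixel; it stays a hole
            continue
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append(sum(w / total * f[p] for w, f in zip(weights, fused)))
    return out

def inpaint_target(frames, masks, target, num_groups=2):
    groups = make_groups(len(frames), target, num_groups)
    fused, conf = zip(*(intra_group_fuse(frames, masks, g) for g in groups))
    filled = cross_group_attention(fused, conf)
    # Keep known target pixels and fill only the missing ones.
    return [frames[target][p] if masks[target][p] == 1 else filled[p]
            for p in range(len(filled))]
```

In this sketch no frame is warped or aligned before fusion; each group contributes whatever it can see of the missing pixel, and the attention step decides how much to trust each group. That mirrors the alignment-free motivation of the abstract only at a conceptual level.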