
Publication | Closed Access

Edge-Based Video Compression Texture Synthesis Using Generative Adversarial Network

Citations: 13

References: 45

Year: 2022

TLDR

Texture patterns with abundant high‑frequency components, such as grass and water, mask distortions and make them hard to perceive. Modern video codecs nevertheless allocate many bits to encoding these textures, which motivates deep‑learning based texture synthesis as a way to improve perceptual coding. This work introduces a deep texture synthesizer called the edge‑based texture synthesis framework (ETSF). ETSF first detects texture regions and coarsely quantizes them, then extracts low‑level edge features, removes their spatiotemporal redundancy, and compresses the processed edges. At the decoder, it reconstructs realistic textures using an edge‑based texture synthesis GAN that injects edge information into convolutional layers. Experimental results on a texture dataset show that ETSF reduces MOS BD‑rate by an average of 12.8 %, 14.2 %, and 9.6 % under the VVC low‑delay B, low‑delay P, and random‑access configurations, respectively.
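
The edge-extraction and temporal redundancy-removal steps can be illustrated with a rough sketch. The summary does not say which edge detector or redundancy-removal tools ETSF uses, so everything here is an assumption made for illustration: a Sobel gradient with a fixed threshold stands in for the edge extractor, and XOR differencing between consecutive binary edge maps stands in for temporal redundancy removal. The function names (sobel_edges, temporal_residual, reconstruct_edges) are hypothetical and do not come from the paper.

```python
def sobel_edges(frame, threshold):
    """Binary edge map of a grayscale frame (list of rows of ints).

    Assumption: Sobel gradients with an L1 magnitude threshold; the
    paper's actual edge extractor is not specified in the summary.
    Border pixels are left as 0 because the 3x3 kernel does not fit.
    """
    h, w = len(frame), len(frame[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = (frame[y - 1][x + 1] + 2 * frame[y][x + 1] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y][x - 1] - frame[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (frame[y + 1][x - 1] + 2 * frame[y + 1][x] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y - 1][x] - frame[y - 1][x + 1])
            if abs(gx) + abs(gy) >= threshold:
                edges[y][x] = 1
    return edges


def temporal_residual(prev_edges, cur_edges):
    """XOR of two binary edge maps: 1 only where the edge state changed.

    Assumption: a stand-in for temporal redundancy removal. Static edges
    produce zeros, which a later entropy coder could compress cheaply.
    """
    return [[p ^ c for p, c in zip(prow, crow)]
            for prow, crow in zip(prev_edges, cur_edges)]


def reconstruct_edges(prev_edges, residual):
    """Invert temporal_residual: XOR the residual back onto the previous map."""
    return [[p ^ r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prev_edges, residual)]


# Two 5x5 frames containing a vertical step edge that moves one pixel right.
frame_a = [[0, 0, 100, 100, 100] for _ in range(5)]
frame_b = [[0, 0, 0, 100, 100] for _ in range(5)]
edges_a = sobel_edges(frame_a, threshold=200)
edges_b = sobel_edges(frame_b, threshold=200)
residual = temporal_residual(edges_a, edges_b)
```

Because XOR is its own inverse, a decoder holding the previous edge map and the residual recovers the current edge map exactly. In the framework described above, the recovered edges would then condition the synthesis GAN, which this sketch does not attempt to model.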

Abstract

Texture patterns with abundant high-frequency components, such as grass and water, are known to produce visual masking effects, so distortion in textures is harder for the human eye to perceive than distortion in structural regions. However, modern video codecs, which operate in a rate-distortion optimized manner, usually spend many bits encoding textures, which limits perceptual coding performance. With the rapid development of deep learning, learning-based texture synthesis methods have been proposed to replace the coding of prediction residuals and reduce the rate cost. In this paper, we present a deep texture synthesizer named the edge-based texture synthesis framework (ETSF). At the encoder side, the framework detects texture regions using semantic and fidelity classification criteria, and the detected regions are coarsely quantized by the hybrid coding framework. For texture characterization, ETSF extracts low-level edge features that represent pixel intensity variation. Feature processing tools are developed to remove the spatiotemporal redundancy of the edges. The processed edge information is then compressed and transmitted. To recover textures effectively, we design an edge-based texture synthesis generative adversarial network (ETSGAN) at the decoder of ETSF, which incorporates edge information into convolutional layers and generates realistic textures. Experimental results on a collected texture dataset show that the proposed ETSF achieves average MOS BD-rates of -12.8%, -14.2%, and -9.6% under the lowdelay_B, lowdelay_P, and random_access configurations of VVC coding, respectively.
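
For readers unfamiliar with the metric, the following is a minimal sketch of how a BD-rate figure of this kind is commonly computed. It assumes that the "MOS BD-rate" follows the usual Bjøntegaard calculation with mean opinion scores in place of PSNR on the quality axis, and that each curve has exactly four rate points; the abstract does not state either detail. The function names and the sample numbers are hypothetical and are not results from the paper.

```python
import math


def _lagrange(xs, ys, x):
    """Evaluate the polynomial that interpolates the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def _mean_log_rate(quality, rates, lo, hi):
    """Average of the fitted log-rate curve over the quality range [lo, hi].

    With four points the interpolant is a cubic, and Simpson's rule
    integrates cubics exactly, so no numerical library is needed.
    """
    logs = [math.log(r) for r in rates]

    def curve(q):
        return _lagrange(quality, logs, q)

    integral = (hi - lo) / 6.0 * (curve(lo) + 4.0 * curve((lo + hi) / 2.0) + curve(hi))
    return integral / (hi - lo)


def bd_rate(anchor_rates, anchor_quality, test_rates, test_quality):
    """Average bitrate change (percent) of test vs. anchor at equal quality.

    Negative values mean the test codec needs fewer bits for the same quality.
    """
    curves = (anchor_rates, anchor_quality, test_rates, test_quality)
    if any(len(c) != 4 for c in curves):
        raise ValueError("this sketch assumes exactly four points per curve")
    # Compare only over the quality range covered by both curves.
    lo = max(min(anchor_quality), min(test_quality))
    hi = min(max(anchor_quality), max(test_quality))
    if hi <= lo:
        raise ValueError("quality ranges do not overlap")
    diff = (_mean_log_rate(test_quality, test_rates, lo, hi)
            - _mean_log_rate(anchor_quality, anchor_rates, lo, hi))
    return (math.exp(diff) - 1.0) * 100.0


# Hypothetical data: the test codec uses 10% fewer bits at every MOS level.
mos = [2.0, 3.0, 3.8, 4.4]
anchor_kbps = [1000.0, 2000.0, 4000.0, 8000.0]
test_kbps = [0.9 * r for r in anchor_kbps]
saving = bd_rate(anchor_kbps, mos, test_kbps, mos)
```

Under these assumptions, a reported value such as -12.8% would mean that the proposed framework needs on average 12.8% fewer bits than the anchor to reach the same subjective score.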
