Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network

TLDR

Deep neural networks have achieved high reconstruction accuracy and computational performance in single-image super-resolution (SR), yet most methods first upscale the low-resolution (LR) input to high-resolution (HR) space with a single filter, commonly bicubic interpolation, so the SR operation itself runs in HR space. The paper presents the first CNN capable of real-time SR of 1080p video on a single K2 GPU. The network extracts feature maps in LR space and uses an efficient sub-pixel convolution layer that learns an array of upscaling filters, trained specifically for each feature map, to produce the HR output. Replacing the handcrafted bicubic filter with these learned filters reduces the computational complexity of the SR operation, improves PSNR by 0.15 dB on images and 0.39 dB on videos, and runs an order of magnitude faster than previous CNN-based methods.

Abstract

Recently, several models based on deep neural networks have achieved great success in terms of both reconstruction accuracy and computational performance for single image super-resolution. In these methods, the low resolution (LR) input image is upscaled to the high resolution (HR) space using a single filter, commonly bicubic interpolation, before reconstruction. This means that the super-resolution (SR) operation is performed in HR space. We demonstrate that this is sub-optimal and adds computational complexity. In this paper, we present the first convolutional neural network (CNN) capable of real-time SR of 1080p videos on a single K2 GPU. To achieve this, we propose a novel CNN architecture where the feature maps are extracted in the LR space. In addition, we introduce an efficient sub-pixel convolution layer which learns an array of upscaling filters to upscale the final LR feature maps into the HR output. By doing so, we effectively replace the handcrafted bicubic filter in the SR pipeline with more complex upscaling filters specifically trained for each feature map, whilst also reducing the computational complexity of the overall SR operation. We evaluate the proposed approach using images and videos from publicly available datasets and show that it performs significantly better (+0.15 dB on images and +0.39 dB on videos) and is an order of magnitude faster than previous CNN-based methods.
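
The rearrangement step behind a sub-pixel convolution layer can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes an upscaling factor r, a final LR tensor with C * r * r feature maps, and a channel ordering in which map index c * r * r + i * r + j supplies the HR pixel at sub-pixel offset (i, j); the abstract does not specify that ordering. The names pixel_shuffle and conv_macs, and the layer sizes used in the cost comparison, are hypothetical.

```python
def pixel_shuffle(features, r):
    """Rearrange nested lists of shape (C*r*r, H, W) into (C, H*r, W*r).

    Assumed channel ordering: input map c*r*r + i*r + j fills the HR pixels
    at sub-pixel offset (row i, column j) of output channel c.
    """
    channels_in = len(features)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = channels_in // (r * r)
    h = len(features[0])
    w = len(features[0][0])
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = features[c * r * r + i * r + j]
                for y in range(h):
                    for x in range(w):
                        out[c][y * r + i][x * r + j] = src[y][x]
    return out


def conv_macs(height, width, c_in, c_out, k):
    """Multiply-accumulate count of one k x k convolution layer (stride 1)."""
    return height * width * c_in * c_out * k * k


# Four 2x2 LR feature maps with r = 2 become a single 4x4 HR channel.
lr = [[[10 * ch + 2 * y + x for x in range(2)] for y in range(2)]
      for ch in range(4)]
hr = pixel_shuffle(lr, 2)

# Hypothetical layer (64 -> 32 channels, 3x3 kernel) applied at 1080p versus
# at the 360x640 LR grid that corresponds to r = 3.
hr_cost = conv_macs(1080, 1920, 64, 32, 3)
lr_cost = conv_macs(360, 640, 64, 32, 3)
```

Under these assumptions the same layer costs r * r = 9 times fewer multiply-accumulates on the LR grid than on the HR grid, which is consistent with the abstract's argument for extracting features in LR space. The shuffle itself only moves values, so the upscaling is determined entirely by the filters learned in the preceding convolution.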
