
TLDR

Video frame interpolation that combines frame-based and event-based cameras outperforms traditional image-only methods, yet existing approaches rely on brittle image-level fusion, estimate motion in a temporally inconsistent and inefficient way, produce artifacts in low-contrast regions where no events are triggered, and have been evaluated only on planar, far-away scenes. This study addresses these limitations with multi-scale feature-level fusion and a one-shot nonlinear inter-frame motion model, computed from events and images, that can be efficiently sampled for image warping. The authors also collect the first large-scale events-and-frames dataset, comprising more than 100 challenging scenes with depth variations and captured with a beamsplitter-based setup. The method improves reconstruction quality by up to 0.2 dB in PSNR and up to 15% in LPIPS compared to prior techniques.

Abstract

Recently, video frame interpolation using a combination of frame- and event-based cameras has surpassed traditional image-based methods in terms of both performance and memory efficiency. However, current methods still suffer from (i) brittle image-level fusion of complementary interpolation results, which fails in the presence of artifacts in the fused image, (ii) potentially temporally inconsistent and inefficient motion estimation procedures, which run for every inserted frame, and (iii) low-contrast regions that do not trigger events and thus cause events-only motion estimation to generate artifacts. Moreover, previous methods were only tested on datasets consisting of planar and far-away scenes, which do not capture the full complexity of the real world. In this work, we address the above problems by introducing multi-scale feature-level fusion and by computing one-shot non-linear inter-frame motion from events and images, which can be efficiently sampled for image warping. We also collect the first large-scale events and frames dataset consisting of more than 100 challenging scenes with depth variations, captured with a new experimental setup based on a beamsplitter. We show that our method improves the reconstruction quality by up to 0.2 dB in terms of PSNR and up to 15% in LPIPS score.
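
To put the reported PSNR gain in context, the following is a minimal sketch of the standard PSNR definition, not the authors' evaluation code. The function names `psnr` and `mse_ratio_for_gain` are hypothetical, and the sketch assumes 8-bit pixel values passed as flat sequences.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two flat pixel sequences.

    Assumes both inputs have the same length and share the value range
    [0, max_value] (8-bit images by default; this is an assumption).
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and equally sized")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which the MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (-gain_db / 10.0)


if __name__ == "__main__":
    print(round(mse_ratio_for_gain(0.2), 3))
```

Because PSNR is logarithmic in the mean squared error, a 0.2 dB gain corresponds to an MSE that is roughly 4.5% lower than the baseline.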
