Towards Real-world Event-guided Low-light Video Enhancement and Deblurring
Abstract
The paper addresses the novel research problem of event-guided low-light video enhancement and deblurring. The key contributions are:
- Designing a hybrid camera system using beam splitters and constructing the RELED dataset containing low-light blurry images, normal sharp images, and event streams.
- Developing a tailored framework for the task, consisting of two key modules:
  - Event-guided Deformable Temporal Feature Alignment (ED-TFA) module, which effectively utilizes event information for temporal alignment.
  - Spectral Filtering-based Cross-Modal Feature Enhancement (SFCM-FE) module, which enhances structural details while reducing noise in low-light conditions.
- Achieving significant performance improvement on the RELED dataset, surpassing both event-guided and frame-based methods.
Q&A
[01] Introduction
1. What are the key challenges in capturing videos in low-light conditions?
- Low environmental illumination reduces visibility and forces long exposure times, which in turn introduce motion blur artifacts.
- Consequently, videos captured in low-light environments commonly exhibit both diminished visibility and blur from dynamic motion at the same time.
2. What are the limitations of existing works that address low-light enhancement and motion deblurring as separate tasks?
- Handling the two tasks in a cascaded manner often yields sub-optimal results, since errors from the first stage propagate into the second.
- It is therefore essential to address the problem jointly, accounting for motion blur and the low-illumination scenario simultaneously.
3. How can event cameras help in addressing the joint task of low-light enhancement and motion deblurring?
- Event cameras excel in capturing detailed motion information and scene details even in low-light conditions, offering benefits such as high dynamic range, low latency, and low power consumption.
- Utilizing event cameras can therefore provide a practical solution for jointly addressing low-light enhancement and motion deblurring (a sketch of a typical event representation follows below).
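Event cameras emit an asynchronous stream of (x, y, timestamp, polarity) tuples rather than frames, so learning-based pipelines typically rasterize the stream into a fixed-size tensor first. The paper does not spell out its exact event representation; below is a minimal PyTorch sketch of one common choice, a voxel grid with bilinear weighting in time.

```python
import torch

def events_to_voxel_grid(events: torch.Tensor, num_bins: int,
                         height: int, width: int) -> torch.Tensor:
    """Rasterize an (N, 4) event tensor with rows (x, y, t, polarity)
    into a (num_bins, height, width) voxel grid. Each event's polarity
    is split between its two nearest temporal bins (bilinear in time).
    Assumes x/y already lie inside the sensor resolution."""
    events = events.float()
    voxel = torch.zeros(num_bins, height, width)
    x, y = events[:, 0].long(), events[:, 1].long()
    t, p = events[:, 2], events[:, 3]
    # Map timestamps onto the continuous bin axis [0, num_bins - 1].
    t_norm = (num_bins - 1) * (t - t.min()) / (t.max() - t.min() + 1e-9)
    left = t_norm.floor().long()
    right = torch.clamp(left + 1, max=num_bins - 1)
    w_right = t_norm - left.float()
    voxel.index_put_((left, y, x), p * (1.0 - w_right), accumulate=True)
    voxel.index_put_((right, y, x), p * w_right, accumulate=True)
    return voxel
```

For B temporal bins this produces a B-channel, image-like tensor that can be concatenated or fused with frame features in a CNN.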
[02] RELED Dataset
1. What are the limitations of existing datasets for event-guided low-level vision tasks?
- Existing datasets either rely on synthetic generation of low-light images and events, or have low resolution and struggle to capture real-world blur.
- There has been no attempt to simultaneously acquire synchronized low-light blurry images, normal sharp images, and corresponding event streams.
2. How did the authors construct the RELED dataset?
- They designed a hybrid camera system using beam splitters to capture the required data modalities simultaneously.
- The system comprises two high-resolution RGB cameras and one event camera, with one RGB camera capturing sharp images under normal-light conditions and the other capturing blurred images in low-light conditions.
- This setup allows for the collection of low-light blurry images, normal-light sharp images, and low-light event streams without relying on synthetic data generation.
3. What are the key characteristics of the RELED dataset?
- It is the first dataset to offer high-resolution images with real-world low-light blur and normal-light sharp images, along with the corresponding event streams.
- The dataset consists of 42 urban scenes featuring both camera motion and moving objects, captured at a resolution of 1024×768 (see the loading sketch below).
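A minimal sketch of loading one synchronized triplet. The directory layout and file naming here are hypothetical conventions for illustration; the released dataset may be organized differently.

```python
from pathlib import Path

import cv2
import numpy as np

def load_reled_sample(scene_dir: str, idx: int):
    """Load one synchronized (low-light blurry, normal-light sharp, events)
    triplet. The blur/, sharp/, events/ layout and file naming are
    hypothetical; the released dataset may differ."""
    scene = Path(scene_dir)
    blurry = cv2.imread(str(scene / "blur" / f"{idx:05d}.png"))   # low-light blurry frame
    sharp = cv2.imread(str(scene / "sharp" / f"{idx:05d}.png"))   # normal-light sharp GT
    events = np.load(scene / "events" / f"{idx:05d}.npy")         # (N, 4) rows of (x, y, t, p)
    return blurry, sharp, events
```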
[03] Proposed Methods
1. What are the key components of the proposed framework?
- Event-guided Deformable Temporal Feature Alignment (ED-TFA) module: Effectively utilizes event information to perform temporal alignment of features across multiple scales.
- Spectral Filtering-based Cross-Modal Feature Enhancement (SFCM-FE) module: Enhances structural details while reducing noise in low-light conditions by leveraging low-frequency information and cross-modal feature fusion.
2. How does the ED-TFA module work?
- It performs deformable temporal alignment of frame and event features in a coarse-to-fine manner across multiple scales.
- The module utilizes event information to aid in finding temporal correspondences, which is challenging under degraded low-light and blurred conditions (see the single-scale sketch below).
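A single-scale sketch of the idea, assuming PyTorch and torchvision's `DeformConv2d`: offsets are predicted from frame and event features jointly, so event motion cues steer where the deformable convolution samples. This illustrates event-guided deformable alignment in general, not the authors' exact ED-TFA implementation.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class EventGuidedAlign(nn.Module):
    """Single-scale event-guided deformable alignment (illustrative only).

    Offsets are predicted from the concatenation of neighboring-frame
    features and event features, so motion cues from events steer where
    the deformable convolution samples the neighbor features."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        # 2 offsets (dx, dy) per kernel sampling location.
        self.offset_pred = nn.Conv2d(2 * channels, 2 * kernel_size ** 2,
                                     kernel_size=3, padding=1)
        self.deform = DeformConv2d(channels, channels, kernel_size,
                                   padding=kernel_size // 2)

    def forward(self, neighbor_feat, event_feat):
        offsets = self.offset_pred(torch.cat([neighbor_feat, event_feat], dim=1))
        return self.deform(neighbor_feat, offsets)
```

In a coarse-to-fine scheme this block would run at every scale, with offsets estimated at a coarser level upsampled to initialize the next finer one.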
3. What is the purpose of the SFCM-FE module?
- In low-light conditions with significant noise in both frames and events, it aims to effectively reduce noise and accurately restore the main structural information of the scene.
- It leverages spectral filtering and cross-modal feature fusion to enhance low-frequency structural details while suppressing high-frequency noise (see the sketch below).
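A sketch of the underlying idea, assuming PyTorch: a soft Gaussian low-pass mask in the Fourier domain keeps low-frequency scene structure in the noisy frame features, which are then fused with event features that preserve edges. This illustrates spectral low-pass filtering plus cross-modal fusion in general, not the exact SFCM-FE design.

```python
import torch
import torch.nn as nn

class SpectralLowPassFusion(nn.Module):
    """Spectral low-pass filtering plus cross-modal fusion (illustrative).

    A Gaussian low-pass mask in the Fourier domain keeps the low-frequency
    scene structure of the noisy frame features; the filtered features are
    then fused with event features, which preserve edge information."""

    def __init__(self, channels: int, cutoff: float = 0.25):
        super().__init__()
        self.cutoff = cutoff  # in cycles/pixel; Nyquist is 0.5
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def low_pass(self, x):
        spec = torch.fft.rfft2(x, norm="ortho")
        fy = torch.fft.fftfreq(x.shape[-2], device=x.device)   # vertical frequencies
        fx = torch.fft.rfftfreq(x.shape[-1], device=x.device)  # horizontal frequencies
        radius = torch.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
        mask = torch.exp(-(radius / self.cutoff) ** 2)          # soft Gaussian low-pass
        return torch.fft.irfft2(spec * mask, s=x.shape[-2:], norm="ortho")

    def forward(self, frame_feat, event_feat):
        structure = self.low_pass(frame_feat)                   # denoised structure
        return self.fuse(torch.cat([structure, event_feat], dim=1))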
[04] Experiments
1. How did the authors evaluate the proposed method?
- They conducted experiments on the RELED dataset, which they constructed, as there was no existing dataset available for the joint task of low-light enhancement and motion deblurring.
- They compared the performance of their method against various state-of-the-art frame-based and event-guided low-light enhancement, motion deblurring, and joint methods.
2. What were the key findings from the experimental results?
- The proposed method significantly outperformed both frame-based and event-guided methods, achieving substantial gains in PSNR and SSIM (computed as in the sketch after this list).
- The authors' lightweight model (ours-s) also outperformed other networks while using a relatively small number of parameters.
- The qualitative results demonstrated the superior performance of the proposed method, even in challenging scenarios with severe motion blur and low-illumination conditions.
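For reference, the two reported metrics can be computed per frame with scikit-image; details such as color space and border cropping are assumptions here, not the paper's exact evaluation protocol.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(restored: np.ndarray, gt: np.ndarray):
    """Per-frame PSNR/SSIM for uint8 RGB arrays; channel_axis=2 averages
    SSIM over color channels. Evaluation settings are assumptions, not
    the paper's exact protocol."""
    psnr = peak_signal_noise_ratio(gt, restored, data_range=255)
    ssim = structural_similarity(gt, restored, data_range=255, channel_axis=2)
    return psnr, ssim
```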
3. What were the contributions of the individual modules in the proposed framework?
- The ablation study showed that the ED-TFA module and the SFCM-FE module both contributed significantly to the overall performance improvement.
- The SFCM-FE module, with its spectral filtering and cross-modal feature enhancement capabilities, was particularly effective in restoring structural details while reducing noise in low-light conditions.