Towards Real-world Event-guided Low-light Video Enhancement and Deblurring
Abstract
The paper addresses the novel research problem of event-guided low-light video enhancement and deblurring. The key contributions are:
- Designing a hybrid camera system using beam splitters and constructing the RELED dataset containing low-light blurry images, normal sharp images, and event streams.
- Developing a tailored framework for the task, consisting of two key modules:
  - Event-guided Deformable Temporal Feature Alignment (ED-TFA) module, which effectively utilizes event information for temporal alignment.
  - Spectral Filtering-based Cross-Modal Feature Enhancement (SFCM-FE) module, which enhances structural details while reducing noise in low-light conditions.
- Achieving significant performance improvement on the RELED dataset, surpassing both event-guided and frame-based methods.
Q&A
[01] Introduction
1. What are the key challenges in capturing videos in low-light conditions?
- Low environmental illumination reduces visibility and forces long exposure times, which in turn introduce motion blur artifacts.
- Consequently, videos captured in low-light environments commonly exhibit both diminished visibility and blur from dynamic motion at the same time.
2. What are the limitations of existing works that address low-light enhancement and motion deblurring as separate tasks?
- Handling the two tasks in a cascaded manner often yields sub-optimal results, since errors from the first stage propagate into the second.
- It is therefore essential to address the problem jointly, accounting for motion blur and the low-illumination scenario simultaneously.
3. How can event cameras help in addressing the joint task of low-light enhancement and motion deblurring?
- Event cameras excel in capturing detailed motion information and scene details even in low-light conditions, offering benefits such as high dynamic range, low latency, and low power consumption.
- Utilizing event cameras can therefore provide a practical solution for jointly addressing low-light enhancement and motion deblurring (a sketch of a typical event representation follows below).
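Event cameras emit an asynchronous stream of (x, y, timestamp, polarity) tuples rather than frames, so learning-based pipelines typically rasterize the stream into a fixed-size tensor first. The paper does not spell out its exact event representation; below is a minimal PyTorch sketch of one common choice, a voxel grid with bilinear weighting in time.

```python
import torch

def events_to_voxel_grid(events: torch.Tensor, num_bins: int,
                         height: int, width: int) -> torch.Tensor:
    """Rasterize an (N, 4) event tensor with rows (x, y, t, polarity)
    into a (num_bins, height, width) voxel grid. Each event's polarity
    is split between its two nearest temporal bins (bilinear in time).
    Assumes x/y already lie inside the sensor resolution."""
    events = events.float()
    voxel = torch.zeros(num_bins, height, width)
    x, y = events[:, 0].long(), events[:, 1].long()
    t, p = events[:, 2], events[:, 3]
    # Map timestamps onto the continuous bin axis [0, num_bins - 1].
    t_norm = (num_bins - 1) * (t - t.min()) / (t.max() - t.min() + 1e-9)
    left = t_norm.floor().long()
    right = torch.clamp(left + 1, max=num_bins - 1)
    w_right = t_norm - left.float()
    voxel.index_put_((left, y, x), p * (1.0 - w_right), accumulate=True)
    voxel.index_put_((right, y, x), p * w_right, accumulate=True)
    return voxel
```

For B temporal bins this produces a B-channel, image-like tensor that can be concatenated or fused with frame features in a CNN.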
[02] RELED Dataset
1. What are the limitations of existing datasets for event-guided low-level vision tasks?
- Existing datasets either rely on synthetic generation of low-light images and events, or have low resolution and struggle to capture real-world blur.
- There has been no attempt to simultaneously acquire synchronized low-light blurry images, normal sharp images, and corresponding event streams.
2. How did the authors construct the RELED dataset?
- They designed a hybrid camera system using beam splitters to capture the required data modalities simultaneously.
- The system comprises two high-resolution RGB cameras and one event camera, with one RGB camera capturing sharp images under normal-light conditions and the other capturing blurred images in low-light conditions.
- This setup allows for the collection of low-light blurry images, normal-light sharp images, and low-light event streams without relying on synthetic data generation.
3. What are the key characteristics of the RELED dataset?
- It is the first dataset to offer high-resolution images with real-world low-light blur and normal-light sharp images, along with the corresponding event streams.
- The dataset consists of 42 urban scenes featuring both camera motion and moving objects, captured at a resolution of 1024×768 (see the loading sketch below).
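A minimal sketch of loading one synchronized triplet. The directory layout and file naming here are hypothetical conventions for illustration; the released dataset may be organized differently.

```python
from pathlib import Path

import cv2
import numpy as np

def load_reled_sample(scene_dir: str, idx: int):
    """Load one synchronized (low-light blurry, normal-light sharp, events)
    triplet. The blur/, sharp/, events/ layout and file naming are
    hypothetical; the released dataset may differ."""
    scene = Path(scene_dir)
    blurry = cv2.imread(str(scene / "blur" / f"{idx:05d}.png"))   # low-light blurry frame
    sharp = cv2.imread(str(scene / "sharp" / f"{idx:05d}.png"))   # normal-light sharp GT
    events = np.load(scene / "events" / f"{idx:05d}.npy")         # (N, 4) rows of (x, y, t, p)
    return blurry, sharp, events
```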
[03] Proposed Methods
1. What are the key components of the proposed framework?
- Event-guided Deformable Temporal Feature Alignment (ED-TFA) module: Effectively utilizes event information to perform temporal alignment of features across multiple scales.
- Spectral Filtering-based Cross-Modal Feature Enhancement (SFCM-FE) module: Enhances structural details while reducing noise in low-light conditions by leveraging low-frequency information and cross-modal feature fusion.
2. How does the ED-TFA module work?
- It performs deformable temporal alignment of frame and event features in a coarse-to-fine manner across multiple scales.
- The module utilizes event information to aid in finding temporal correspondences, which is challenging under degraded low-light and blurred conditions (see the single-scale sketch below).
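A single-scale sketch of the idea, assuming PyTorch and torchvision's `DeformConv2d`: offsets are predicted from frame and event features jointly, so event motion cues steer where the deformable convolution samples. This illustrates event-guided deformable alignment in general, not the authors' exact ED-TFA implementation.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class EventGuidedAlign(nn.Module):
    """Single-scale event-guided deformable alignment (illustrative only).

    Offsets are predicted from the concatenation of neighboring-frame
    features and event features, so motion cues from events steer where
    the deformable convolution samples the neighbor features."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        # 2 offsets (dx, dy) per kernel sampling location.
        self.offset_pred = nn.Conv2d(2 * channels, 2 * kernel_size ** 2,
                                     kernel_size=3, padding=1)
        self.deform = DeformConv2d(channels, channels, kernel_size,
                                   padding=kernel_size // 2)

    def forward(self, neighbor_feat, event_feat):
        offsets = self.offset_pred(torch.cat([neighbor_feat, event_feat], dim=1))
        return self.deform(neighbor_feat, offsets)
```

In a coarse-to-fine scheme this block would run at every scale, with offsets estimated at a coarser level upsampled to initialize the next finer one.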
3. What is the purpose of the SFCM-FE module?
- In low-light conditions with significant noise in both frames and events, it aims to effectively reduce noise and accurately restore the main structural information of the scene.
- It leverages spectral filtering and cross-modal feature fusion to enhance low-frequency structural details while suppressing high-frequency noise (see the sketch below).
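A sketch of the underlying idea, assuming PyTorch: a soft Gaussian low-pass mask in the Fourier domain keeps low-frequency scene structure in the noisy frame features, which are then fused with event features that preserve edges. This illustrates spectral low-pass filtering plus cross-modal fusion in general, not the exact SFCM-FE design.

```python
import torch
import torch.nn as nn

class SpectralLowPassFusion(nn.Module):
    """Spectral low-pass filtering plus cross-modal fusion (illustrative).

    A Gaussian low-pass mask in the Fourier domain keeps the low-frequency
    scene structure of the noisy frame features; the filtered features are
    then fused with event features, which preserve edge information."""

    def __init__(self, channels: int, cutoff: float = 0.25):
        super().__init__()
        self.cutoff = cutoff  # in cycles/pixel; Nyquist is 0.5
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def low_pass(self, x):
        spec = torch.fft.rfft2(x, norm="ortho")
        fy = torch.fft.fftfreq(x.shape[-2], device=x.device)   # vertical frequencies
        fx = torch.fft.rfftfreq(x.shape[-1], device=x.device)  # horizontal frequencies
        radius = torch.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
        mask = torch.exp(-(radius / self.cutoff) ** 2)          # soft Gaussian low-pass
        return torch.fft.irfft2(spec * mask, s=x.shape[-2:], norm="ortho")

    def forward(self, frame_feat, event_feat):
        structure = self.low_pass(frame_feat)                   # denoised structure
        return self.fuse(torch.cat([structure, event_feat], dim=1))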
[04] Experiments
1. How did the authors evaluate the proposed method?
- They conducted experiments on the RELED dataset, which they constructed, as there was no existing dataset available for the joint task of low-light enhancement and motion deblurring.
- They compared the performance of their method against various state-of-the-art frame-based and event-guided low-light enhancement, motion deblurring, and joint methods.
2. What were the key findings from the experimental results?
- The proposed method significantly outperformed both frame-based and event-guided methods, achieving substantial gains in PSNR and SSIM (computed as in the sketch after this list).
- The authors' lightweight model (ours-s) also outperformed other networks while using a relatively small number of parameters.
- The qualitative results demonstrated the superior performance of the proposed method, even in challenging scenarios with severe motion blur and low-illumination conditions.
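For reference, the two reported metrics can be computed per frame with scikit-image; details such as color space and border cropping are assumptions here, not the paper's exact evaluation protocol.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(restored: np.ndarray, gt: np.ndarray):
    """Per-frame PSNR/SSIM for uint8 RGB arrays; channel_axis=2 averages
    SSIM over color channels. Evaluation settings are assumptions, not
    the paper's exact protocol."""
    psnr = peak_signal_noise_ratio(gt, restored, data_range=255)
    ssim = structural_similarity(gt, restored, data_range=255, channel_axis=2)
    return psnr, ssim
```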
3. What were the contributions of the individual modules in the proposed framework?
- The ablation study showed that the ED-TFA module and the SFCM-FE module both contributed significantly to the overall performance improvement.
- The SFCM-FE module, with its spectral filtering and cross-modal feature enhancement capabilities, was particularly effective in restoring structural details while reducing noise in low-light conditions.