DeLiVR: Differential Spatiotemporal Lie Bias for Efficient Video Deraining

Shuning Sun, Jialang Lu, Xiang Chen, Jichao Wang, Dianjie Lu, Guijuan Zhang, Guangwei Gao, Zhuoran Zheng

TL;DR

DeLiVR tackles rain-induced degradation in outdoor videos by enforcing geometry-aware spatiotemporal consistency. It injects spatiotemporal Lie-group differential biases directly into attention, combining a rotation-bounded SO(2) head with differential group displacement to align frames and model motion without relying on unreliable optical flow. The approach achieves state-of-the-art or competitive results on synthetic and real benchmarks, notably WeatherBench, while reducing artifacts and improving downstream task performance such as object detection and semantic segmentation. This work demonstrates that principled geometric priors integrated into attention offer robust, efficient video restoration with practical impact on real-world perception systems.

Abstract

Videos captured in the wild often suffer from rain streaks, blur, and noise. In addition, even slight changes in camera pose can amplify cross-frame mismatches and temporal artifacts. Existing methods rely on optical flow or heuristic alignment, which are computationally expensive and lack robustness. Lie groups provide a principled way to represent continuous geometric transformations, which makes them well suited to enforcing spatial and temporal consistency in video modeling. Building on this insight, we propose DeLiVR, an efficient video deraining method that injects spatiotemporal Lie-group differential biases directly into the network's attention scores. The method introduces two complementary components. First, a rotation-bounded Lie relative bias uses a compact prediction module to predict the in-plane angle of each frame; normalized coordinates are rotated by this angle and compared with base coordinates to achieve geometry-consistent alignment before feature aggregation. Second, a differential group displacement computes angular differences between adjacent frames to estimate a velocity. The bias computation combines temporal decay and attention masks to focus on inter-frame relationships while precisely matching the direction of rain streaks. Extensive experimental results demonstrate the effectiveness of our method on publicly available benchmarks. The code is publicly available at https://github.com/Shuning0312/ICLR-DeLiVR.
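
To make the two components in the abstract more concrete, the following is a minimal Python sketch, not the authors' implementation. Several details are assumptions chosen for illustration: the tanh bounding with a hypothetical `theta_max`, the negative squared distance used to compare rotated and base coordinates, and the decay and masking parameters `tau`, `window`, and `beta`. All function names are hypothetical. Only the standard library is used.

```python
import math

def bounded_angle(raw, theta_max=math.pi / 6):
    # Assumption: the rotation is bounded by squashing a raw prediction with tanh.
    return theta_max * math.tanh(raw)

def so2_exp(theta):
    # Exponential map of so(2): an angle maps to a 2x2 rotation matrix.
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def normalized_grid(h, w):
    # Token coordinates (x, y) normalized to [-1, 1], in row-major order.
    ys = [0.0] if h == 1 else [-1 + 2 * i / (h - 1) for i in range(h)]
    xs = [0.0] if w == 1 else [-1 + 2 * j / (w - 1) for j in range(w)]
    return [(x, y) for y in ys for x in xs]

def spatial_bias(theta, h, w, scale=1.0):
    # Rotate the normalized coordinates and compare them with the base grid.
    # Assumption: the comparison is a negative squared Euclidean distance.
    base = normalized_grid(h, w)
    (a, b), (c, d) = so2_exp(theta)
    rotated = [(a * x + b * y, c * x + d * y) for x, y in base]
    return [[-scale * ((rx - bx) ** 2 + (ry - by) ** 2) for bx, by in base]
            for rx, ry in rotated]

def wrap_angle(a):
    # Map an angle difference into (-pi, pi].
    return math.atan2(math.sin(a), math.cos(a))

def angular_velocity(thetas):
    # Differential displacement: angle differences between adjacent frames.
    return [wrap_angle(t1 - t0) for t0, t1 in zip(thetas, thetas[1:])]

def temporal_bias(thetas, tau=2.0, window=2, beta=1.0):
    # Assumption: the bias is a linear temporal decay plus a penalty on the
    # accumulated angular displacement; frame pairs outside the window are
    # masked with -inf so they receive zero attention after softmax.
    n = len(thetas)
    vel = angular_velocity(thetas)
    bias = [[-math.inf] * n for _ in range(n)]
    for s in range(n):
        for t in range(n):
            gap = abs(s - t)
            if gap > window:
                continue
            lo, hi = min(s, t), max(s, t)
            displacement = abs(sum(vel[lo:hi]))
            bias[s][t] = -gap / tau - beta * displacement
    return bias
```

In this sketch both biases are additive terms for attention logits: `spatial_bias` favors token pairs whose rotated and base coordinates coincide, and `temporal_bias` favors nearby frames with small relative rotation.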

Paper Structure

This paper contains 43 sections, 23 equations, 11 figures, 3 tables, 1 algorithm.

Figures (11)

  • Figure 1: Illustration of the problem and our solution. (a) Challenge: rain streaks show spatiotemporal dynamics with varying angles, making alignment unreliable. (b) Our approach: an SO(2) Head with exponential map ensures geometry-consistent alignment. (c) Failure case: optical flow is corrupted as brightness constancy breaks in rain.
  • Figure 2: Overall architecture of DeLiVR. The model restores clean video frames by estimating per-frame rotations, constructing spatial and temporal biases, and injecting them into biased self-attention for robust geometry-consistent and temporally reliable restoration.
  • Figure 3: Qualitative comparison with state-of-the-art methods on four benchmarks. From top to bottom, the rows show results on NTURain, Rain-Syn-Light, Rain-Syn-Complex, and the real-world WeatherBench dataset. Compared to other methods, our model more effectively removes severe rain streaks and color casts, while better preserving fine background textures and natural colors.
  • Figure 4: Visual ablation of different bias components on the NTURain dataset. From left to right: baseline without Lie bias, model with spatial bias only ($B_{\text{space}}$), model with temporal bias only ($B_{\text{time}}$), and the full DeLiVR with both spatial and temporal Lie-group differential biases.
  • Figure 5: Attention comparison between the baseline and the rotation-enhanced model. The top row shows full attention matrices for both models under the same input sequence. The second row visualizes spatial attention maps extracted from representative query positions. The right column reports aggregated attention statistics, including average entropy and maximum attention values. The bottom row presents the distribution of attention weights and attention entropy across all heads and layers.
  • ...and 6 more figures
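
The captions of Figures 2 and 5 refer to biased self-attention and to attention statistics (average entropy and maximum attention value). As a rough illustration of what these terms mean, the sketch below adds a bias matrix to scaled dot-product attention logits and then computes both statistics. It is a generic single-head formulation under assumed names (`biased_attention`, `attention_entropy`, `max_attention`), not the architecture from the paper, and it assumes every query row has at least one unmasked key.

```python
import math

def softmax(row):
    # Numerically stable softmax; entries equal to -inf get zero weight.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    z = sum(exps)
    return [e / z for e in exps]

def biased_attention(q, k, v, bias):
    # Scaled dot-product attention with an additive bias on the logits.
    d = len(q[0])
    weights, out = [], []
    for i, qi in enumerate(q):
        logits = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) + bias[i][j]
                  for j, kj in enumerate(k)]
        w = softmax(logits)
        weights.append(w)
        out.append([sum(w[j] * v[j][c] for j in range(len(v)))
                    for c in range(len(v[0]))])
    return out, weights

def attention_entropy(weights):
    # Mean Shannon entropy (in nats) over the query rows.
    rows = [-sum(p * math.log(p) for p in w if p > 0) for w in weights]
    return sum(rows) / len(rows)

def max_attention(weights):
    # Largest single attention weight across all rows.
    return max(max(w) for w in weights)
```

Lower entropy and a higher maximum weight indicate more concentrated attention. With a zero bias and identical keys the weights are uniform, while a bias that masks all but one key per query drives the entropy to zero.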