LiFR-Seg: Anytime High-Frame-Rate Segmentation via Event-Guided Propagation

Xiaoshan Wu, Xiaoyang Lyu, Yifei Yu, Bo Wang, Zhongrui Wang, Xiaojuan Qi

Abstract

Dense semantic segmentation in dynamic environments is fundamentally limited by the low-frame-rate (LFR) nature of standard cameras, which creates critical perceptual gaps between frames. To solve this, we introduce Anytime Interframe Semantic Segmentation: a new task for predicting segmentation at any arbitrary time using only a single past RGB frame and a stream of asynchronous event data. This task presents a core challenge: how to robustly propagate dense semantic features using a motion field derived from sparse and often noisy event data, all while mitigating feature degradation in highly dynamic scenes. We propose LiFR-Seg, a novel framework that directly addresses these challenges by propagating deep semantic features through time. The core of our method is an uncertainty-aware warping process, guided by an event-driven motion field and its learned, explicit confidence. A temporal memory attention module further ensures coherence in dynamic scenarios. We validate our method on the DSEC dataset and a new high-frequency synthetic benchmark (SHF-DSEC) we contribute. Remarkably, our LFR system achieves performance (73.82% mIoU on DSEC) that is statistically indistinguishable from an HFR upper-bound (within 0.09%) that has full access to the target frame. This work presents a new, efficient paradigm for achieving robust, high-frame-rate perception with low-frame-rate hardware. Project Page: https://candy-crusher.github.io/LiFR_Seg_Proj/#; Code: https://github.com/Candy-Crusher/LiFR-Seg.git.

Paper Structure (32 sections, 4 equations, 6 figures, 8 tables)

Figures (6)

  • Figure 1: Bridging the Perceptual Gap in High-Speed Scenarios. The figure illustrates a critical "Blind Time Interval" for LFR systems: (a) Between $t$ and $t+\Delta t$, a pedestrian rapidly enters the vehicle's path. (b) A standard LFR system, constrained by discrete frames, detects the danger too late, only at $t+\Delta t$. (c) In contrast, our HFR Anytime System leverages continuous events to detect imminent danger at $t+\delta t$, providing crucial early warning and bridging this gap.
  • Figure 2: Overview of our LiFR-Seg framework. (a) The overall architecture. (b) The Splatting Module performs uncertainty-guided feature propagation using an event-driven motion field ($\hat{\mathbf{M}}$) and its learned confidence ($S$). (Note that $E_{t+\Delta t}$ is used strictly for training supervision to generate $Seg_{t+\Delta t}$.) (c) The Memory Attention module refines the propagated feature by integrating historical context for long-term consistency.
  • Figure 3: Perception Paradigm Comparison. Visual definition of the four experimental settings: (a) The LFR baseline, which is causal but not anytime-capable. (b) Interpolation-based methods, which are non-causal. (c) The original Event-Image Fusion paradigm. (d) Our LiFR-Seg framework, which is the only one that is both causal and anytime-capable.
  • Figure 4: Qualitative comparison of anytime interframe segmentation. The top row establishes the visual context, displaying the input RGB frame at time $t$, the event stream from $t$ to $t+\delta t$, and the target Ground Truth (GT) segmentation at time $t+\delta t$. The bottom row presents a zoomed-in comparison of the GT against the outputs of all evaluated methods.
  • Figure 5: Anytime performance on SHF-DSEC. Our method (solid blue) remains stable, while baselines degrade as the temporal gap $\delta t$ increases.
  • ...and 1 more figure