PEPR: Privileged Event-based Predictive Regularization for Domain Generalization

Gabriele Magrini, Federico Becattini, Niccolò Biondi, Pietro Pala

TL;DR

This work tackles domain generalization for visual perception by using event cameras as privileged information available only during training. PEPR reframes LUPI (learning using privileged information) as a predictive task in a shared latent space: the RGB encoder learns to predict event-based latents rather than being aligned to them directly, which would force it to mimic the sparse event representation and lose semantic detail. The resulting RGB-only model is more robust to day-to-night and other domain shifts in object detection and semantic segmentation, outperforming L2 feature alignment while maintaining strong in-domain performance. In practice, PEPR allows a single-modality RGB model to be deployed with improved resilience in shift-prone, real-world environments.

Abstract

Deep neural networks for visual perception are highly susceptible to domain shift, which poses a critical challenge for real-world deployment under conditions that differ from the training data. To address this domain generalization challenge, we propose a cross-modal framework under the learning using privileged information (LUPI) paradigm for training a robust, single-modality RGB model. We leverage event cameras as a source of privileged information, available only during training. The two modalities exhibit complementary characteristics: the RGB stream is semantically dense but domain-dependent, whereas the event stream is sparse yet more domain-invariant. Direct feature alignment between them is therefore suboptimal, as it forces the RGB encoder to mimic the sparse event representation, thereby losing semantic detail. To overcome this, we introduce Privileged Event-based Predictive Regularization (PEPR), which reframes LUPI as a predictive problem in a shared latent space. Instead of enforcing direct cross-modal alignment, we train the RGB encoder with PEPR to predict event-based latent features, distilling robustness without sacrificing semantic richness. The resulting standalone RGB model consistently improves robustness to day-to-night and other domain shifts, outperforming alignment-based baselines across object detection and semantic segmentation.
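
The contrast between direct alignment and prediction can be illustrated with a toy sketch. This is not the authors' implementation: the linear predictor `W`, the masking matrix `M`, the latent sizes, and the function names are all hypothetical, and the event latents are assumed (for illustration only) to be a lossy linear function of the RGB latents.

```python
import numpy as np

def l2_alignment_loss(z_rgb, z_evt):
    """Direct alignment baseline: pulls RGB latents onto the event latents."""
    return float(np.mean((z_rgb - z_evt) ** 2))

def predictive_loss(z_rgb, z_evt, W, b):
    """Predictive regularizer: a predictor head maps RGB latents to a
    prediction p_hat of the event latents, which act as targets."""
    p_hat = z_rgb @ W + b
    return float(np.mean((p_hat - z_evt) ** 2))

rng = np.random.default_rng(0)
z_rgb = rng.normal(size=(8, 16))      # dense RGB latents (batch 8, dim 16)

# Toy assumption: event latents keep only 4 of the 16 RGB dimensions.
mask = np.zeros(16)
mask[:4] = 1.0
M = np.diag(mask)
z_evt = z_rgb @ M                     # sparse "event" latents

align = l2_alignment_loss(z_rgb, z_evt)
pred = predictive_loss(z_rgb, z_evt, W=M, b=np.zeros(16))
print(f"alignment loss: {align:.3f}, predictive loss: {pred:.3f}")
```

In this toy setting the alignment loss can only reach zero if the RGB latents discard the 12 dimensions the event latents lack, whereas the predictive loss is already zero with the RGB latents left intact. That is the intuition the abstract gives for preferring prediction over alignment.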

Paper Structure

This paper contains 20 sections, 4 equations, 6 figures, and 9 tables.

Figures (6)

  • Figure 1: Overview of our Privileged Event-based Predictive Regularization (PEPR) framework. (a) During training, a predictive objective constrains the RGB encoder to learn representations from which the latent features of the privileged event encoder ($\mathbf{p}$) can be predicted (as $\hat{\mathbf{p}}$). (b) At inference, the event encoder and predictors are discarded, leaving a robust, single-modality RGB model for Domain Generalization.
  • Figure 2: Qualitative results on the FRED Challenging dataset. PEPR improves the detection rate in adverse unseen conditions, whereas L2 feature alignment is less effective. Ground truth boxes are shown in blue, detections in green.
  • Figure 3: Qualitative results on the Cityscapes Adverse dataset. PEPR improves segmentation robustness, helping recover critical regions such as the sky and refining overall precision.
  • Figure 4: Qualitative results on FRED Challenging. Detections are shown in green, ground truth in blue.
  • Figure 5: Qualitative results on Hard-DSEC-DET. Detections are shown in green, ground truth in blue.
  • ...and 1 more figure
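
The training/inference split described in the Figure 1 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's architecture: `ToyPEPR`, its linear "encoders", the single predictor, the task head, and all dimensions are hypothetical stand-ins.

```python
import numpy as np

class ToyPEPR:
    """Hypothetical stand-in with linear encoders, predictor, and task head."""

    def __init__(self, d_in=12, d_lat=6, n_out=3, seed=0):
        rng = np.random.default_rng(seed)
        self.rgb_enc = rng.normal(size=(d_in, d_lat))
        self.evt_enc = rng.normal(size=(d_in, d_lat))    # privileged, training only
        self.predictor = rng.normal(size=(d_lat, d_lat)) # training only
        self.task_head = rng.normal(size=(d_lat, n_out))

    def training_outputs(self, rgb, events):
        """Returns task output, predicted event latents p_hat, and targets p."""
        z = rgb @ self.rgb_enc
        p = events @ self.evt_enc
        p_hat = z @ self.predictor
        return z @ self.task_head, p_hat, p

    def strip_for_inference(self):
        """Discard the event encoder and predictor, as in Figure 1(b)."""
        self.evt_enc = None
        self.predictor = None

    def infer(self, rgb):
        """RGB-only path: no event input is needed."""
        return (rgb @ self.rgb_enc) @ self.task_head

rng = np.random.default_rng(1)
rgb = rng.normal(size=(4, 12))
events = rng.normal(size=(4, 12))

model = ToyPEPR()
train_out, p_hat, p = model.training_outputs(rgb, events)
model.strip_for_inference()
infer_out = model.infer(rgb)
```

The point of the sketch is that the task output depends only on the RGB path, so removing the privileged components after training does not change what the deployed model computes.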