
Bridging Day and Night: Target-Class Hallucination Suppression in Unpaired Image Translation

Shuwei Li, Lei Tan, Robby T. Tan

TL;DR

A framework, built on a Schrödinger Bridge-based translation model, that detects and suppresses hallucinated target-class features during unpaired image translation. During iterative refinement, detected hallucination features are explicitly pushed away from class prototypes in feature space, which preserves object semantics across the translation trajectory.

Abstract

Day-to-night unpaired image translation is important to downstream tasks but remains challenging due to large appearance shifts and the lack of direct pixel-level supervision. Existing methods often introduce semantic hallucinations, where objects from target classes such as traffic signs and vehicles, as well as man-made light effects, are incorrectly synthesized. These hallucinations significantly degrade downstream performance. We propose a novel framework that detects and suppresses hallucinations of target-class features during unpaired translation. To detect hallucinations, we design a dual-head discriminator that additionally performs semantic segmentation to identify hallucinated content in background regions. To suppress these hallucinations, we introduce class-specific prototypes, constructed by aggregating features of annotated target-domain objects, which act as semantic anchors for each class. Built upon a Schrödinger Bridge-based translation model, our framework performs iterative refinement, where detected hallucination features are explicitly pushed away from class prototypes in feature space, thus preserving object semantics across the translation trajectory. Experiments show that our method outperforms existing approaches both qualitatively and quantitatively. On the BDD100K dataset, it improves mAP by 15.5% for day-to-night domain adaptation, with a notable 31.7% gain for classes such as traffic lights that are prone to hallucinations.
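The prototype idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: mean aggregation, cosine similarity, the hinge form of the penalty, and the names `class_prototypes`, `push_away_loss`, and `margin` are all assumptions made for illustration, since the abstract only says that prototypes aggregate annotated object features and that hallucinated features are pushed away from them.

```python
import math

def class_prototypes(features, labels):
    """Average the feature vectors of annotated objects, one prototype per class.
    Mean aggregation is an assumption; the abstract only says 'aggregating'."""
    sums, counts = {}, {}
    for feat, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(feat))
        for k, value in enumerate(feat):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}

def cosine(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)

def push_away_loss(halluc_feats, halluc_classes, prototypes, margin=0.0):
    """Hypothetical hinge penalty: positive while a hallucinated feature is
    still more similar than `margin` to the prototype of the class it mimics."""
    if not halluc_feats:
        return 0.0
    total = 0.0
    for feat, cls in zip(halluc_feats, halluc_classes):
        total += max(0.0, cosine(feat, prototypes[cls]) - margin)
    return total / len(halluc_feats)

# Toy usage: two annotated "light" features and one "sign" feature.
protos = class_prototypes([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]],
                          ["light", "light", "sign"])
loss_near = push_away_loss([[1.0, 0.0]], ["light"], protos)  # aligned with prototype
loss_far = push_away_loss([[0.0, 1.0]], ["light"], protos)   # orthogonal to prototype
```

In this toy setup the penalty is large for a background feature that looks like a traffic-light prototype and vanishes once the feature is orthogonal to it, which is the qualitative behavior the abstract describes.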

Paper Structure

This paper contains 19 sections, 10 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Existing unpaired image-to-image (I2I) translation methods often introduce hallucinations that mimic or falsely suggest target classes, such as fake taillights for cars or spurious green and red signals for traffic lights (see yellow boxes).
  • Figure 2: Model overview. Given a source image $x_0$, our framework models image-to-image translation as a multi-step transport process that progressively refines the translation. At each step $j \in [0,i]$, the generator processes the current state $x_{t_j}$ and predicts an intermediate target image $x_1(x_{t_j})$. The next state $x_{t_{j+1}}$ is obtained by mixing $x_{t_j}$, $x_1(x_{t_j})$, and Gaussian noise. This process continues until $x_{t_i}$, the final intermediate state before training. Hallucinations in the prediction $x_1(x_{t_i})$ are detected by the segmentation head $D_{\text{seg}}$, while reference images provide annotated objects from which target-class prototypes of the target domain are generated. Intermediate hallucination suppression then pushes detected hallucinated features away from the prototypes.
  • Figure 3: Discriminator training. Left panel: The segmentation head $D_{\text{seg}}$ is trained on pseudo labels generated by SAM2 (ravi2024sam), using the segmentation loss $L_{\text{seg}}$. Right panel: In the generator training stage, translated images are evaluated by the discriminator. Hallucinations are segmented by comparing the segmentation prediction with the annotations. The segmented hallucinations are then used by the intermediate hallucination suppression. The style head is not shown in this figure.
  • Figure 4: Qualitative comparison of day-to-night translation results on the BDD100K dataset. Yellow boxes highlight regions containing target-class hallucinations or inconsistencies with the original annotations, which may introduce label noise. Our method (e) preserves both the realistic nighttime style and semantic consistency with the input annotations, whereas existing state-of-the-art methods (b–d) often produce hallucinated objects or fail to align with the source semantics.
  • Figure 5: Ablation study. The fewest hallucinations are observed when both components are incorporated.
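
The multi-step transport process described in the Figure 2 caption can be sketched as follows. The caption states only that the next state mixes the current state, the predicted target image, and Gaussian noise; the specific interpolation weights, the Brownian-bridge style noise scale, the uniform time grid, and the names `next_state`, `refine`, `generator`, and `tau` are assumptions made for illustration, not the paper's formulas.

```python
import math
import random

def next_state(x_t, x1_pred, t, t_next, tau=0.01, rng=random):
    """One transport step: move from x_t toward the predicted target x1_pred
    and add Gaussian noise. Weights and noise scale are assumed, not sourced."""
    s = (t_next - t) / (1.0 - t)                        # share of remaining path covered
    sigma = math.sqrt(tau * s * (1.0 - s) * (1.0 - t))  # vanishes when s reaches 1
    return [(1.0 - s) * a + s * b + sigma * rng.gauss(0.0, 1.0)
            for a, b in zip(x_t, x1_pred)]

def refine(x0, generator, num_steps=5, tau=0.01, seed=0):
    """Run the refinement loop on a flat list of pixel values.
    `generator(x_t, t)` is a hypothetical callable returning the predicted x_1."""
    rng = random.Random(seed)
    times = [j / num_steps for j in range(num_steps + 1)]  # uniform grid on [0, 1]
    x_t = list(x0)
    for t, t_next in zip(times[:-1], times[1:]):
        x1_pred = generator(x_t, t)
        x_t = next_state(x_t, x1_pred, t, t_next, tau, rng)
    return x_t

# Toy usage: a stand-in "generator" that always predicts an all-ones image.
constant_generator = lambda x, t: [1.0] * len(x)
result = refine([0.0, 0.0], constant_generator, num_steps=4)
halfway = next_state([0.0], [1.0], 0.0, 0.5, tau=0.0)
```

With this choice of weights, the final step lands exactly on the generator's prediction because both the weight on the current state and the noise scale reach zero at $t = 1$.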