FD-DB: Frequency-Decoupled Dual-Branch Network for Unpaired Synthetic-to-Real Domain Translation

Chuanhai Zang, Jiabao Hu, XW Song

TL;DR

This paper addresses the domain gap in unpaired synthetic-to-real translation for geometry-sensitive vision tasks, where purely photorealistic translation can distort content and invalidate the labels inherited from the synthetic source. It introduces FD-DB, a frequency-decoupled dual-branch generator that splits appearance transfer into a low-frequency, interpretable parameter-editing base and a high-frequency residual branch for detail augmentation, fused through a gated mechanism. A frequency-domain residual injection strategy with high-pass constraints and multi-scale low-frequency anchoring, together with a two-stage training schedule, stabilizes optimization and preserves geometric integrity. Experiments on the YCB-V dataset show that FD-DB improves real-domain appearance consistency and substantially boosts downstream semantic segmentation; with limited real-data fine-tuning, it approaches the performance of models trained on real data.

Abstract

Synthetic data provide low-cost, accurately annotated samples for geometry-sensitive vision tasks, but appearance and imaging differences between synthetic and real domains cause severe domain shift and degrade downstream performance. Unpaired synthetic-to-real translation can reduce this gap without paired supervision, yet existing methods often face a trade-off between photorealism and structural stability: unconstrained generation may introduce deformation or spurious textures, while overly rigid constraints limit adaptation to real-domain statistics. We propose FD-DB, a frequency-decoupled dual-branch model that separates appearance transfer into low-frequency interpretable editing and high-frequency residual compensation. The interpretable branch predicts physically meaningful editing parameters (white balance, exposure, contrast, saturation, blur, and grain) to build a stable low-frequency appearance base with strong content preservation. The free branch complements fine details through residual generation, and a gated fusion mechanism combines the two branches under explicit frequency constraints to limit low-frequency drift. We further adopt a two-stage training schedule that first stabilizes the editing branch and then releases the residual branch to improve optimization stability. Experiments on the YCB-V dataset show that FD-DB improves real-domain appearance consistency and significantly boosts downstream semantic segmentation performance while preserving geometric and semantic structures.
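
The split described above, a parametric low-frequency base plus a gated, high-pass-filtered residual, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`box_blur`, `apply_edits`, `fuse`), the use of a box filter as the low-pass operator, the parameter conventions (exposure in stops, contrast about mid-gray), and the scalar gate are all assumptions chosen for illustration. The blur and grain edits are omitted for brevity, and in the paper the parameters, residual, and gate are predicted by networks rather than set by hand.

```python
import numpy as np

def box_blur(img, k=5):
    """Separable box filter with edge padding (odd k); a stand-in low-pass filter."""
    pad = k // 2
    kernel = np.ones(k) / k
    out = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 0, out)
    out = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 1, out)
    return out

def apply_edits(img, wb_gain, exposure, contrast, saturation):
    """Apply global, interpretable edits to an RGB image with values in [0, 1].

    Hypothetical parameterization: per-channel gains, exposure in stops,
    contrast about mid-gray, saturation about the channel mean.
    """
    out = img * np.asarray(wb_gain)           # white balance
    out = out * 2.0 ** exposure               # exposure
    out = (out - 0.5) * contrast + 0.5        # contrast
    gray = out.mean(axis=2, keepdims=True)    # simple luminance proxy
    out = gray + saturation * (out - gray)    # saturation
    return np.clip(out, 0.0, 1.0)

def fuse(base, residual, gate, k=5):
    """Add only the high-pass part of the residual to the base, scaled by a gate.

    `gate` may be a scalar or an array that broadcasts against the image.
    """
    high = residual - box_blur(residual, k)   # remove the residual's low frequencies
    return base + gate * high

rng = np.random.default_rng(0)
synthetic = rng.random((32, 32, 3))

# Low-frequency base from hand-picked (hypothetical) editing parameters.
base = apply_edits(synthetic, wb_gain=(1.05, 1.0, 0.95),
                   exposure=0.2, contrast=1.1, saturation=0.9)

# A residual with a large constant offset plus fine-grained noise.
residual = 0.3 + 0.05 * rng.standard_normal(base.shape)
fused = fuse(base, residual, gate=0.5)

# Compare low-frequency drift with and without the high-pass constraint.
drift = np.abs(box_blur(fused) - box_blur(base)).mean()
naive = np.abs(box_blur(base + 0.5 * residual) - box_blur(base)).mean()
print(f"low-frequency drift: high-pass {drift:.4f} vs unfiltered {naive:.4f}")
```

Because the residual is high-pass filtered before injection, its constant offset never reaches the output, so the overall color and brightness remain governed by the editing parameters. A box filter is not an ideal low-pass filter, so a small amount of low-frequency leakage remains in this sketch.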

Paper Structure (16 sections, 4 equations, 13 figures, 2 tables)

Figures (13)

  • Figure 1: Model architecture.
  • Figure 2: Architecture of frequency decomposition and reconstruction.
  • Figure 3: Generator: Interpretable Editing Branch (Parameter-Editing Branch) $G_{\text{edit}}$.
  • Figure 4: Generator: Free Residual Branch $G_{\text{free}}$.
  • Figure 5: Discriminator architecture.
  • ...and 8 more figures