Universal Multi-Domain Translation via Diffusion Routers

Duc Kieu, Kien Do, Tuan Hoang, Thao Minh Le, Tung Kieu, Dang Nguyen, Thin Nguyen

TL;DR

The paper tackles universal multi-domain translation (UMDT): translating between any pair of K domains using only K-1 paired datasets, each involving a shared central domain. It introduces Diffusion Router (DR), a single diffusion model conditioned on source and target domain labels that handles central↔non-central translations and, via a variational-bound objective and Tweedie refinement, direct non-central translations. On three new UMDT benchmarks, DR achieves state-of-the-art performance for both indirect and direct translations while reducing sampling cost, and it extends to more complex domain topologies. Collectively, DR offers a scalable, versatile diffusion-based framework for universal translation across multiple domains.

Abstract

Multi-domain translation (MDT) aims to learn translations between multiple domains, yet existing approaches either require fully aligned tuples or can only handle domain pairs seen in training, limiting their practicality and excluding many cross-domain mappings. We introduce universal MDT (UMDT), a generalization of MDT that seeks to translate between any pair of $K$ domains using only $K-1$ paired datasets with a central domain. To tackle this problem, we propose Diffusion Router (DR), a unified diffusion-based framework that models all central$\leftrightarrow$non-central translations with a single noise predictor conditioned on the source and target domain labels. DR enables indirect non-central translations by routing through the central domain. We further introduce a novel scalable learning strategy with a variational-bound objective and an efficient Tweedie refinement procedure to support direct non-central mappings. Through evaluation on three large-scale UMDT benchmarks, DR achieves state-of-the-art results for both indirect and direct translations, while lowering sampling cost and unlocking novel tasks such as sketch$\leftrightarrow$segmentation. These results establish DR as a scalable and versatile framework for universal translation across multiple domains.
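The routing idea in the abstract can be illustrated with a small sketch. It only plans which domains a translation passes through when every paired dataset involves the central domain; it does not implement the diffusion model, the variational-bound objective, or Tweedie refinement. The function names and the choice of "photo" as the central domain are assumptions made for illustration and do not come from the paper.

```python
from typing import List, Sequence


def translation_route(source: str, target: str, central: str) -> List[str]:
    """Domains visited when translating source -> target, assuming every
    paired dataset links one non-central domain to `central`."""
    if source == target:
        return [source]
    if central in (source, target):
        # Central <-> non-central pairs are covered by a paired dataset.
        return [source, target]
    # Non-central -> non-central: indirect translation via the central domain.
    return [source, central, target]


def num_paired_datasets(domains: Sequence[str], central: str) -> int:
    """Number of paired datasets needed: one per non-central domain (K-1)."""
    return sum(1 for d in domains if d != central)


def num_translation_directions(domains: Sequence[str]) -> int:
    """Ordered (source, target) pairs among K domains: K * (K - 1)."""
    k = len(domains)
    return k * (k - 1)


# Hypothetical setup: "photo" as the central domain is an assumption.
domains = ["photo", "sketch", "segmentation"]
central = "photo"

route = translation_route("sketch", "segmentation", central)
print(route)
print(num_paired_datasets(domains, central), "paired datasets cover",
      num_translation_directions(domains), "translation directions")
```

In this toy setup, two paired datasets suffice for all six translation directions, with sketch$\leftrightarrow$segmentation reached indirectly through the central domain. A direct non-central mapping, as described in the abstract, would skip the intermediate hop and the sampling cost that comes with it.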

Paper Structure

This paper contains 42 sections, 16 equations, 14 figures, 10 tables.

Figures (14)

  • Figure 1: Illustration of conventional and universal multi-domain translation
  • Figure 2: Tweedie refinement with $n\in\left\{ 0,1,3,5,7\right\}$ on Faces-UMDT-Latent. Left: A conditional sample $x^{c}$ and a random target-domain sample $x^{j}$. Middle: A ground-truth noisy target-domain sample $x_{t}^{j}$ aligned with $x^{c}$ (not available during training). Right: Tweedie refinement progressively transforms $x_{t}^{j}\sim p\left(x_{t}^{j}\right)$ into $x_{t}^{j}\sim p\left(x_{t}^{j}|x^{c}\right)$ as $n$ increases.
  • Figure 3: Qualitative results on Shoes-UMDT and Faces-UMDT-Latent.
  • Figure 4: Qualitative results of our method and baselines on Faces-UMDT-Pixel.
  • Figure 5: Learning curves of finetuned dDR on Faces-UMDT-Latent for different numbers of Tweedie refinement steps $n\in\left\{ 0,1,3,5\right\}$. The task is Segment$\rightarrow$Sketch translation.
  • ...and 9 more figures