Decoupled Training: Return of Frustratingly Easy Multi-Domain Learning
Ximei Wang, Junwei Pan, Xingzhuo Guo, Dapeng Liu, Jie Jiang
TL;DR
Multi-domain learning (MDL) must contend with two problems: dataset bias across domains, and domain domination, in which data-rich head domains dominate training. The authors propose Decoupled Training (D-Train), a tri-phase general-to-specific strategy built on a shared-bottom backbone: (1) pre-train on all domains to learn a root model $(\psi_0,h_0)$, (2) post-train by splitting the head into domain-specific heads while still sharing the backbone, and (3) fine-tune the heads $\widehat h_t$ with the backbone fixed to achieve domain independence. Across Office-Home, DomainNet, FMoW, and Amazon, D-Train outperforms domain-alignment and mixture-of-experts baselines, with consistent gains on both average and worst-domain metrics, and it can be plugged into existing MDL methods. An online deployment on the Tencent DSP shows gains in cost and GMV, underscoring the practical impact. The results indicate that a decoupled, hyperparameter-free, stage-wise optimization can mitigate the seesaw effect in MDL while remaining easy to scale and deploy.
Abstract
Multi-domain learning (MDL) aims to train a model with minimal average risk across multiple overlapping but non-identical domains. To tackle the challenges of dataset bias and domain domination, numerous MDL approaches have been proposed, either seeking commonalities by aligning distributions to reduce the domain gap or preserving differences by implementing domain-specific towers, gates, and even experts. MDL models are becoming increasingly complex, with sophisticated network architectures or loss functions that introduce extra parameters and increase computation costs. In this paper, we propose a frustratingly easy and hyperparameter-free multi-domain learning method named Decoupled Training (D-Train). D-Train is a tri-phase general-to-specific training strategy that first pre-trains on all domains to warm up a root model, then post-trains on each domain by splitting into multiple heads, and finally fine-tunes the heads with the backbone fixed, enabling decoupled training to achieve domain independence. Despite its extraordinary simplicity and efficiency, D-Train performs remarkably well in extensive evaluations on various datasets, from standard benchmarks to applications in satellite imagery and recommender systems.
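The three phases described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a toy one-dimensional regression task, a scalar "backbone" weight, per-domain linear heads, a squared-error loss, and plain SGD. All names (`make_domain`, `sgd_epoch`, `mse`, `d_train`) and the values of `epochs` and `lr` are hypothetical and chosen only for illustration.

```python
import random

def make_domain(slope, bias, n, rng):
    """Toy 1-D regression domain (an assumption): y = slope * x + bias + noise."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, slope * x + bias + rng.gauss(0.0, 0.01)) for x in xs]

def sgd_epoch(data, backbone, head, lr, train_backbone=True):
    """One SGD pass on squared error for y_hat = head['a'] * (backbone['w'] * x) + head['b']."""
    for x, y in data:
        z = backbone["w"] * x                  # shared feature
        err = head["a"] * z + head["b"] - y    # prediction error
        grad_a, grad_b, grad_w = err * z, err, err * head["a"] * x
        head["a"] -= lr * grad_a
        head["b"] -= lr * grad_b
        if train_backbone:                     # skipped when the backbone is frozen
            backbone["w"] -= lr * grad_w

def mse(data, backbone, head):
    """Mean squared error of one (backbone, head) pair on one domain."""
    return sum((head["a"] * backbone["w"] * x + head["b"] - y) ** 2
               for x, y in data) / len(data)

def d_train(domains, epochs=200, lr=0.05):
    """Tri-phase sketch: pre-train, post-train, fine-tune."""
    # Phase 1 (pre-train): one backbone and one head on the pooled data of all domains.
    backbone, root_head = {"w": 1.0}, {"a": 0.5, "b": 0.0}
    pooled = [example for data in domains.values() for example in data]
    for _ in range(epochs):
        sgd_epoch(pooled, backbone, root_head, lr)
    root = (dict(backbone), dict(root_head))   # snapshot of the root model

    # Phase 2 (post-train): split into one head per domain, each initialised from
    # the root head; the backbone is still shared and still updated.
    heads = {name: dict(root_head) for name in domains}
    for _ in range(epochs):
        for name, data in domains.items():
            sgd_epoch(data, backbone, heads[name], lr)

    # Phase 3 (fine-tune): freeze the backbone; each head sees only its own domain.
    frozen_w = backbone["w"]
    for _ in range(epochs):
        for name, data in domains.items():
            sgd_epoch(data, backbone, heads[name], lr, train_backbone=False)
    assert backbone["w"] == frozen_w           # the backbone did not move in phase 3
    return root, backbone, heads

# Usage: an imbalanced pair of domains (many "head" samples, few "tail" samples).
rng = random.Random(0)
domains = {"head": make_domain(2.0, 0.5, 200, rng),
           "tail": make_domain(-1.0, -0.5, 20, rng)}
root, backbone, heads = d_train(domains)
for name, data in domains.items():
    print(name, "root:", mse(data, *root), "domain head:", mse(data, backbone, heads[name]))
```

In this toy setting the pooled root model is pulled toward the larger domain, so comparing the two printed errors for the smaller domain gives a rough picture of what the domain-specific heads add. Because phase 3 updates only the heads, each head's optimization there is independent of the other domains, which is the sense of "decoupled" used in the abstract.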
