Annealed Multiple Choice Learning: Overcoming limitations of Winner-takes-all with annealing
David Perera, Victor Letzelter, Théo Mariotte, Adrien Cortés, Mickael Chen, Slim Essid, Gaël Richard
TL;DR
Annealed Multiple Choice Learning (aMCL) addresses ambiguity in conditional prediction by injecting deterministic annealing into MCL. It replaces hard Winner-takes-all with a Boltzmann softmin $q_{T}$ controlled by a temperature schedule $T(t)$ and optimizes a soft distortion $D(q,f)$, guiding each hypothesis toward soft barycenters and allowing phase transitions as $T$ decreases. Theoretical analysis links the dynamics to entropy-constrained optimization and rate-distortion theory, predicting merges at high temperature and splits (phase transitions) into subgroups along the rate-distortion curve $R_x(D^\bullet)$. Empirically, aMCL achieves competitive distortion and SI-SDR on UCI and speech separation benchmarks, with improved robustness to initialization and favorable computational complexity relative to PIT, suggesting strong practical utility for conditional quantization tasks.
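To make the softmin assignment concrete, the following minimal NumPy sketch computes $q_{T}$ and the resulting soft distortion for a single target. The squared-error loss, the function names, and the exponential schedule with its `t0` and `decay` parameters are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def softmin_weights(losses, temperature):
    """Boltzmann softmin over the last axis: q_k proportional to exp(-loss_k / T)."""
    logits = -np.asarray(losses, dtype=float) / temperature
    # Subtract the maximum so the exponentials cannot overflow.
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=-1, keepdims=True)


def soft_distortion(predictions, target, temperature):
    """Soft distortion for one sample: sum_k q_k * ||f_k - y||^2.

    predictions: array of shape (K, d), one row per hypothesis.
    target: array of shape (d,).
    Assumes a squared-error loss (an illustrative choice).
    """
    losses = np.sum((predictions - target) ** 2, axis=-1)  # shape (K,)
    q = softmin_weights(losses, temperature)
    return float(np.sum(q * losses)), q


def exponential_schedule(step, t0=1.0, decay=0.9):
    """Hypothetical temperature schedule T(t) = t0 * decay**t."""
    return t0 * decay ** step


# Three hypotheses and one target: hypothesis 1 is the closest.
preds = np.array([[0.0], [1.0], [3.0]])
y = np.array([0.9])
for T in (1e6, 1.0, 1e-3):
    d, q = soft_distortion(preds, y, T)
    print(f"T={T:g}  q={np.round(q, 3)}  D={d:.4f}")
```

At very high temperature the weights are close to uniform, so every hypothesis is pulled toward the same barycenter. As the temperature approaches zero the weights concentrate on the closest hypothesis and the assignment reduces to Winner-takes-all. In a gradient-based implementation the weights `q` would presumably be treated as constants during backpropagation; that detail is an assumption here.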
Abstract
We introduce Annealed Multiple Choice Learning (aMCL), which combines simulated annealing with MCL. MCL is a learning framework handling ambiguous tasks by predicting a small set of plausible hypotheses. These hypotheses are trained using the Winner-takes-all (WTA) scheme, which promotes the diversity of the predictions. However, this scheme may converge toward an arbitrarily suboptimal local minimum, due to the greedy nature of WTA. We overcome this limitation using annealing, which enhances the exploration of the hypothesis space during training. We leverage insights from statistical physics and information theory to provide a detailed description of the model training trajectory. Additionally, we validate our algorithm by extensive experiments on synthetic datasets, on the standard UCI benchmark, and on speech separation.
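The greedy behavior of WTA can be seen on a toy problem. The sketch below is an assumption-laden illustration rather than the training procedure of the paper: it uses unconditional 1-D data, a squared-error loss, and a fixed-point (Lloyd-style) update in place of gradient descent. The names `wta_step` and `wta_distortion` are hypothetical.

```python
import numpy as np


def wta_step(hypotheses, targets):
    """One hard Winner-takes-all update on 1-D data.

    Each target is assigned to its closest hypothesis (the winner), and each
    hypothesis moves to the mean of the targets it won. A hypothesis that
    wins no target receives no update and stays where it is.
    """
    hypotheses = np.asarray(hypotheses, dtype=float)
    targets = np.asarray(targets, dtype=float)
    losses = (targets[:, None] - hypotheses[None, :]) ** 2  # shape (N, K)
    winners = losses.argmin(axis=1)
    updated = hypotheses.copy()
    for k in range(len(hypotheses)):
        won = winners == k
        if won.any():
            updated[k] = targets[won].mean()
    return updated, winners


def wta_distortion(hypotheses, targets):
    """Mean squared error between each target and its closest hypothesis."""
    hypotheses = np.asarray(hypotheses, dtype=float)
    targets = np.asarray(targets, dtype=float)
    losses = (targets[:, None] - hypotheses[None, :]) ** 2
    return float(losses.min(axis=1).mean())


# Two groups of targets (around 0 and around 10) and a poor initialization:
# the second hypothesis starts far from all the data.
targets = np.array([-1.0, 1.0, 9.0, 11.0])
hyps = np.array([0.0, 100.0])
for _ in range(5):
    hyps, winners = wta_step(hyps, targets)

print(hyps)                                  # [  5. 100.]
print(wta_distortion(hyps, targets))         # 26.0
print(wta_distortion([0.0, 10.0], targets))  # 1.0
```

The hypothesis initialized at 100 never wins a target, so it is never updated, and the remaining hypothesis settles on the global mean with distortion 26 instead of the value 1 reached by placing one hypothesis on each group. With a softmin assignment at positive temperature, every hypothesis receives a nonzero weight on every target, which is the kind of exploration the annealing is meant to provide.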
