
Sublinear Variational Optimization of Gaussian Mixture Models with Millions to Billions of Parameters

Abstract

Gaussian Mixture Models (GMMs) are among the most frequently used models in machine learning. However, training large, general GMMs becomes computationally prohibitive for datasets with many data points N of high dimensionality D. For GMMs with arbitrary covariances, we here derive a highly efficient variational approximation, which is then integrated with mixtures of factor analyzers (MFAs). For GMMs with C components, our proposed algorithm substantially reduces the runtime complexity per iteration from O(NCD^2) to a complexity that scales linearly with D and sublinearly with C. In numerical experiments, we first validate that this complexity reduction results in a sublinear scaling for the entire GMM optimization process. Second, we show on large-scale benchmarks that the sublinear algorithm yields speed-ups of an order of magnitude compared to the state of the art. Third, as a proof of concept, we train GMMs with over 10 billion parameters on about 100 million images, observing training times of less than nine hours on a single state-of-the-art CPU. Fourth and finally, we demonstrate the effectiveness of large-scale GMMs on the task of zero-shot image denoising, where sublinear training results in state-of-the-art denoising times while competitive denoising performance is maintained.
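
As a rough sketch of where the savings could come from (the symbols N, C, D, the factor-analyzer rank H, and the per-datapoint component budget C' are notational assumptions for illustration, not taken verbatim from the paper): a standard EM iteration for full-covariance GMMs evaluates all C components for all N data points at O(D^2) cost each, whereas MFA covariances of the form Sigma_c = Lambda_c Lambda_c^T + Psi_c (with Lambda_c of size D x H, H << D) combined with variational posteriors restricted to C' << C components per data point would give

    O(N C D^2)     per iteration, standard EM with full covariances
    O(N C' D H)    per iteration, with low-rank (MFA) covariances and truncated variational posteriors

which is linear in D and, as long as C' grows more slowly than C, sublinear in C.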