Overspecified Mixture Discriminant Analysis: Exponential Convergence, Statistical Guarantees, and Remote Sensing Applications
Arman Bolatov, Alan Legg, Igor Melnykov, Amantay Nurlanuly, Maxat Tezekbayev, Zhenisbek Assylbekov
TL;DR
The paper analyzes overspecified MDA, in which an unbalanced two-component Gaussian mixture is fitted per class to data generated from a single Gaussian. It proves that, in the population limit, the classification error of the EM iterates converges exponentially fast to the Bayes risk, and that in finite samples the misclassification error approaches the Bayes risk at the optimal rate of $O(\sqrt{d/n})$ after $O(\log(n/d))$ EM iterations. The analysis hinges on KL divergence contraction and a radial Polyak–Łojasiewicz inequality on a hypersurface where the variances are determined by the current location parameters, with extensions to learned variances and unbalanced weights. Experiments on remote sensing datasets (Salinas-A and EuroSAT) show practical benefits of overspecified MDA, with improved classification boundaries and better separation of multimodal classes. Together, the theory and experiments give a principled justification for using overspecified mixtures in complex data contexts.
Abstract
This study explores the classification error of Mixture Discriminant Analysis (MDA) in scenarios where the number of mixture components exceeds the number present in the actual data distribution, a condition known as overspecification. We fit a two-component Gaussian mixture model within each class to data generated from a single Gaussian, and analyze both the algorithmic convergence of the Expectation-Maximization (EM) algorithm and the statistical classification error. We demonstrate that, with suitable initialization, the classification error of the EM iterates converges exponentially fast to the Bayes risk at the population level. Further, we extend our results to finite samples, showing that the classification error converges to the Bayes risk at a rate of $n^{-1/2}$ under mild conditions on the initial parameter estimates and sample size. This work provides a rigorous theoretical framework for understanding the performance of overspecified MDA, which is often used empirically in complex data settings such as image and text classification. To validate our theory, we conduct experiments on remote sensing datasets.
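The setting described above can be illustrated with a minimal numerical sketch. It is not the authors' implementation: the fixed mixing weights (0.7, 0.3), the shared spherical variance, the equal class priors, the class means, the sample sizes, and the function names `sq_dists`, `fit_em`, and `log_density` are all assumptions chosen for illustration. Each class is drawn from a single Gaussian, a two-component mixture is fitted to it by EM, and the resulting plug-in classifier is compared with the Bayes risk of the true model.

```python
import math
import numpy as np

# Assumption: fixed, unequal mixing weights (hypothetical values).
WEIGHTS = np.array([0.7, 0.3])

def sq_dists(X, mu):
    """Squared Euclidean distance from every point to every component mean."""
    return ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)

def fit_em(X, n_iter=50, seed=0):
    """EM for a 2-component spherical mixture with fixed weights and shared variance."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X.mean(axis=0) + rng.normal(size=(2, d))  # perturbed initial means
    var = X.var()
    for _ in range(n_iter):
        # E-step: responsibilities (the shared normalizing constant cancels).
        log_r = np.log(WEIGHTS) - 0.5 * sq_dists(X, mu) / var
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted means, then the shared spherical variance.
        mu = (r.T @ X) / r.sum(axis=0)[:, None]
        var = (r * sq_dists(X, mu)).sum() / (n * d)
    return mu, var

def log_density(X, mu, var):
    """Log-density of the fitted mixture, computed with log-sum-exp."""
    d = X.shape[1]
    log_comp = (np.log(WEIGHTS) - 0.5 * sq_dists(X, mu) / var
                - 0.5 * d * np.log(2 * np.pi * var))
    m = log_comp.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True))).ravel()

# Two classes, each generated from a single Gaussian N(+a, I) or N(-a, I).
rng = np.random.default_rng(1)
a = np.array([1.0, 0.0])
n_train, n_test = 2000, 20000
X_pos = rng.normal(size=(n_train, 2)) + a
X_neg = rng.normal(size=(n_train, 2)) - a

# Overspecified fit: two components per class, although one would suffice.
mu_pos, var_pos = fit_em(X_pos, seed=0)
mu_neg, var_neg = fit_em(X_neg, seed=1)

# Plug-in classifier with equal priors: pick the class with the larger density.
X_test = np.vstack([rng.normal(size=(n_test, 2)) + a,
                    rng.normal(size=(n_test, 2)) - a])
y_test = np.concatenate([np.ones(n_test), -np.ones(n_test)])
pred = np.where(log_density(X_test, mu_pos, var_pos)
                >= log_density(X_test, mu_neg, var_neg), 1.0, -1.0)
err = float((pred != y_test).mean())

# Bayes risk for N(+a, I) vs N(-a, I) with equal priors is Phi(-||a||).
bayes = 0.5 * math.erfc(np.linalg.norm(a) / math.sqrt(2))
print(f"test error {err:.4f}, Bayes risk {bayes:.4f}")
```

Varying `n_train` or `n_iter` in this sketch gives a rough, informal feel for how the gap between the test error and the Bayes risk behaves; it does not reproduce the paper's analysis or its experimental setup.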
