Outlier-Robust Multi-Group Gaussian Mixture Modeling with Flexible Group Reassignment
Patricia Puchhammer, Ines Wilms, Peter Filzmoser
Abstract
Do expert-defined or diagnostically-labeled data groups align with clusters inferred through statistical modeling? If not, where do discrepancies between predefined labels and model-based groupings occur and why? In this work, we introduce the multi-group Gaussian mixture model (MG-GMM), the first model developed to investigate these questions. It incorporates prior group information while allowing flexibility to reassign observations to alternative groups based on data-driven evidence. We achieve this by modeling the observations of each group as arising not from a single distribution, but from a Gaussian mixture comprising all group-specific distributions. Moreover, our model offers robustness against cellwise outliers that may obscure or distort the underlying group structure. We propose a novel penalized likelihood approach, called cellMG-GMM, to jointly estimate mixture probabilities, location and scale parameters of the MG-GMM, and detect outliers through a penalty term on the number of flagged cellwise outliers in the objective function. We show that our estimator has good breakdown properties in presence of cellwise outliers. We develop a computationally-efficient EM-based algorithm for cellMG-GMM, and demonstrate its strong performance in identifying and diagnosing observations at the intersection of multiple groups through simulations and diverse applications in medicine and oenology.
