The generalized underlap coefficient with an application in clustering
Zhaoxi Zhang, Vanda Inacio, Sara Wade
TL;DR
We generalize the underlap coefficient (UNL), a multi-group separation measure, to multivariate variables, establish its key properties, and give an explicit connection to the total variation distance.
Abstract
Quantifying distributional separation across groups is fundamental in statistical learning and scientific discovery, yet most classical discrepancy measures are tailored to two-group comparisons. We generalize the underlap coefficient (UNL), a multi-group separation measure, to multivariate variables. We establish key properties of the UNL and provide an explicit connection to the total variation distance. We further interpret the UNL as a measure of dependence between a group label and the variables of interest, and compare it with mutual information. We propose an importance sampling estimator of the UNL that can be combined with flexible density estimators. We discuss in detail the utility of the UNL for assessing partition-covariate dependence in clustering, where it is particularly useful for evaluating the single-weights assumption in covariate-dependent mixture models. Finally, we illustrate the application of the UNL in clustering using two real-world datasets.
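
As a rough illustration of the importance sampling idea mentioned above, the following Python sketch estimates a UNL-type quantity for univariate Gaussian groups. It rests on assumptions that are not stated in this abstract: that the UNL is the integral of the pointwise maximum of the K group densities, and that the proposal is the equal-weight mixture of those densities. The function names, the Gaussian setup, and the choice of proposal are hypothetical and are not claimed to match the authors' estimator.

```python
import numpy as np


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution, evaluated elementwise."""
    z = (x - mean) / sd
    return np.exp(-0.5 * z**2) / (sd * np.sqrt(2.0 * np.pi))


def unl_importance_sampling(means, sds, n_draws=100_000, seed=0):
    """Importance sampling estimate of the integral of max_k f_k(x).

    ASSUMPTION: UNL = integral of the pointwise maximum of the group densities.
    ASSUMPTION: the proposal q is the equal-weight mixture of the K densities,
    so each importance weight max_k f_k(x) / q(x) is bounded by K.
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)
    n_groups = means.size

    # Draw from the mixture proposal: pick a component, then sample from it.
    component = rng.integers(0, n_groups, size=n_draws)
    x = rng.normal(means[component], sds[component])

    # Evaluate every group density at every draw: shape (n_draws, n_groups).
    dens = normal_pdf(x[:, None], means[None, :], sds[None, :])

    # Importance weights: target integrand divided by the proposal density.
    weights = dens.max(axis=1) / dens.mean(axis=1)
    return weights.mean()


if __name__ == "__main__":
    # Two unit-variance groups whose means differ by two standard deviations.
    print(unl_importance_sampling([0.0, 2.0], [1.0, 1.0]))
```

Under the assumed definition, identical groups give a value of 1 and K groups with essentially disjoint supports give a value close to K. In practice the known Gaussian densities would be replaced by fitted density estimates, which is where a flexible density estimator would enter.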
