Scalable Label Distribution Learning for Multi-Label Classification
Xingyu Zhao, Yuexuan An, Lei Qi, Xin Geng
TL;DR
SLDL addresses scalable multi-label classification by embedding labels as low-dimensional Gaussian distributions in a latent space, capturing asymmetric label correlations through a probability transfer matrix. It learns a feature-to-embedding mapping with $\mathcal{L}(\boldsymbol{W}) = \| \boldsymbol{Z} - \boldsymbol{X}\boldsymbol{W} \|_F^2 + \alpha \|\boldsymbol{W}\|_F^2$ optimized via L-BFGS, and decodes predictions with a cosine-based nearest-neighbor mechanism that weights neighboring embeddings. The approach provides a theoretical bound linking embedding and regression errors to the final cost and demonstrates strong empirical performance across 15 large-scale MLC benchmarks, achieving both high accuracy and substantial speedups. By decoupling complexity from the number of labels and exploiting asymmetric label relations, SLDL offers a scalable and effective framework for real-world large-output-space MLC tasks.
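The regression and decoding steps summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the target embeddings are plain vectors rather than Gaussian distributions, the ridge objective is solved in closed form via the normal equations instead of with L-BFGS (both reach the same minimizer for this convex loss), and the decoding weights (cosine similarities clipped at zero) are an assumption, since the exact weighting is not specified here. The function names `fit_ridge`, `ridge_loss`, and `knn_decode` are hypothetical.

```python
import numpy as np

def fit_ridge(X, Z, alpha):
    """Minimize ||Z - XW||_F^2 + alpha * ||W||_F^2 in closed form."""
    d = X.shape[1]
    # Setting the gradient 2 X^T (XW - Z) + 2 alpha W to zero
    # yields the normal equations (X^T X + alpha I) W = X^T Z.
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Z)

def ridge_loss(X, Z, W, alpha):
    """Value of the regularized least-squares objective."""
    return np.sum((Z - X @ W) ** 2) + alpha * np.sum(W ** 2)

def knn_decode(z_query, Z_train, Y_train, k=3):
    """Score labels by cosine-weighted voting over the k nearest embeddings.

    The weighting scheme (clipped cosine similarity) is an assumption.
    """
    eps = 1e-12
    q = z_query / (np.linalg.norm(z_query) + eps)
    T = Z_train / (np.linalg.norm(Z_train, axis=1, keepdims=True) + eps)
    sims = T @ q                              # cosine similarity to each training embedding
    top = np.argsort(-sims)[:k]               # indices of the k most similar embeddings
    weights = np.clip(sims[top], 0.0, None)   # ignore negatively correlated neighbors
    # Weighted average of the neighbors' binary label vectors.
    return weights @ Y_train[top] / (weights.sum() + eps)

# Toy data: 50 instances, 8 features, 4 latent dimensions, 6 labels.
rng = np.random.default_rng(0)
alpha = 1e-3
X = rng.normal(size=(50, 8))
W_true = rng.normal(size=(8, 4))
Z = X @ W_true                                # stand-in for the learned label embeddings
Y = (rng.random(size=(50, 6)) < 0.3).astype(float)

W = fit_ridge(X, Z, alpha)
scores = knn_decode(X[0] @ W, Z, Y, k=3)      # per-label relevance scores in [0, 1]
```

Note that neither the size of `W` nor the cost of the neighbor search depends on the number of labels; only the final weighted vote over the `k` neighbors touches the label matrix.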
Abstract
Multi-label classification (MLC) refers to the problem of tagging a given instance with a set of relevant labels. Most existing MLC methods assume that the correlation between the two labels in each label pair is symmetric, an assumption that is violated in many real-world scenarios. Moreover, most existing methods design learning processes whose cost depends on the number of labels, which makes computational complexity a bottleneck when scaling up to large output spaces. To tackle these issues, we propose a novel method named Scalable Label Distribution Learning (SLDL) for multi-label classification, which describes different labels as distributions in a latent space where the label correlation is asymmetric and the dimension is independent of the number of labels. Specifically, SLDL first converts labels into continuous distributions within a low-dimensional latent space and leverages an asymmetric metric to establish the correlation between different labels. Then, it learns the mapping from the feature space to the latent space, so that the computational complexity no longer depends on the number of labels. Finally, SLDL leverages a nearest-neighbor-based strategy to decode the latent representations and obtain the final predictions. Extensive experiments show that SLDL achieves very competitive classification performance at low computational cost.
