Concentration Distribution Learning from Label Distributions
Jiawei Tang, Yuheng Jia
TL;DR
This work addresses a limitation of label distribution learning (LDL), which omits absolute label intensities, by introducing a background concentration $\mu$ to form concentration distributions $\boldsymbol{c}_d=[\boldsymbol{b},\mu]$. It proposes CDL-LD, a probabilistic-neural model in which a dataset-driven confidence $\boldsymbol{e}=f(\boldsymbol{x}|\Theta)$ yields Dirichlet parameters $\boldsymbol{\alpha}=\boldsymbol{e}+\mathbf{1}_c$, from which the apparent distribution $\boldsymbol{p}$ is drawn. The concentration components are derived as $b_i=\frac{e_i}{\sum_j e_j+c}$ and $\mu=\frac{c}{\sum_j e_j+c}$, where $c$ is the number of labels. The model is trained with an adjusted MSE loss $\mathcal{L}_{AMSE}$ that accounts for both prediction error and Dirichlet variance, and a generalization bound based on Rademacher complexity supports its learnability. Extensive experiments on 12 real-world LDL datasets, together with SJA_c, the first CDL dataset, constructed from SJAFFE, show that CDL-LD outperforms state-of-the-art LDL methods across multiple metrics and can reliably recover background concentrations, indicating practical utility for richer instance descriptions and downstream tasks.
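As an illustration of the two formulas for $b_i$ and $\mu$, the following minimal sketch computes a concentration distribution from a given confidence vector. It is not the authors' implementation: the function names `concentration_from_evidence` and `dirichlet_mean` and the example values are hypothetical, the network $f(\boldsymbol{x}|\Theta)$ is replaced by a hand-written vector, and the confidence is assumed to be non-negative.

```python
from typing import List, Tuple


def concentration_from_evidence(e: List[float]) -> Tuple[List[float], float]:
    """Map a confidence vector e (length c) to (b, mu).

    Uses b_i = e_i / (sum_j e_j + c) and mu = c / (sum_j e_j + c).
    Assumption: every entry of e is non-negative.
    """
    if not e or any(v < 0 for v in e):
        raise ValueError("e must be a non-empty list of non-negative values")
    c = len(e)                      # number of labels
    strength = sum(e) + c           # equals sum_j alpha_j, with alpha = e + 1
    b = [v / strength for v in e]   # per-label concentration
    mu = c / strength               # background concentration
    return b, mu


def dirichlet_mean(e: List[float]) -> List[float]:
    """Mean of Dirichlet(alpha) with alpha = e + 1, i.e. alpha_i / sum_j alpha_j."""
    strength = sum(e) + len(e)
    return [(v + 1.0) / strength for v in e]


# Hypothetical confidence for a 3-label problem (not taken from the paper).
e = [6.0, 3.0, 1.0]
b, mu = concentration_from_evidence(e)
p_mean = dirichlet_mean(e)

# With no confidence at all, everything falls into the background term.
b_zero, mu_zero = concentration_from_evidence([0.0, 0.0, 0.0])
```

By construction, the entries of $\boldsymbol{b}$ and $\mu$ sum to 1, so a larger total confidence shrinks the background concentration, while zero confidence gives $\mu=1$. Dividing $\boldsymbol{b}$ by its own sum gives the relative shares $e_i/\sum_j e_j$.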
Abstract
Label distribution learning (LDL) is an effective method for predicting the relative label description degree (a.k.a. label distribution) of a sample. However, the label distribution is not a complete representation of an instance because it overlooks the absolute intensity of each label. Specifically, it is impossible to obtain the total description degree of hidden labels that are not in the label space, which leads to information loss and ambiguity between instances. To solve this problem, we introduce a new concept, background concentration, which serves as the absolute description degree term of the label distribution, and incorporate it into the LDL process, forming the improved paradigm of concentration distribution learning. Moreover, we propose a novel model that combines probabilistic methods and neural networks to learn label distributions and background concentrations from existing LDL datasets. Extensive experiments show that the proposed approach can extract background concentrations from label distributions while producing more accurate predictions than state-of-the-art LDL methods. The code is available at https://github.com/seutjw/CDL-LD.
