
Learning Triangular Distribution in Visual World

Ping Chen, Xingpeng Zhang, Chengtao Zhou, Dichao Fan, Peng Tu, Le Zhang, Yanlin Qian

TL;DR

The paper addresses the mismatch between nonlinear visual features and quasi-continuous labels in label distribution learning by introducing a parameter-free Triangular Distribution Transform (TDT) that creates an injective, linear mapping between feature differences and label differences. TDT uses a symmetric triangular distribution to approximate Gaussian feature differences and learns via a combination of symmetry, commutativity, and supervisory losses so that a linear head can predict labels from transformed features. A prior-sample contrastive-like mechanism guides learning and makes TDT a practical plug-in for standard backbones. Empirical results on facial age estimation, image aesthetics, and illumination estimation show competitive or superior performance compared with state-of-the-art methods, validating TDT’s effectiveness and simplicity for visual regression tasks. The approach offers a lightweight, interpretable pathway to linearize complex feature-label mappings in real-world vision problems.
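The TL;DR says TDT approximates Gaussian feature differences with a symmetric triangular distribution. The Python sketch below shows what such an approximation looks like. It is not the paper's TDT. It matches the variance of a symmetric triangular distribution on [c - w, c + w] to a Gaussian with standard deviation sigma, then checks mass, mean, and variance numerically.

- Variance matching is our assumption. A symmetric triangular distribution of half-width w has variance w^2 / 6, so we set w = sigma * sqrt(6).
- The helper names `triangular_pdf`, `gaussian_pdf`, `matched_half_width`, and `moments` are hypothetical.

```python
import math
from typing import Callable, Tuple


def triangular_pdf(x: float, center: float, half_width: float) -> float:
    """Density of a symmetric triangular distribution on [center - w, center + w]."""
    d = abs(x - center)
    if d >= half_width:
        return 0.0
    # Linear ramp peaking at 1 / w at the center, so the total area is 1.
    return (half_width - d) / half_width ** 2


def gaussian_pdf(x: float, mean: float, sigma: float) -> float:
    """Density of a normal distribution N(mean, sigma^2)."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def matched_half_width(sigma: float) -> float:
    """Half-width w whose triangular variance w^2 / 6 equals sigma^2 (assumed matching rule)."""
    return sigma * math.sqrt(6.0)


def moments(pdf: Callable[[float], float], lo: float, hi: float,
            n: int = 20000) -> Tuple[float, float, float]:
    """Total mass, mean and variance of pdf on [lo, hi] via the midpoint rule."""
    step = (hi - lo) / n
    xs = [lo + (i + 0.5) * step for i in range(n)]
    weights = [pdf(x) * step for x in xs]
    mass = sum(weights)
    mu = sum(x * wt for x, wt in zip(xs, weights)) / mass
    var = sum((x - mu) ** 2 * wt for x, wt in zip(xs, weights)) / mass
    return mass, mu, var


# Illustrative check with sigma = 2 (an arbitrary choice).
SIGMA = 2.0
HALF_WIDTH = matched_half_width(SIGMA)
MASS, MU, VAR = moments(lambda x: triangular_pdf(x, 0.0, HALF_WIDTH), -HALF_WIDTH, HALF_WIDTH)
TRI_PEAK = triangular_pdf(0.0, 0.0, HALF_WIDTH)
GAUSS_PEAK = gaussian_pdf(0.0, 0.0, SIGMA)
print(f"w={HALF_WIDTH:.4f} mass={MASS:.6f} mean={MU:.6f} var={VAR:.6f}")
print(f"peak density: triangular={TRI_PEAK:.4f} gaussian={GAUSS_PEAK:.4f}")
```

With variance matching, the two densities share their center and spread, and their peak heights differ only slightly. Unlike the Gaussian, the triangular density has bounded support and a piecewise-linear shape.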

Abstract

Convolutional neural networks are successful in many vision tasks, including label distribution learning. That task usually takes the form of learning an injection from non-linear visual features to well-defined labels. However, how the discrepancy between features maps to the discrepancy between labels is ambiguous, and the correctness of that mapping is not guaranteed. To address these problems, we study the mathematical connection between a feature and its label and present a general, simple framework for label distribution learning. We propose a so-called Triangular Distribution Transform (TDT) that builds an injective function between feature and label. TDT guarantees that any symmetric feature discrepancy linearly reflects the difference between labels. The proposed TDT can be used as a plug-in in mainstream backbone networks to address different label distribution learning tasks. Experiments on facial age recognition, illumination chromaticity estimation, and aesthetics assessment show that TDT achieves results on par with or better than prior art.
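The abstract's central claim is that, after TDT, feature discrepancies map linearly onto label discrepancies, so a linear head suffices. The Python sketch below illustrates only that downstream property; TDT itself is not modeled.

- It assumes hypothetical features that are already linear in the label: f(y) = y * u + b.
- It fits a linear head w on pairwise feature differences so that w . (f_i - f_j) ≈ y_i - y_j.
- The fit uses full-batch gradient descent on the mean squared error.
- `fit_difference_head`, `toy_feature`, `U`, and `B` are illustrative names, not from the paper.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sub(a: Sequence[float], b: Sequence[float]) -> List[float]:
    return [x - y for x, y in zip(a, b)]


def fit_difference_head(features: Sequence[Sequence[float]], labels: Sequence[float],
                        steps: int = 200) -> List[float]:
    """Fit w with w . (f_i - f_j) ~= y_i - y_j over all ordered pairs (MSE, gradient descent)."""
    pairs = [(sub(features[i], features[j]), labels[i] - labels[j])
             for i in range(len(labels)) for j in range(len(labels)) if i != j]
    dim = len(features[0])
    mean_sq = sum(dot(d, d) for d, _ in pairs) / len(pairs)
    # The MSE Hessian is 2 * mean(d d^T); its largest eigenvalue is at most its
    # trace, 2 * mean_sq, so this step size keeps gradient descent stable.
    lr = 0.5 / (2.0 * mean_sq)
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for d, dy in pairs:
            err = dot(w, d) - dy
            for k in range(dim):
                grad[k] += 2.0 * err * d[k] / len(pairs)
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


# Hypothetical "linearized" features f(y) = y * U + B, standing in for the kind of
# representation TDT is claimed to produce.
U = [0.5, -0.2, 0.1]
B = [1.0, 2.0, 3.0]


def toy_feature(label: float) -> List[float]:
    return [label * uk + bk for uk, bk in zip(U, B)]


AGES = [float(a) for a in range(20, 70, 5)]
HEAD = fit_difference_head([toy_feature(a) for a in AGES], AGES)
DELTA = dot(HEAD, sub(toy_feature(40.5), toy_feature(33.0)))
print([round(x, 4) for x in HEAD], round(DELTA, 4))
```

Every toy difference lies along u, and gradient descent starts from zero. It therefore converges to the minimum-norm solution u / |u|^2, about [1.667, -0.667, 0.333] here. The predicted difference for labels 40.5 and 33 comes out close to 7.5. The head is antisymmetric by construction: swapping the pair flips the sign of the prediction.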

Paper Structure (17 sections, 7 equations, 5 figures, 8 tables, 1 algorithm)

Figures (5)

  • Figure 1: The overall structure of our TDT. Our pipeline (b) differs from the general pipeline (a) by incorporating a parameter-free TDT (Triangular Distribution Transform). TDT converts the nonlinear feature into one that varies "linearly", as per Eq. \ref{eq-F}. Consequently, a linear head module alone suffices to map image features to their respective labels. It also offers a clear interpretation that a conventional network head lacks. For details on the TDT loss, please refer to Figure 2.
  • Figure 2: TDT is learned using a commutativity-related loss and a symmetry-related loss, with the latter playing the primary role. The feature difference associated with the symmetry-related loss is used for result prediction. MSE denotes the mean squared error.
  • Figure 3: Relationship of the symmetric age to (a) the feature discrepancy, (b) the predicted delta age, and (c) the symmetric loss.
  • Figure 4: Qualitative comparison of the final color correction result with the results from individual prior samples. In the fourth column, the predictions from different priors cluster around the ground-truth location with limited variance; their color-corrected images are shown in the right insets. The rightmost column shows the prior samples.
  • Figure 5: Study of the symmetry learnt in $\mathcal{X}_s$. For each pair of an age-$a$ image from the test set and an age-$b$ image from the prior set, we plot a histogram of the resulting $\mathcal{X}_s$. The values indeed gather around the symmetry axis $(a+b)/2$ in the histogram.
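Figure 4 indicates that each prior sample yields its own prediction and that these predictions cluster around the ground truth. The Python sketch below shows one plausible aggregation; the captions do not state the paper's exact combination rule.

- Each prior (f_p, y_p) contributes the estimate y_p + head(f_q - f_p).
- The estimates are averaged, and their spread is reported.
- The linear head, the toy features, and the small perturbations on the prior features are all hypothetical stand-ins for TDT outputs.

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def predict_from_priors(
    query: Vector,
    priors: Sequence[Tuple[Vector, float]],
    delta_head: Callable[[List[float]], float],
) -> Tuple[float, float, List[float]]:
    """Average the per-prior estimates y_p + delta_head(f_q - f_p).

    Returns the mean estimate, the population standard deviation of the
    per-prior estimates, and the estimates themselves.
    """
    estimates = [y_p + delta_head([q - p for q, p in zip(query, f_p)])
                 for f_p, y_p in priors]
    return mean(estimates), pstdev(estimates), estimates


# Hypothetical linearized features f(y) = y * U + B and a linear head with W . U = 1.
U = [0.5, -0.2, 0.1]
B = [1.0, 2.0, 3.0]
W = [v / sum(x * x for x in U) for v in U]


def feature(y: float) -> List[float]:
    return [y * uk + bk for uk, bk in zip(U, B)]


def linear_head(d: List[float]) -> float:
    return sum(wk * dk for wk, dk in zip(W, d))


# Prior set with small, zero-mean perturbations on the first feature coordinate,
# so the individual estimates scatter around the true label.
NOISE = [0.01, -0.01, 0.02, -0.02]
PRIORS = []
for y_p, e in zip([25.0, 35.0, 45.0, 55.0], NOISE):
    f = feature(y_p)
    f[0] += e
    PRIORS.append((f, y_p))

TRUE_LABEL = 41.0
EST, SPREAD, PER_PRIOR = predict_from_priors(feature(TRUE_LABEL), PRIORS, linear_head)
print(round(EST, 3), round(SPREAD, 4), [round(v, 3) for v in PER_PRIOR])
```

Here each per-prior estimate is offset from the true label by -W[0] times that prior's perturbation. Because the perturbations cancel on average, the mean recovers the label, and the small standard deviation mirrors the "limited variance" described for Figure 4.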