Dist Loss: Enhancing Regression in Few-Shot Region through Distribution Distance Constraint

Guangkun Nie; Gongzheng Tang; Shenda Hong

TL;DR

The paper tackles imbalanced regression with Dist Loss, a differentiable loss that minimizes the distribution distance between a model's predictions and the true labels. It discretizes the label range into $B$ bins, estimates the label distribution $p_i$ with kernel density estimation, constructs pseudo-labels and pseudo-predictions, and uses fast differentiable sorting so that the distribution distance $L(\mathcal{S}_P,\mathcal{S}_L)$ can be optimized alongside the standard regression loss. Empirical results on IMDB-WIKI-DIR, AgeDB-DIR, and ECG-Ka-DIR show state-of-the-art performance in few-shot regions and competitive results in median-shot regions, and Dist Loss yields complementary gains when combined with existing methods. The approach is robust across diverse domains and suggests a practical path to improving regression on highly skewed data, including healthcare applications.
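The pipeline summarized above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the Gaussian kernel, the `bandwidth` and `weight` parameters, the quantile-based pseudo-label construction, and the use of absolute error for both terms are assumptions made for illustration. A plain `sorted` call also stands in for the fast differentiable sorting used in the paper, so this version shows the computation but provides no gradients.

```python
import math


def kde_density(labels, grid, bandwidth):
    """Gaussian kernel density estimate of `labels`, evaluated at each grid point."""
    norm = 1.0 / (len(labels) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - y) / bandwidth) ** 2) for y in labels)
        for g in grid
    ]


def make_pseudo_labels(labels, num_bins, batch_size, bandwidth=1.0):
    """Return `batch_size` sorted pseudo-labels that follow the KDE of `labels`.

    Assumes the labels are not all identical (the bin width must be positive).
    """
    lo, hi = min(labels), max(labels)
    width = (hi - lo) / num_bins
    centers = [lo + (i + 0.5) * width for i in range(num_bins)]

    # Normalize the density over bin centers into bin probabilities.
    density = kde_density(labels, centers, bandwidth)
    total = sum(density)
    cdf, running = [], 0.0
    for d in density:
        running += d / total
        cdf.append(running)

    # Inverse-CDF lookup at evenly spaced quantiles (an illustrative choice).
    pseudo, idx = [], 0
    for k in range(batch_size):
        q = (k + 0.5) / batch_size
        while idx < num_bins - 1 and cdf[idx] < q:
            idx += 1
        pseudo.append(centers[idx])
    return pseudo


def dist_loss(predictions, targets, pseudo_labels, weight=1.0):
    """Regression error plus a distance between sorted predictions and pseudo-labels."""
    n = len(predictions)
    # Plain sort: a non-differentiable stand-in for differentiable sorting.
    pseudo_predictions = sorted(predictions)
    distribution_term = sum(
        abs(p - s) for p, s in zip(pseudo_predictions, pseudo_labels)
    ) / n
    regression_term = sum(abs(p - t) for p, t in zip(predictions, targets)) / n
    return regression_term + weight * distribution_term


# Imbalanced toy labels: eight samples near 20 and two near 60.
train_labels = [20.0] * 8 + [60.0] * 2
pseudo = make_pseudo_labels(train_labels, num_bins=4, batch_size=5, bandwidth=5.0)

# A model that collapses onto the many-shot region is penalized by both terms.
collapsed = dist_loss([25.0] * 5, [25.0, 25.0, 25.0, 25.0, 55.0], pseudo)
```

In this toy setup the distribution term stays positive whenever the batch of predictions fails to cover the sparse high-label region, which is the mechanism the summary credits for improved few-shot performance. In an actual training loop the sort would need to be replaced by a differentiable sorting operator so that the term can be backpropagated.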

Abstract

Imbalanced data distributions are prevalent in real-world scenarios, posing significant challenges in both imbalanced classification and imbalanced regression tasks. They often cause deep learning models to overfit in areas of high sample density (many-shot regions) while underperforming in areas of low sample density (few-shot regions). This characteristic restricts the utility of deep learning models in various sectors, notably healthcare, where areas with few-shot data hold greater clinical relevance. While recent studies have shown the benefits of incorporating distribution information in imbalanced classification tasks, such strategies are rarely explored in imbalanced regression. In this paper, we address this issue by introducing a novel loss function, termed Dist Loss, designed to minimize the distribution distance between the model's predictions and the target labels in a differentiable manner, effectively integrating distribution information into model training. Dist Loss enables deep learning models to regularize their output distribution during training, effectively enhancing their focus on few-shot regions. We have conducted extensive experiments across three datasets spanning computer vision and healthcare: IMDB-WIKI-DIR, AgeDB-DIR, and ECG-Ka-DIR. The results demonstrate that Dist Loss effectively mitigates the negative impact of imbalanced data distribution on model performance, achieving state-of-the-art results in sparse data regions. Furthermore, Dist Loss is easy to integrate, complementing existing methods.
