Error Distribution Smoothing: Advancing Low-Dimensional Imbalanced Regression
Donghe Chen, Jiaxuan Yue, Tengjie Zheng, Lanxuan Wang, Lin Cheng
TL;DR
The paper tackles imbalanced regression by introducing Complexity-to-Density Ratio (CDR) to quantify regionwise imbalance and proposing Error Distribution Smoothing (EDS) to construct a representative dataset that preserves high-complexity regions while reducing redundancy in overrepresented areas. It leverages Delaunay triangulation and Linear Interpolation Models to approximate CDR and guide dataset selection, resulting in a Log-CDR distribution that informs region categorization. Empirical results across the Lorenz system with SINDy, high-dimensional polar moment data, and real-world Cartpole and Quadcopter tasks show that EDS improves predictive precision, reduces maximum errors, and speeds up training through a more balanced and informative dataset. Collectively, these contributions offer a principled approach to imbalanced regression with practical impact for scientific and engineering applications where data are sparse in complex regions yet abundant elsewhere.
Abstract
In real-world regression tasks, datasets frequently exhibit imbalanced distributions, with data scarce in high-complexity regions and abundant in low-complexity ones. Existing methods for handling imbalance mostly target classification, where class boundaries are clear, and few approaches are designed specifically for imbalanced regression. To address this gap, we introduce a new definition of imbalanced regression that accounts for both the complexity of the problem and the density of data points, extending beyond traditional definitions that consider data density alone. We further propose Error Distribution Smoothing (EDS), which selects a representative subset of the dataset, reducing redundancy while maintaining balance and representativeness. Experiments demonstrate the effectiveness of EDS. The code and dataset are available at https://anonymous.4open.science/r/Error-Distribution-Smoothing-762F.
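To make the selection idea concrete, the following is a minimal one-dimensional sketch, not the authors' implementation. It assumes that in 1-D the Delaunay triangulation mentioned in the TL;DR reduces to the intervals between consecutive sorted points, so the linear interpolation model becomes an ordinary piecewise-linear interpolant. The interpolation error is used here as a rough stand-in for local complexity; it is not the paper's CDR definition. The function names, the greedy refinement rule, the test function, and the tolerance `tol` are all hypothetical choices made for illustration.

```python
import bisect
import math


def interpolate(xs, ys, x):
    """Piecewise-linear prediction at x from knots sorted by x."""
    i = bisect.bisect_right(xs, x)
    i = min(max(i, 1), len(xs) - 1)  # clamp to a valid interval index
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (x - x0) / (x1 - x0) * (y1 - y0)


def select_representative(samples, tol):
    """Greedy refinement (illustrative assumption, not the paper's rule):
    repeatedly keep the sample that the current interpolant mispredicts
    the most, until every sample is predicted within tol.
    Assumes at least two samples with distinct x values."""
    samples = sorted(samples)
    # Seed with both endpoints so every query falls inside an interval.
    xs = [samples[0][0], samples[-1][0]]
    ys = [samples[0][1], samples[-1][1]]
    kept_x = set(xs)
    while True:
        worst_err, worst = tol, None
        for x, y in samples:
            if x in kept_x:
                continue  # already a knot
            err = abs(interpolate(xs, ys, x) - y)
            if err > worst_err:
                worst_err, worst = err, (x, y)
        if worst is None:  # all remaining samples are within tol
            return list(zip(xs, ys))
        i = bisect.bisect_left(xs, worst[0])
        xs.insert(i, worst[0])
        ys.insert(i, worst[1])
        kept_x.add(worst[0])


def f(x):
    """Hypothetical target: nearly flat for x < 0.5, sharp bump at 0.8."""
    return math.exp(-200.0 * (x - 0.8) ** 2)


# Imbalanced sampling: 200 points in the flat region, 51 near the bump.
flat_x = [i / 400 for i in range(200)]
bump_x = [0.5 + j / 100 for j in range(51)]
samples = [(x, f(x)) for x in flat_x + bump_x]

tol = 0.01
kept = select_representative(samples, tol)
flat_kept = sum(1 for x, _ in kept if x < 0.5)
complex_kept = sum(1 for x, _ in kept if x >= 0.5)
print(len(samples), len(kept), flat_kept, complex_kept)
```

In this toy setting the retained subset is much smaller than the original sample, and most retained points fall around the bump rather than in the oversampled flat region. Extending the sketch to higher dimensions would require an actual Delaunay triangulation and barycentric interpolation over simplices.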
