Adaptive kernel-density approach for imbalanced binary classification
Kotaro J. Nishimura, Yuichi Sakumura, Kazushi Ikeda
TL;DR
This work tackles severe class imbalance in binary classification by introducing KOTARO, a KDE-inspired classifier with density-adaptive bandwidths. For each training sample $\mathbf{x}_i$, the local scale is $d_i=\max_{j\in\mathcal{N}_n(i)} d(i,j)$, the largest distance from sample $i$ to a member of its neighborhood $\mathcal{N}_n(i)$. The kernel is $k(\mathbf{x},\mathbf{x}_i)=\exp(-\gamma_i\|\mathbf{x}-\mathbf{x}_i\|^2)$ with $\gamma_i=1/d_i$, and the discriminant is $f(\mathbf{x})=\sum_i w_i k(\mathbf{x},\mathbf{x}_i)$ with weights $\mathbf{w}=\mathbf{K}^{-1}\mathbf{y}$, where $\mathbf{K}$ is the kernel matrix evaluated on the training samples. Because $\gamma_i$ is large where samples are dense and small where they are sparse, the kernels narrow in high-density (majority) regions and widen in sparse (minority) regions, which sharpens the boundary around the majority class and improves minority detection under extreme imbalance. The method is validated on synthetic EI/DI datasets and real-world imbalanced medical data, with Boruta feature selection used to assess robustness to noisy features. Results show superior performance under severe imbalance, especially for EI-type distributions, and suggest a two-phase strategy: use KOTARO on raw data, then switch to ensemble or re-sampling methods after feature curation. The work highlights the practical impact for critical domains such as medical diagnosis, where minority-class recognition is essential, and outlines future directions for automatic imbalance-type identification and testing on a broader range of distributions.
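The formulas in the summary above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: labels are assumed to be coded as $-1$ (majority) and $+1$ (minority), $\mathcal{N}_n(i)$ is assumed to be the $n$ nearest neighbors in Euclidean distance, and the names `local_gammas`, `kernel_matrix`, `fit`, `decision_function`, `n_neighbors`, and `eps` are hypothetical. The toy data are made up for illustration only.

```python
import numpy as np

def local_gammas(X, n_neighbors=2, eps=1e-12):
    """gamma_i = 1 / d_i, where d_i is the distance to the n-th nearest neighbor."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the self-distance (0),
    # so column n_neighbors is the largest distance within the n nearest neighbors.
    d = np.sort(dist, axis=1)[:, n_neighbors]
    # eps guards against duplicate points (d_i = 0); this guard is an assumption.
    return 1.0 / np.maximum(d, eps)

def kernel_matrix(A, X, gammas):
    """Entry [a, i] is k(A[a], X[i]) = exp(-gamma_i * ||A[a] - X[i]||^2)."""
    sq = ((A[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gammas[None, :] * sq)

def fit(X, y, n_neighbors=2):
    """Solve K w = y, i.e. w = K^{-1} y, without forming the inverse explicitly."""
    gammas = local_gammas(X, n_neighbors)
    K = kernel_matrix(X, X, gammas)
    w = np.linalg.solve(K, y)
    return gammas, w

def decision_function(A, X, gammas, w):
    """f(x) = sum_i w_i k(x, x_i); the predicted class is the sign of f."""
    return kernel_matrix(A, X, gammas) @ w

# Toy data: a dense 5x5 grid (majority, label -1) and three spread-out
# points (minority, label +1).
grid = np.arange(5) * 0.5
X_major = np.array([[a, b] for a in grid for b in grid])
X_minor = np.array([[10.0, 10.0], [13.0, 10.0], [10.0, 13.0]])
X = np.vstack([X_major, X_minor])
y = np.concatenate([-np.ones(len(X_major)), np.ones(len(X_minor))])

gammas, w = fit(X, y, n_neighbors=2)
scores = decision_function(np.array([[10.1, 10.1]]), X, gammas, w)
print("gamma (majority):", gammas[:3])
print("gamma (minority):", gammas[-3:])
print("score near the minority points:", scores)
```

In this toy setup the majority samples receive larger $\gamma_i$ (narrower kernels) than the minority samples, matching the behavior described above. Solving $\mathbf{K}\mathbf{w}=\mathbf{y}$ exactly can be numerically fragile when $\mathbf{K}$ is ill-conditioned; whether the original method adds any regularization is not stated here.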
Abstract
Class imbalance is a common challenge in real-world binary classification tasks, often leading to predictions biased toward the majority class and reduced recognition of the minority class. This issue is particularly critical in domains such as medical diagnosis and anomaly detection, where correct classification of the minority class is essential. Conventional methods often fail to deliver satisfactory performance when the imbalance is severe. To address this challenge, we propose a novel approach called Kernel-density-Oriented Threshold Adjustment with Regional Optimization (KOTARO), which extends the framework of kernel density estimation (KDE) by adaptively adjusting decision boundaries according to local sample density. In KOTARO, the bandwidth of each Gaussian basis function is tuned according to the estimated density around its sample, thereby enhancing the classifier's ability to capture minority regions. We validated the effectiveness of KOTARO through experiments on both synthetic and real-world imbalanced datasets. The results demonstrated that KOTARO outperformed conventional methods, particularly under conditions of severe imbalance, highlighting its potential as a promising solution for a wide range of imbalanced classification problems.
