DNN-GDITD: Out-of-distribution detection via Deep Neural Network based Gaussian Descriptor for Imbalanced Tabular Data
Priyanka Chudasama, Anil Surisetty, Aakarsh Malhotra, Alok Singh
TL;DR
The paper tackles out-of-distribution detection under class imbalance in tabular data by proposing DNN-GDITD, a DNN-agnostic module that maps embeddings to $k$ independent Gaussian spheres and uses a four-term loss to form compact, discriminative ID clusters and a reliable OOD signal. Training alternates between updates to the base network and to the Gaussian parameters $(\mu_i,\sigma_i)$ via Block Coordinate Descent. The resulting decision rule is based on $\zeta_i(x)=\sigma_i-D_i(x)$: a sample is flagged as OOD when $\zeta_i(x)<0$ for all $i$, i.e., when it lies outside every sphere. Experiments on four tabular datasets (Synthetic Financial, Gas Sensor, Drive Diagnosis, MNIST) in both balanced and imbalanced settings show improved OOD detection (TNR@85%TPR, AUROC, AUPR), with an average gain of $3.32\%$ over the baselines, while ID accuracy remains competitive. The combination of spherical boundaries and the four loss terms yields robust performance, with improvements confirmed as statistically significant by Wilcoxon tests, and the approach suggests broad applicability to safety-critical domains and a potential extension to other modalities.
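The decision rule can be illustrated with a short sketch. This is not the authors' implementation: it assumes that $D_i(x)$ is the Euclidean distance between the network embedding of $x$ and the centre $\mu_i$, that $\sigma_i$ acts as the sphere's radius, and that an ID sample is assigned to the class with the largest margin $\zeta_i(x)$. The function names and the `ood_label=-1` convention are hypothetical.

```python
import numpy as np


def gaussian_descriptor_scores(z, mu, sigma):
    """Return zeta[n, i] = sigma_i - D_i(z_n) for every embedding and sphere.

    Assumption (hypothetical): D_i is the Euclidean distance from the
    embedding to the centre mu_i; the paper's exact distance may differ.
    z:     (n, d) embeddings produced by the base network
    mu:    (k, d) sphere centres
    sigma: (k,)   sphere radii
    """
    # Pairwise Euclidean distances via broadcasting, shape (n, k)
    dist = np.linalg.norm(z[:, None, :] - mu[None, :, :], axis=-1)
    return sigma[None, :] - dist


def predict_with_ood(z, mu, sigma, ood_label=-1):
    """Assign each sample to a class, or to `ood_label` if it is outside all spheres."""
    zeta = gaussian_descriptor_scores(z, mu, sigma)
    # Assumption: ID samples take the class with the largest margin zeta_i
    pred = np.argmax(zeta, axis=1)
    # OOD rule from the text: zeta_i(x) < 0 for all i
    is_ood = np.all(zeta < 0, axis=1)
    pred[is_ood] = ood_label
    # max_i zeta_i(x) doubles as a confidence score (positive inside some sphere)
    return pred, zeta.max(axis=1)


# Toy example: two spheres in a 2-D embedding space
mu = np.array([[0.0, 0.0], [5.0, 5.0]])
sigma = np.array([1.0, 1.5])
z = np.array([[0.2, 0.1], [5.5, 4.8], [10.0, -3.0]])
pred, conf = predict_with_ood(z, mu, sigma)
print(pred.tolist())  # [0, 1, -1]
```

The first two points fall inside the spheres centred at $(0,0)$ and $(5,5)$, and the third lies outside both, so it is flagged as OOD. Because $\max_i \zeta_i(x)$ is negative exactly when every $\zeta_i(x)$ is negative, thresholding that single score at zero reproduces the all-$i$ rule.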
Abstract
Classification tasks are challenging when classes are imbalanced and data distributions evolve over time. Addressing these issues requires a robust method that handles imbalance while reliably detecting out-of-distribution (OOD) samples not encountered during training. This study introduces a novel OOD detection algorithm for tabular datasets, titled Deep Neural Network-based Gaussian Descriptor for Imbalanced Tabular Data (DNN-GDITD). The DNN-GDITD algorithm can be placed on top of any DNN and uses spherical decision boundaries to improve both the classification of imbalanced data and OOD detection. Trained with a combination of Push, Score-based, and focal losses, DNN-GDITD assigns confidence scores to test data points and categorizes each one as belonging to a known class or as an OOD sample. Extensive experiments on tabular datasets demonstrate the effectiveness of DNN-GDITD compared with three OOD detection algorithms. The evaluation covers both imbalanced and balanced scenarios on diverse tabular datasets, including a synthetic financial dispute dataset and the publicly available Gas Sensor, Drive Diagnosis, and MNIST datasets, showcasing DNN-GDITD's versatility.
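The confidence scores that DNN-GDITD assigns are evaluated with metrics such as those listed in the TL;DR. The sketch below computes TNR at 85% TPR and AUROC from two arrays of scores using their standard definitions, treating ID as the positive class and assuming that higher scores mean "more in-distribution" (for example $\max_i \zeta_i(x)$, which is an assumption here rather than the paper's stated score). The helper names are hypothetical, and the paper's exact evaluation protocol may differ.

```python
import numpy as np


def tnr_at_tpr(id_scores, ood_scores, tpr=0.85):
    """TNR on OOD samples at the threshold that keeps a fraction `tpr` of ID samples.

    ID is the positive class; higher score = more in-distribution.
    """
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    # Threshold chosen so that roughly `tpr` of the ID scores are >= threshold
    threshold = np.quantile(id_scores, 1.0 - tpr)
    # True negatives: OOD samples scored below the threshold
    return float(np.mean(ood_scores < threshold))


def auroc(id_scores, ood_scores):
    """AUROC as P(ID score > OOD score), counting ties as 1/2.

    Builds an (n_id, n_ood) comparison matrix, so it suits small score sets.
    """
    id_s = np.asarray(id_scores, dtype=float)[:, None]
    ood_s = np.asarray(ood_scores, dtype=float)[None, :]
    return float(np.mean(id_s > ood_s) + 0.5 * np.mean(id_s == ood_s))


# Toy scores: 20 ID samples and 5 OOD samples
id_scores = np.arange(1, 21)
ood_scores = np.array([0, 1, 2, 5, 10])
print(tnr_at_tpr(id_scores, ood_scores))
print(auroc(id_scores, ood_scores))
```

On these toy scores, 17 of the 20 ID scores clear the threshold (TPR = 0.85) and 3 of the 5 OOD scores fall below it, giving a TNR of 0.6; the AUROC is 0.84. AUPR is not shown; it could be obtained from the same arrays with, for example, scikit-learn's `average_precision_score`.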
