DNN-GDITD: Out-of-distribution detection via Deep Neural Network based Gaussian Descriptor for Imbalanced Tabular Data

Priyanka Chudasama, Anil Surisetty, Aakarsh Malhotra, Alok Singh

TL;DR

The paper tackles out-of-distribution detection under class imbalance in tabular data by proposing DNN-GDITD, a DNN-agnostic module that maps embeddings to $k$ independent Gaussian spheres and uses a four-term loss to create compact, discriminative ID clusters and reliable OOD signaling. Training interleaves updates to the base network and the Gaussian parameters $(\mu_i,\sigma_i)$ via Block Coordinate Descent, yielding a decision rule based on $\zeta_i(x)=\sigma_i-D_i(x)$ where an OOD sample satisfies $\zeta_i(x)<0$ for all $i$. Empirical results on four tabular datasets (Synthetic Financial, Gas Sensor, Drive Diagnosis, MNIST) across balanced and imbalanced settings demonstrate improved OOD detection metrics (TNR@85%TPR, AUROC, AUPR) with an average gain of about $3.32\%$ over baselines, while maintaining competitive ID accuracy. The method’s use of spherical boundaries and the four-loss combination shows robust performance and statistical significance (Wilcoxon tests), and suggests broad applicability to safety-critical domains and potential extension to other modalities.
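
As a rough illustration of the TNR@85%TPR metric named above, the sketch below computes the true negative rate at a fixed true positive rate from two score arrays. It assumes ID samples are the positive class and that a higher score means "more ID-like"; the function name `tnr_at_tpr` and both conventions are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def tnr_at_tpr(id_scores, ood_scores, tpr=0.85):
    """TNR at a fixed TPR (assumption: ID is positive, higher score = more ID-like)."""
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    # Pick the threshold that accepts roughly a `tpr` fraction of ID samples
    # (exact fraction depends on sample size and quantile interpolation).
    threshold = np.quantile(id_scores, 1.0 - tpr)
    # TNR: fraction of OOD samples whose score falls below that threshold.
    return float(np.mean(ood_scores < threshold))

# Toy scores, not data from the paper.
id_scores = np.arange(100)
ood_scores = [0, 5, 10, 20, 30]
print(tnr_at_tpr(id_scores, ood_scores))
```

For DNN-GDITD, one plausible choice of per-sample score would be $\max_i \zeta_i(x)$, though the summary does not state which score the authors threshold.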

Abstract

Classification tasks present challenges due to class imbalances and evolving data distributions. Addressing these issues requires a robust method to handle imbalances while effectively detecting out-of-distribution (OOD) samples not encountered during training. This study introduces a novel OOD detection algorithm designed for tabular datasets, titled Deep Neural Network-based Gaussian Descriptor for Imbalanced Tabular Data (DNN-GDITD). The DNN-GDITD algorithm can be placed on top of any DNN to facilitate better classification of imbalanced data and OOD detection using spherical decision boundaries. Using a combination of Push, Score-based, and focal losses, DNN-GDITD assigns confidence scores to test data points, categorizing them as known classes or as an OOD sample. Extensive experimentation on tabular datasets demonstrates the effectiveness of DNN-GDITD compared to three OOD algorithms. Evaluation encompasses imbalanced and balanced scenarios on diverse tabular datasets, including a synthetic financial dispute dataset and publicly available tabular datasets like Gas Sensor, Drive Diagnosis, and MNIST, showcasing DNN-GDITD's versatility.
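
The abstract names a focal loss as one ingredient for handling class imbalance but does not give its form. The sketch below shows the commonly used multi-class focal loss, $-(1-p_t)^{\gamma}\log p_t$, under the assumption that the paper uses this standard variant; the function name and the default $\gamma = 2$ are illustrative choices, not values from the source.

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0):
    """Mean focal loss over a batch (standard form; assumed, not confirmed by the paper).

    probs:  (n, k) array of predicted class probabilities.
    labels: (n,) array of integer class indices.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Probability assigned to the true class of each sample.
    p_true = probs[np.arange(len(labels)), labels]
    p_true = np.clip(p_true, 1e-12, 1.0)  # avoid log(0)
    # (1 - p_true)^gamma down-weights samples that are already well classified.
    return float(np.mean(-((1.0 - p_true) ** gamma) * np.log(p_true)))

# Toy batch: one confident prediction and one hard prediction.
probs = [[0.9, 0.1], [0.4, 0.6]]
labels = [0, 0]
print(focal_loss(probs, labels, gamma=0.0))  # gamma = 0 reduces to cross-entropy
print(focal_loss(probs, labels, gamma=2.0))
```

With $\gamma = 0$ the expression reduces to ordinary cross-entropy; larger $\gamma$ shifts the weight toward hard, often minority-class, samples.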

Paper Structure (13 sections, 12 equations, 2 figures, 3 tables, 1 algorithm)

Figures (2)

  • Figure 1: Sign of the score $\zeta_i(x) = \sigma_i - D_i(x)$ with respect to the clusters $i \in \{1,2,3\}$ when $x$ is an ID sample (case (a)) versus when $x$ is an OOD sample (case (b)). $(\mu_i, \sigma_i)$ denote each cluster's centre and radius, respectively. The score with respect to the true class should be positive, whereas the scores with respect to all other classes should be negative. Thus, in case (b), $x$ is an OOD sample because $\zeta_i(x) < 0$ for all clusters.
  • Figure 2: Graphical comparison of Softmax [softmax], Mahalanobis [lee2018simple], and Deep-MCDD [deep_mcdd] vs DNN-GDITD (ours) on the publicly available datasets Gas Sensor [data_gas], Drive Diagnosis [data_drive], and MNIST [data_mnist].
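
The sign rule described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes $D_i(x)$ is the Euclidean distance between the embedding of $x$ and the centre $\mu_i$, and it assigns an ID sample to the cluster with the largest score; both choices, along with the function names and the `-1` OOD label, are assumptions.

```python
import numpy as np

def sphere_scores(z, centres, radii):
    """Scores zeta_i = sigma_i - D_i for one embedding z.

    Assumption: D_i is the Euclidean distance from z to centre mu_i.
    """
    z = np.asarray(z, dtype=float)
    centres = np.asarray(centres, dtype=float)
    radii = np.asarray(radii, dtype=float)
    dists = np.linalg.norm(centres - z, axis=1)
    return radii - dists

def classify_or_flag_ood(z, centres, radii):
    """Return a class index, or -1 when every score is negative (OOD)."""
    scores = sphere_scores(z, centres, radii)
    if np.all(scores < 0):
        return -1  # z lies outside every sphere
    # Assumption: pick the cluster with the largest score.
    return int(np.argmax(scores))

# Three toy clusters, mirroring i in {1, 2, 3} from Figure 1.
centres = [[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]]
radii = [1.0, 1.0, 1.0]
print(classify_or_flag_ood([0.2, 0.0], centres, radii))    # inside the first sphere
print(classify_or_flag_ood([10.0, 10.0], centres, radii))  # outside all spheres
```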