SPRINT: Semi-supervised Prototypical Representation for Few-Shot Class-Incremental Tabular Learning

Umid Suleymanov, Murat Kantarcioglu, Kevin S Chan, Michael De Lucia, Kevin Hamlen, Latifur Khan, Sharad Mehrotra, Ananthram Swami, Bhavani Thuraisingham

TL;DR

SPRINT introduces a mixed episodic training strategy that leverages confidence-based pseudo-labeling to enrich novel class representations and exploits low storage costs to retain base class history, achieving a state-of-the-art average accuracy of 77.37% (5-shot).

Abstract

Real-world systems must continuously adapt to novel concepts from limited data without forgetting previously acquired knowledge. While Few-Shot Class-Incremental Learning (FSCIL) is established in computer vision, its application to tabular domains remains largely unexplored. Unlike images, tabular streams (e.g., logs, sensors) combine abundant unlabeled data, scarce expert annotations, and negligible storage costs, characteristics ignored by existing vision-based methods that rely on restrictive buffers. We introduce SPRINT, the first FSCIL framework tailored for tabular distributions. SPRINT introduces a mixed episodic training strategy that leverages confidence-based pseudo-labeling to enrich novel class representations and exploits low storage costs to retain base class history. Extensive evaluation across six diverse benchmarks spanning cybersecurity, healthcare, and ecological domains demonstrates SPRINT's cross-domain robustness. It achieves a state-of-the-art average accuracy of 77.37% (5-shot), outperforming the strongest incremental baseline by 4.45%.
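
To make the idea of confidence-based pseudo-labeling over class prototypes concrete, the following is a minimal sketch and not the authors' implementation. It assumes prototypes are class means of the raw feature vectors, that confidence is a softmax over negative squared Euclidean distances, and that a fixed threshold of 0.9 decides acceptance; the function names, the distance choice, and the threshold are all hypothetical.

```python
import math


def mean_vector(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def class_probabilities(x, prototypes):
    """Softmax over negative squared Euclidean distances (assumed metric)."""
    labels = list(prototypes)
    logits = [
        -sum((a - b) ** 2 for a, b in zip(x, prototypes[c])) for c in labels
    ]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return {c: e / total for c, e in zip(labels, exps)}


def pseudo_label(support, unlabeled, threshold=0.9):
    """Label unlabeled rows whose top class probability meets the threshold.

    support: dict mapping class label -> list of labeled feature vectors.
    Returns (accepted, refined), where accepted is a list of (row, label)
    pairs and refined holds prototypes recomputed with the accepted rows.
    """
    prototypes = {c: mean_vector(rows) for c, rows in support.items()}
    accepted = []
    for x in unlabeled:
        probs = class_probabilities(x, prototypes)
        label = max(probs, key=probs.get)
        if probs[label] >= threshold:  # hypothetical confidence rule
            accepted.append((x, label))

    # Enrich each class with its accepted pseudo-samples, then re-average.
    enriched = {c: list(rows) for c, rows in support.items()}
    for x, label in accepted:
        enriched[label].append(x)
    refined = {c: mean_vector(rows) for c, rows in enriched.items()}
    return accepted, refined


support = {"a": [[0.0, 0.0], [0.0, 1.0]], "b": [[5.0, 5.0], [5.0, 6.0]]}
unlabeled = [[0.0, 0.5], [5.0, 5.5], [2.5, 3.0]]
accepted, refined = pseudo_label(support, unlabeled)
```

In this toy run the third unlabeled row is equidistant from both prototypes, so its top probability is 0.5 and it is rejected, while the other two rows are accepted and folded back into their class prototypes.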

Paper Structure

This paper contains 28 sections, 17 equations, 10 figures, 11 tables.

Figures (10)

  • Figure 1: Schematic overview of the SPRINT framework.
  • Figure 2: Session-wise accuracy comparison on the (a) CICIDS2017 and (b) MNIST datasets. The annotation highlights the performance margin of SPRINT over the second-best method in the final incremental session.
  • Figure 3: Distribution of Top-1 Accuracy (Final Session). Kernel Density Estimate (KDE) and histogram comparison of SPRINT vs. iCaRL over 30 independent runs on the ACI-IoT-2023 dataset. SPRINT exhibits a sharp, high-confidence peak around 93.6%, demonstrating superior stability and minimal variance compared to the broader distribution of iCaRL.
  • Figure 4: Design Analysis and Ablation Studies. (a) Impact of distance metrics on classification accuracy. (b) Evaluation in a non-incremental setting. (c) Sensitivity to the loss balancing term $\beta$. (d) Influence of the number of pseudo-samples on Top-1 accuracy.
  • Figure 5: Sensitivity and Component Analysis. (a) Influence of the memory budget $M^{(0)}$ on final retention. (b) Model accuracy vs. number of shots. (c) Breakdown of pseudo-label accuracy, demonstrating the effectiveness of the semi-supervised component.
  • ...and 5 more figures