Overcoming Catastrophic Forgetting in Tabular Data Classification: A Pseudorehearsal-based approach

Pablo García-Santaclara, Bruno Fernández-Castro, Rebeca P. Díaz-Redondo

TL;DR

This paper tackles catastrophic forgetting in continual learning on tabular data by proposing TRIL3, a rehearsal-based framework that pairs XuILVQ, which synthesizes prototypes of past data, with DNDF as an incremental classifier. Because past knowledge is replayed through synthetic prototypes, no old samples are stored, and the classifier is trained incrementally to adapt to non-stationary streams. Empirical results on four real-world datasets show that TRIL3 with around 50% synthetic data often matches or surpasses both replay-based baselines and offline training, while requiring less memory. The work extends continual learning to tabular domains, with practical relevance for streaming, edge, and privacy-conscious applications.

Abstract

Continual learning (CL) poses the important challenge of adapting to evolving data distributions and consolidating new knowledge without forgetting previously acquired knowledge. In this paper, we introduce a new methodology, named the Tabular-data Rehearsal-based Incremental Lifelong Learning framework (TRIL3), designed to address catastrophic forgetting in tabular data classification problems. TRIL3 uses the prototype-based incremental generative model XuILVQ to generate synthetic data that preserves old knowledge, and the DNDF algorithm, modified to run incrementally, to learn classification tasks for tabular data, without storing old samples. After tests to determine an adequate percentage of synthetic data and to compare TRIL3 with other available CL proposals, we conclude that TRIL3 outperforms other options in the literature using only 50% synthetic data.
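
To make the rehearsal idea concrete, the following minimal sketch shows one way a training batch could combine new real samples with synthetic samples drawn from prototypes of past data. It is not the authors' implementation: the function name `mix_with_synthetic`, its parameters, and the reading of "50% synthetic data" as a fraction of the combined batch are all assumptions made for illustration, and the prototype list is only a hypothetical stand-in for what XuILVQ would produce. The incremental DNDF classifier is not modeled here.

```python
import random


def mix_with_synthetic(real_batch, prototypes, synthetic_fraction=0.5,
                       noise=0.0, seed=None):
    """Build a batch in which about `synthetic_fraction` of rows are synthetic.

    real_batch: list of (features, label) pairs from the current stream chunk.
    prototypes: list of (features, label) pairs summarising past data
                (hypothetical stand-in for a prototype-based generator).
    noise:      standard deviation of optional Gaussian jitter (an assumption).
    """
    if not 0.0 <= synthetic_fraction < 1.0:
        raise ValueError("synthetic_fraction must be in [0, 1)")
    rng = random.Random(seed)
    n_real = len(real_batch)
    # Solve n_syn / (n_real + n_syn) = f for n_syn.
    n_syn = 0
    if prototypes:
        n_syn = round(synthetic_fraction * n_real / (1.0 - synthetic_fraction))
    synthetic = []
    for _ in range(n_syn):
        features, label = rng.choice(prototypes)
        jittered = [x + rng.gauss(0.0, noise) for x in features]
        synthetic.append((jittered, label))
    # Shuffle so the learner does not see all real rows before synthetic ones.
    batch = list(real_batch) + synthetic
    rng.shuffle(batch)
    return batch


# New samples all belong to class 1; the prototypes remember class 0.
real = [([0.9, 1.1], 1), ([1.0, 0.8], 1), ([1.2, 1.0], 1), ([0.7, 0.9], 1)]
protos = [([0.0, 0.1], 0), ([0.2, -0.1], 0)]
batch = mix_with_synthetic(real, protos, synthetic_fraction=0.5, seed=0)
print(len(batch))  # 8: four real rows plus four synthetic rows
```

In this toy run the incoming chunk contains only class 1, so without the synthetic rows an incremental learner would see no class 0 examples at all. Mixing in prototype-derived rows keeps the old class represented without storing any original sample.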

Paper Structure

This paper contains 18 sections, 3 figures, 4 tables, and 1 algorithm.

Figures (3)

  • Figure 1: TRIL3 architecture and data flow
  • Figure 2: F1-Score class 0 and 1, CICIDS-2017 Friday dataset
  • Figure 3: Real data and prototypes, CICIDS-2017 Friday dataset