Overcoming Catastrophic Forgetting in Tabular Data Classification: A Pseudorehearsal-based approach
Pablo García-Santaclara, Bruno Fernández-Castro, Rebeca P. Díaz-Redondo
TL;DR
This paper tackles catastrophic forgetting in continual learning on tabular data by proposing TRIL3, a rehearsal-based framework that pairs XuILVQ, which synthesizes prototypes of past data, with DNDF as an incremental classifier. Because old samples are never stored, the model trains incrementally and adapts to non-stationary streams. Empirical results on four real-world datasets show that TRIL3 with around 50% synthetic data often matches or surpasses both replay-based baselines and offline training, while requiring less memory. This work broadens continual learning to tabular domains and offers practical impact for streaming, edge, and privacy-conscious applications.
Abstract
Continual learning (CL) poses the important challenge of adapting to evolving data distributions without forgetting previously acquired knowledge while consolidating new knowledge. In this paper, we introduce a new methodology, the Tabular-data Rehearsal-based Incremental Lifelong Learning framework (TRIL3), designed to address catastrophic forgetting in tabular data classification problems. TRIL3 uses the prototype-based incremental generative model XuILVQ to generate synthetic data that preserves old knowledge, and the DNDF algorithm, modified to run incrementally, to learn classification tasks for tabular data without storing old samples. After experiments to determine a suitable percentage of synthetic data and to compare TRIL3 with other available CL proposals, we conclude that TRIL3 outperforms the other options in the literature while using only 50% synthetic data.
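The rehearsal step described above can be illustrated with a minimal sketch. It is not the TRIL3 implementation: the function names, the Gaussian jitter around prototypes (a stand-in for XuILVQ's generative step), and the reading of "50% synthetic data" as the share of synthetic rows in each training batch are all assumptions made for illustration.

```python
import numpy as np


def sample_synthetic(prototypes, proto_labels, n_samples, noise_std=0.05, rng=None):
    """Draw synthetic rows near stored prototypes.

    Assumption: Gaussian jitter around a randomly chosen prototype stands in
    for the generative model; the paper's XuILVQ procedure may differ.
    """
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.integers(0, len(prototypes), size=n_samples)
    noise = rng.normal(0.0, noise_std, size=(n_samples, prototypes.shape[1]))
    return prototypes[idx] + noise, proto_labels[idx]


def build_rehearsal_batch(new_X, new_y, prototypes, proto_labels,
                          synthetic_fraction=0.5, rng=None):
    """Mix new stream data with synthetic rows representing past data.

    synthetic_fraction is the (assumed) share of synthetic rows in the
    returned batch; 0.5 yields as many synthetic rows as new rows.
    """
    if not 0.0 <= synthetic_fraction < 1.0:
        raise ValueError("synthetic_fraction must be in [0, 1)")
    rng = np.random.default_rng() if rng is None else rng
    n_new = len(new_X)
    # Solve n_syn / (n_new + n_syn) = synthetic_fraction for n_syn.
    n_syn = int(round(n_new * synthetic_fraction / (1.0 - synthetic_fraction)))
    syn_X, syn_y = sample_synthetic(prototypes, proto_labels, n_syn, rng=rng)
    X = np.vstack([new_X, syn_X])
    y = np.concatenate([new_y, syn_y])
    # Shuffle so the incremental learner does not see the two sources in blocks.
    perm = rng.permutation(len(X))
    return X[perm], y[perm]


# Usage: 8 new rows of class 1, plus prototypes summarizing previously seen class 0.
rng = np.random.default_rng(0)
new_X = rng.normal(1.0, 0.1, size=(8, 3))
new_y = np.ones(8, dtype=int)
prototypes = np.zeros((2, 3))
proto_labels = np.zeros(2, dtype=int)
batch_X, batch_y = build_rehearsal_batch(new_X, new_y, prototypes, proto_labels, rng=rng)
```

The mixed batch would then be passed to an incremental classifier's update step, so each update sees both the new data and a stand-in for the old data without any original samples being kept.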
