Continual Learning for non-stationary regression via Memory-Efficient Replay

Pablo García-Santaclara, Bruno Fernández-Castro, Rebeca P. Díaz-Redondo, Martín Alonso-Gamarra

TL;DR

The paper tackles non-stationary regression in streaming data and the problem of catastrophic forgetting. It introduces a memory-efficient continual regression framework that extends TRIL3 with a Decision Tree Regressor to create virtual labels and a Mixture Density Network for online regression, powered by XuILVQ prototypes to generate synthetic data without storing raw samples. Across multiple tabular datasets, the method achieves competitive accuracy against offline baselines and substantially reduces forgetting compared with replay and CLeaR approaches. Memory analysis shows the prototype bank remains compact (memory-to-data ratios well below 2%), making it suitable for resource-constrained industrial settings.

Abstract

Data streams are rarely static in dynamic environments like Industry 4.0. Instead, they constantly change, making traditional offline models outdated unless they can quickly adjust to the new data. This need can be adequately addressed by continual learning (CL), which allows systems to gradually acquire knowledge without incurring the prohibitive costs of retraining them from scratch. Most research on continual learning focuses on classification problems, while very few studies address regression tasks. We propose the first prototype-based generative replay framework designed for online task-free continual regression. Our approach defines an adaptive output-space discretization model, enabling prototype-based generative replay for continual regression without storing raw data. Evidence obtained from several benchmark datasets shows that our framework reduces forgetting and provides more stable performance than other state-of-the-art solutions.
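
To make the idea of output-space discretization concrete, here is a minimal sketch of how continuous regression targets could be mapped to discrete "virtual labels" that a prototype-based replay mechanism could then work with. This is an illustration under stated assumptions, not the authors' method: the TL;DR attributes virtual labels to a Decision Tree Regressor and the abstract describes the discretization as adaptive, whereas the code below swaps in fixed equal-frequency binning as a simplified stand-in. The function names `fit_bin_edges` and `virtual_label` and the sample targets are hypothetical.

```python
from bisect import bisect_right


def fit_bin_edges(targets, n_bins):
    """Return n_bins - 1 interior cut points at equal-frequency positions.

    Assumption: fixed quantile-style binning, used here only as a simple
    stand-in for the tree-based, adaptive discretization the paper describes.
    """
    ordered = sorted(targets)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def virtual_label(y, edges):
    """Map a continuous target to the index of the bin that contains it."""
    return bisect_right(edges, y)


# Hypothetical continuous targets from a regression stream.
targets = [0.1, 0.4, 0.5, 1.2, 1.9, 2.3, 3.0, 4.8]

edges = fit_bin_edges(targets, n_bins=4)
labels = [virtual_label(y, edges) for y in targets]

print(edges)   # interior cut points
print(labels)  # one discrete virtual label per target
```

With discrete labels of this kind, each region of the output space can be treated like a class, which is what would let a classification-style prototype store be reused for regression without keeping raw samples.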

Paper Structure

This paper contains 17 sections, 6 equations, 4 figures, and 11 tables.

Figures (4)

  • Figure 1: Diagram of the TRIL3 workflow.
  • Figure 2: Diagram of the proposed continual regression framework.
  • Figure 3: Evolution of R² in the diamonds dataset for different synthetic data ratios.
  • Figure 4: Evolution of the number of prototypes in the superconductors dataset for different synthetic data ratios.