
Unsupervised Assessment of Landscape Shifts Based on Persistent Entropy and Topological Preservation

Sebastian Basterrech

TL;DR

This article introduces a novel framework for monitoring changes in multi-dimensional data streams in a continual learning scenario, based on persistent entropy and topology-preserving projections. The model is analyzed across three scenarios using data streams generated from MNIST samples.

Abstract

In Continual Learning (CL) contexts, concept drift typically refers to changes in the data distribution. A drift in the input data can degrade a learned predictor and compromise the system's stability. Most concept drift methods focus on statistical changes in non-stationary data over time. Here we take another perspective, in which concept drift also encompasses substantial changes in the topological characteristics of the data stream. In this article, we introduce a novel framework for monitoring changes in multi-dimensional data streams. We explore variations in the topological structure of the data, offering another angle on standard concept drift. Our approach is based on persistent entropy and topology-preserving projections in a continual learning scenario. The framework operates in both unsupervised and supervised settings. To show the utility of the proposed framework, we analyze the model across three scenarios using data streams generated from MNIST samples. The results reveal the potential of topological data analysis for shift detection and encourage further research in this area.
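Persistent entropy is commonly defined as the Shannon entropy of the normalized bar lengths of a persistence diagram: for bars [b_i, d_i) with lengths l_i = d_i - b_i and total length L = sum_i l_i, the entropy is E = -sum_i (l_i / L) log(l_i / L). The sketch below is a minimal illustration under stated assumptions, not the article's implementation: it assumes Euclidean point clouds (for example, projected samples) and uses only 0-dimensional homology (connected components) of a Vietoris-Rips filtration, computed by single linkage. The function names `h0_persistence` and `persistent_entropy` are hypothetical.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """0-dimensional persistence diagram of a Vietoris-Rips filtration.

    Every point is born at scale 0. A component dies at the length of the
    edge that merges it into another component (Kruskal's algorithm on the
    pairwise Euclidean distances). The one component that never dies
    (the infinite bar) is omitted so the diagram stays finite.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    diagram = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj           # merge two components
            diagram.append((0.0, d))  # one component dies at scale d
    return diagram


def persistent_entropy(diagram):
    """Shannon entropy of the normalized bar lengths of a finite diagram."""
    lengths = [death - birth for birth, death in diagram if death > birth]
    total = sum(lengths)
    if total == 0:
        return 0.0
    return -sum((l / total) * math.log(l / total) for l in lengths)
```

For the four corners of a unit square, the three finite bars all have length 1, so the entropy is log 3, its maximum for three bars; two well-separated clusters produce one long bar and a lower entropy. Higher-dimensional features such as the holes discussed in Figure 5 would require a full persistent homology library rather than this single-linkage shortcut.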

Paper Structure (13 sections, 9 figures, 1 table)

This paper contains 13 sections, 9 figures, 1 table.

Figures (9)

  • Figure 1: Examples of objects that can be deformed and transformed into one another and are therefore topologically equivalent. Some shapes, however, are not equivalent, because no sequence of simple continuous transformations can deform one shape into the other while maintaining the original structure. For instance, none of the digits at the top of the figure can be deformed into any of the digits at the bottom.
  • Figure 2: Assessing topological changes in the latent space by comparing the persistent entropy of points projected with a Dimensionality Reduction (DR) technique.
  • Figure 3: Creation of synthetic case studies. Data streams were generated by switching MNIST samples among the different topological types. The graphics illustrate the transition from a sequence of images of one topological type to another type.
  • Figure 4: Example of the latent space: offline analysis of the latent space generated by the Self-Organizing Map (SOM) projections, with change point detection applied to the distance matrix.
  • Figure 5: Examples of digits whose structure differs from the one assumed in the case studies. The first image has two holes instead of one. The second image, which could be a 3 or a 5, has a hole. The third image has no holes. The last image is not a single connected component (it has an isolated pixel in the top-left).
  • ...and 4 more figures
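Figure 2 compares persistent entropy over time and Figure 4 applies change point detection. As a minimal, hedged sketch (a generic technique swapped in for illustration, not the article's detector, which Figure 4 applies to a distance matrix), assume the monitor emits one scalar score per window of the stream, such as a persistent-entropy value. A single shift can then be located with a least-squares mean-shift split. The names `best_mean_shift` and `detect_shift` and the `min_jump` threshold are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def best_mean_shift(scores: Sequence[float]) -> Tuple[int, float]:
    """Locate one change point in a scalar series by least squares.

    Splits the series into scores[:k] and scores[k:], models each part by
    its own mean, and returns the k with the smallest total within-segment
    sum of squared deviations, together with that cost.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")

    def sse(segment):
        mean = sum(segment) / len(segment)
        return sum((x - mean) ** 2 for x in segment)

    best_k, best_cost = 1, float("inf")
    for k in range(1, n):
        cost = sse(scores[:k]) + sse(scores[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost


def detect_shift(scores: Sequence[float], min_jump: float) -> Optional[int]:
    """Return the split index if the mean changes by at least min_jump."""
    k, _ = best_mean_shift(scores)
    before = sum(scores[:k]) / k
    after = sum(scores[k:]) / (len(scores) - k)
    return k if abs(after - before) >= min_jump else None
```

For example, a per-window entropy series that hovers around 1.0 for four windows and around 2.0 afterwards is split at index 4, while a constant series yields `None`. The `min_jump` threshold is a design choice that trades sensitivity against false alarms; it would need tuning for real streams.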