Preserving Information: How does Topological Data Analysis improve Neural Network performance?

A. Stolarek; W. Jaworek

TL;DR

The paper addresses a limitation of standard CNNs: they tend to overlook global topological structure in data, especially when training data is limited. It proposes Vector Stitching, which combines topological features, encoded as persistence images (PI), with the raw inputs; the TDA pipeline is implemented with giotto-tda and produces a richer, multi-faceted input for CNNs. Experiments on MNIST with added noise show that Vector Stitching outperforms both raw-input and PI-only models, reaching higher accuracy and converging faster, particularly in data-scarce and noisy scenarios. The work contributes an information-theoretic interpretation of topological augmentation, demonstrates practical benefits for image classification under challenging conditions, and suggests directions for extending TDA integration to other domains and architectures.

Abstract

Artificial Neural Networks (ANNs) require significant amounts of data and computational resources to perform effectively on the tasks for which they are trained. Various techniques, such as Neuron Pruning, are applied to reduce these resource demands. Because of the complex structure of ANNs, it is challenging to interpret the behavior of hidden layers and the features they recognize in the data. Without a comprehensive understanding of which information is used during inference, the available data may be used inefficiently, which lowers the overall performance of the models. In this paper, we introduce a method for integrating Topological Data Analysis (TDA) with Convolutional Neural Networks (CNNs) in the context of image recognition. This method significantly enhances the performance of neural networks by leveraging a broader range of information present in the data, enabling the model to make more informed and accurate predictions. Our approach, hereafter referred to as Vector Stitching, combines raw image data with additional topological information derived through TDA methods. The neural network can thus train on an enriched dataset that incorporates topological features which might otherwise remain unexploited or not be captured by the network's inherent mechanisms. The results of our experiments highlight the potential of incorporating the results of additional data analysis into the network's inference process, yielding better performance in pattern recognition tasks on digital images, particularly when datasets are limited. This work contributes to the development of methods for integrating TDA with deep learning and explores how concepts from Information Theory can explain the performance of such hybrid methods in practical implementation environments.
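
The idea of combining a raw image with a topological summary can be illustrated with a minimal sketch. It is not the authors' implementation: the paper computes persistence images with giotto-tda, whereas the NumPy-only code below rasterizes a persistence diagram by hand. The function names `persistence_image` and `stitch`, the 28x28 resolution, the Gaussian width `sigma`, and the choice to join the two arrays side by side along the width axis are all assumptions made for illustration.

```python
import numpy as np


def persistence_image(diagram, resolution=28, sigma=0.05):
    """Rasterize (birth, death) pairs onto a resolution x resolution grid.

    Hypothetical helper: each pair is mapped to (birth, persistence) and
    contributes a Gaussian bump weighted by its persistence. Coordinates
    are assumed to lie in [0, 1].
    """
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    births = diagram[:, 0]
    persistences = diagram[:, 1] - diagram[:, 0]

    axis = np.linspace(0.0, 1.0, resolution)
    # grid_b varies along columns (birth), grid_p along rows (persistence)
    grid_b, grid_p = np.meshgrid(axis, axis)

    image = np.zeros((resolution, resolution))
    for b, p in zip(births, persistences):
        sq_dist = (grid_b - b) ** 2 + (grid_p - p) ** 2
        image += p * np.exp(-sq_dist / (2.0 * sigma ** 2))
    return image


def min_max_scale(x):
    """Scale an array to [0, 1]; a constant array maps to all zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        return np.zeros_like(x)
    return (x - x.min()) / span


def stitch(raw_image, topo_image):
    """Join the raw image and the topological image side by side.

    Assumption: both inputs have the same height, and concatenation is
    done along the width axis after scaling each part to [0, 1].
    """
    return np.concatenate(
        [min_max_scale(raw_image), min_max_scale(topo_image)], axis=1
    )


# Usage with synthetic stand-ins for an MNIST digit and its diagram.
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(28, 28))
diagram = [[0.1, 0.5], [0.2, 0.9]]  # made-up (birth, death) pairs
pi = persistence_image(diagram)
stitched = stitch(raw, pi)  # shape (28, 56), ready for a CNN input layer
```

Scaling each part separately keeps the pixel intensities and the persistence-image values in a comparable range, so neither half dominates the input purely because of its units.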
