Approximate UMAP allows for high-rate online visualization of high-dimensional data streams

Peter Wassenaar; Pierre Guetschel; Michael Tangermann

TL;DR

This paper introduces approximate UMAP (aUMAP), a novel variant of UMAP designed to generate rapid projections for real-time introspection. aUMAP replicates the projection space of standard UMAP while increasing projection speed by an order of magnitude and keeping the training time the same.

Abstract

In the brain-computer interface (BCI) field, introspection and interpretation of brain signals are desired to provide feedback or to guide rapid paradigm prototyping, but both are challenging due to the high noise level and dimensionality of the signals. Deep neural networks are often introspected by transforming their learned feature representations into 2- or 3-dimensional subspace visualizations using projection algorithms like Uniform Manifold Approximation and Projection (UMAP). Unfortunately, these methods are computationally expensive, making the real-time projection of data streams a non-trivial task. In this study, we introduce a novel variant of UMAP, called approximate UMAP (aUMAP), which aims at generating rapid projections for real-time introspection. To study its suitability for real-time projection, we benchmark the method against standard UMAP and its neural network counterpart, parametric UMAP (pUMAP). Our results show that approximate UMAP delivers projections that replicate the projection space of standard UMAP while increasing projection speed by an order of magnitude and maintaining the same training time.

Paper Structure (1 equation, 3 figures, 2 tables)

Figures (3)

Figure 1: Comparison of UMAP and aUMAP using three datasets. The 2D projections of the training data and test data produced by standard UMAP are displayed in addition to the test set projections produced by aUMAP. The gray lines connect projections of standard UMAP and aUMAP that were obtained from the same test data sample. Colors indicate the classes of the data (not available to the projection methods).
Figure 2: Training times. Models were trained on mock datasets generated from a multiclass Poisson distribution. Left: Models were trained on datasets of 5000 samples with varying dimensionalities. Right: Training across varying sample counts was done using subsets of a 1000-dimensional dataset. Standard UMAP and aUMAP models were trained on the CPU. pUMAP models were trained on the CPU and on the GPU separately. Note that the aUMAP and standard UMAP results are near-identical, so the line for standard UMAP is hidden behind the aUMAP line in the graph. All results shown were averaged across 10 repetitions. Error bars indicate the standard deviation across the runs.
Figure 3: Projection times. The models used for projecting were obtained from the training time experiment. For each model and condition, 500 samples from a multiclass Poisson distribution were passed to the models to be projected. Samples were provided either in a single batch of 500, denoted as one-go (upper panels), or in small batches of 5 samples, denoted as batch (lower panels). Standard UMAP and aUMAP models were trained on the CPU. pUMAP was trained on the CPU and on the GPU separately. The results were averaged across 10 repetitions.
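
The two projection conditions described for Figure 3 (one-go versus batches of 5) can be illustrated with a small timing harness. This is only a minimal sketch, not the authors' benchmark code: `time_projection` and `toy_transform` are hypothetical names, the fixed random linear map merely stands in for a fitted projection model, and a plain Poisson draw is used as a placeholder for the multiclass Poisson data mentioned in the captions.

```python
import time

import numpy as np


def time_projection(transform, samples, batch_size=None):
    """Time how long `transform` needs to project all `samples`.

    batch_size=None sends everything in a single call ("one-go");
    otherwise the samples are sent in consecutive chunks ("batch").
    Returns (elapsed seconds, stacked projections).
    """
    if batch_size is None:
        batch_size = len(samples)
    outputs = []
    start = time.perf_counter()
    for i in range(0, len(samples), batch_size):
        outputs.append(transform(samples[i:i + batch_size]))
    elapsed = time.perf_counter() - start
    return elapsed, np.vstack(outputs)


# Assumption: a fixed random linear map to 2D replaces a fitted model here.
rng = np.random.default_rng(0)
n_features = 1000
projection_matrix = rng.normal(size=(n_features, 2))


def toy_transform(batch):
    """Hypothetical stand-in for a fitted model's projection step."""
    return batch @ projection_matrix


# Assumption: single-class Poisson data as a placeholder for the mock dataset.
samples = rng.poisson(lam=5.0, size=(500, n_features)).astype(float)

t_one_go, emb_one_go = time_projection(toy_transform, samples)
t_batch, emb_batch = time_projection(toy_transform, samples, batch_size=5)
print(f"one-go: {t_one_go:.4f} s, batch: {t_batch:.4f} s")
```

Replacing `toy_transform` with the projection call of a fitted model would give timings for the same two conditions; averaging over repeated runs, as in the captions, reduces timer noise.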