Using Low-Discrepancy Points for Data Compression in Machine Learning: An Experimental Comparison

Simone Göttlich, Jacob Heieck, Andreas Neuenkirch

TL;DR

This work investigates data reduction for regression and neural-network training using low-discrepancy points (Quasi-Monte Carlo). It compares two QMC-based compression schemes (QMC-averaging and QMC-Voronoi) to the adaptive supercompress method, highlighting deterministic error bounds for the QMC approaches and empirical performance across synthetic test functions and MNIST. The results show that adaptive clustering via the standard supercompress approach consistently outperforms the QMC methods on real-world, high-dimensional data, while QMC-Voronoi offers competitive performance on simple, regular problems but fails to scale to MNIST. The findings suggest that for complex data, output-space–focused clustering with adaptive refinement provides the most reliable compression for maintaining predictive accuracy while reducing training cost, whereas QMC-based guarantees are most beneficial in regular settings.

Abstract

Low-discrepancy points (also called Quasi-Monte Carlo points) are deterministically and cleverly chosen point sets in the unit cube, which provide an approximation of the uniform distribution. We explore two methods based on such low-discrepancy points to reduce large data sets in order to train neural networks. The first one is the method of Dick and Feischl [4], which relies on digital nets and an averaging procedure. Motivated by our experimental findings, we construct a second method, which again uses digital nets, but Voronoi clustering instead of averaging. Both methods are compared to the supercompress approach of [14], which is a variant of the K-means clustering algorithm. The comparison is done in terms of the compression error for different objective functions and the accuracy of the training of a neural network.
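To make the Voronoi idea concrete, the following is a minimal sketch of one plausible reading of it, not the authors' construction. It assumes the inputs are already scaled to the unit cube, uses an unscrambled Sobol' sequence from SciPy as the digital net in base 2, takes the net points themselves as representatives, and averages the responses in each Voronoi cell. The function name `voronoi_compress` and all parameters are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.stats import qmc


def voronoi_compress(X, y, m=4):
    """Compress (X, y) onto at most 2**m net points (illustrative sketch).

    Assumption: X lies in [0, 1]^s. Each data point is assigned to its
    nearest net point (its Voronoi cell); the responses in every occupied
    cell are averaged and the cell size is kept as a weight.
    """
    s = X.shape[1]
    # First 2**m points of the unscrambled Sobol' sequence (a digital net in base 2).
    net = qmc.Sobol(d=s, scramble=False).random_base2(m)
    # Nearest-neighbour query gives the index of the Voronoi cell of each data point.
    _, cell = cKDTree(net).query(X)
    counts = np.bincount(cell, minlength=len(net))
    y_sum = np.bincount(cell, weights=y, minlength=len(net))
    occupied = counts > 0
    y_mean = y_sum[occupied] / counts[occupied]
    return net[occupied], y_mean, counts[occupied]


# Small demo on synthetic data (assumed for illustration only).
rng = np.random.default_rng(0)
X_demo = rng.random((1000, 2))
y_demo = X_demo.sum(axis=1)
reps, y_rep, weights = voronoi_compress(X_demo, y_demo, m=4)
```

A model would then be trained on `reps` and `y_rep` with a loss weighted by `weights`, so that densely populated cells count proportionally more. Empty cells are dropped, which is why the number of representatives can be smaller than `2**m`.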

Paper Structure

This paper contains 15 sections, 3 theorems, 45 equations, 6 figures, and 10 tables.

Key Result

Lemma 3.2

Let $\nu \geq 0$ be an integer. For all $\boldsymbol{a} \in \mathbb{N}_{0}^{s}$ the combination principle holds. Here, $\mathbbm{1}_{A}$ denotes the indicator function for an arbitrary set $A$.

Figures (6)

  • Figure 1: $(0, 4, 2)$-net in base $2$ (blue points)
  • Figure 2: $(0,4,2)$-net in base $2$ (blue points) with point set $\mathcal{X}$ (red crosses)
  • Figure 3: visualization of the precompressed data
  • Figure 4: confusion chart of the neural network without compression
  • Figure 5: confusion charts for different methods with a compression rate of $20 \%$
  • ...and 1 more figure

Theorems & Definitions (5)

  • Definition 3.1: e.g., p.5, 1
  • Lemma 3.2: Lemma 1, 1
  • Definition 3.3: p.12, p.14, 1
  • Theorem 3.4: Corollary 12, 1
  • Theorem 3.5: Corollary 14, 1