Minimal Algorithmic Information Loss Methods for Dimension Reduction, Feature Selection and Network Sparsification

Hector Zenil, Narsis A. Kiani, Alyssa Adams, Felipe S. Abrahão, Antonio Rueda-Toicen, Allan A. Zea, Luan Ozelim, Jesper Tegnér

TL;DR

The paper addresses the problem of reducing high-dimensional data and complex networks without discarding essential structural information. It introduces Minimal Information Loss Selection (MILS), an unsupervised method grounded in algorithmic information theory. To guide edge deletions, MILS uses the information difference $I(G,F)=C(G)-C(G\setminus F)$, the change in estimated complexity $C$ when a subset of edges $F$ is removed from a graph $G$. Complexity is estimated block-wise via $\text{BDM}(x)=\sum_i \left(\text{CTM}(x_i)+\log n_i\right)$, where the sum runs over the distinct blocks $x_i$ of $x$ and $n_i$ is the number of times block $x_i$ occurs, together with the coding-theorem estimate $K(s) \approx -\log P(s)$, where $P(s)$ is the algorithmic probability of $s$. Theoretical results establish bounds on information loss that scale with the number of perturbations and with network properties. MILS runs in polynomial time while approximating the reductions that an exhaustive, exponential-time search would find. Empirically, MILS often outperforms traditional statistical dimensionality-reduction methods and state-of-the-art graph sparsification across networks and image data, while enabling lossy compression that preserves algorithmically meaningful structure. These findings suggest MILS as a versatile, principled tool for scalable data reduction with broad applicability in ML pipelines, network analysis, and data compression, with potential for hardware acceleration.
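As a rough illustration of how the information difference $I(G,F)=C(G)-C(G\setminus F)$ can drive edge deletion, the sketch below greedily removes one edge at a time, always picking the edge whose removal changes the estimated complexity the least. It rests on several assumptions that are not taken from the paper: `zlib` compressed length stands in for $C$ (the paper uses BDM/CTM estimates, for which lossless compression is only a crude proxy), "least information loss" is read as the smallest absolute difference, and the names `complexity`, `info_difference`, and `mils_sketch` are hypothetical.

```python
import zlib

def complexity(edges, n):
    # Stand-in for C(G): compressed size (bytes) of the flattened
    # adjacency matrix of an undirected graph on n nodes.
    # ASSUMPTION: zlib replaces the BDM/CTM estimate used in the paper.
    cells = bytearray(n * n)
    for u, v in edges:
        cells[u * n + v] = 1
        cells[v * n + u] = 1
    return len(zlib.compress(bytes(cells), 9))

def info_difference(edges, n, edge):
    # I(G, {e}) = C(G) - C(G minus e) for a single edge e.
    return complexity(edges, n) - complexity(edges - {edge}, n)

def mils_sketch(edges, n, target_size):
    # Greedy single-edge deletion until target_size edges remain.
    # ASSUMPTION: the "most neutral" edge is the one with the smallest
    # absolute information difference; ties are broken by sorted order.
    current = set(edges)
    while len(current) > target_size:
        base = complexity(current, n)
        neutral = min(
            sorted(current),
            key=lambda e: abs(base - complexity(current - {e}, n)),
        )
        current.remove(neutral)
    return current

# Usage: a ring on 8 nodes plus two chords, reduced to 8 edges.
ring = {(i, (i + 1) % 8) for i in range(8)}
graph = ring | {(0, 4), (2, 6)}
reduced = mils_sketch(graph, 8, 8)
print(sorted(reduced))
```

Each iteration costs one complexity estimate per remaining edge, so the loop stays polynomial in the number of edges. With such small inputs, compressed lengths are dominated by format overhead, so the toy graph only shows the control flow, not meaningful rankings.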

Abstract

We present a novel, domain-agnostic, model-independent, unsupervised, and universally applicable Machine Learning approach for dimensionality reduction based on the principles of algorithmic complexity. Specifically, but without loss of generality, we focus on addressing the challenge of reducing certain dimensionality aspects, such as the number of edges in a network, while retaining essential features of interest. These features include preserving crucial network properties like degree distribution, clustering coefficient, edge betweenness, and degree and eigenvector centralities but can also go beyond edges to nodes and weights for network pruning and trimming. Our approach outperforms classical statistical Machine Learning techniques and state-of-the-art dimensionality reduction algorithms by preserving a greater number of data features that statistical algorithms would miss, particularly nonlinear patterns stemming from deterministic recursive processes that may look statistically random but are not. Moreover, previous approaches heavily rely on a priori feature selection, which requires constant supervision. Our findings demonstrate the effectiveness of the algorithms in overcoming some of these limitations while maintaining a time-efficient computational profile. Our approach not only matches, but also exceeds, the performance of established and state-of-the-art dimensionality reduction algorithms. We extend the applicability of our method to lossy compression tasks involving images and any multi-dimensional data. This highlights the versatility and broad utility of the approach in multiple domains.

Paper Structure

This paper contains 13 sections, 3 equations, 12 figures, 1 algorithm.

Figures (12)

  • Figure 1: Three examples of edge deletions occurring in each iteration $j$ of the main loop of Algorithm 1, indicated by the downward blue arrows (except for the bottommost blue arrows), while the subsequent red squares indicate where the deletions occurred in the previous step. The bottommost blue arrows in Figs. 1A, 1B, and 1C refer to the return step of Algorithm 1, after which the final output of the algorithm is returned, given an adjacency matrix of a graph $G$ (such as any one of the three topmost $4 \times 4$ matrices) as input. The input adjacency matrix in Fig. 1A corresponds exactly to a substring of the halting probability as in [Calude2022GlimpseOmega]. Fig. 1B mixes redundant and incompressible components: the second row of its input matrix repeats the first row, and the remaining $8$ bits equal those of the input matrix in Fig. 1A. Fig. 1C presents one of the most redundant cases, in which all oriented edges are present; that is, the input is an adjacency matrix of a complete (directed) graph.
  • Figure 2: Tree diagram for Algorithm 1, showing the sequence of graphs with their respective subsets of edges removed (as selected in the iterations $j$ of the main loop of Algorithm 1). One has $F_{ i_1 } = \textit{minLoss}_1 = \min\left( \textsc{InfoRank}\left( G \right) \right)$, $F_{ i_2 } = \textit{minLoss}_2 =\min\left( \textsc{InfoRank}\left( G \setminus F_{ i_1 } \right) \right)$, and so on, until the final graph $G \setminus F_{ i_h }$ is found by the algorithm. This process is represented by the blue path $\left( G , G \setminus F_{ i_1 } , G \setminus F_{ i_2 } , \dots , G \setminus F_{ i_h } \right)$ from the root $G$ to the leaf $G \setminus F_{ i_h }$ in $h$ iterations. Notice that each parent vertex/graph has as many children as there are (non-empty) subsets of (present) edges in the parent. Fig. 2 illustrates the speed-up achieved by Algorithm 1 relative to the decision process that this tree diagram represents, both globally (the number of iterations $j \leq h$ is always upper bounded by $\mathbf{O}\left( \left| E\left( G \right) \right| - N \right)$) and locally (from each parent, Algorithm 1 does not have to search for the best candidate across all possible subsets of edges of the parent).
  • Figure 3: Performance assessment (mean accuracy per bit of compressed image) of different compression algorithms, including MILS and other traditional approaches. MILS substantially outperforms all the considered algorithms due to its highly compressed binary setting.
  • Figure 4: Compute time in seconds for all the methods considered in this particular ML pipeline. MILS is less memory-intensive, at the cost of longer compute times.
  • Figure 5: MILS or neutral edge deletion (blue) outperforms random edge deletion (red) at preserving both edge degree distribution (top, showing removed edges) and edge betweenness distribution (bottom) on an Erdős-Rényi random graph of vertex size 100 and low edge density ($\sim 4\%$) after up to 60 edges were removed (degree distribution comparison) and 150 edges were removed (edge betweenness) out of a total of 200 edges (notice also the scale differences on the $x$-axis).
  • ...and 7 more figures