Tensor Network Estimation of Distribution Algorithms

John Gardiner, Javier Lopez-Piqueres

TL;DR

This work reframes tensor-network generative models as Estimation of Distribution Algorithms (EDAs) and analyzes how their optimization performance relates to the quality of the underlying model. By comparing GEO and PROTES and by introducing explicit mutation as an exploration mechanism, the authors show that stronger generative models do not automatically yield better optimization, and that adding mutations can improve performance even when the model becomes a worse estimator of the training data distribution. Across experiments with equal-weighted portfolio optimization and benchmark problems, a low-expressivity MPS (bond dimension ~2) with Boltzmann-based selection and mutation often matches or surpasses more expressive TNs or Bayesian-network EDAs. The results advocate separating exploration from exploitation in TN-EDA design and suggest that future work should explore principled mutation strategies and generalization metrics to unlock the practical potential of tensor-network-based optimization.
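The two selection styles contrasted above, Boltzmann-based selection (GEO) and keeping only the top candidates (PROTES), can be sketched as follows. This is a minimal sketch that assumes a minimization problem with one scalar cost per bit string. The function names and the inverse-temperature parameter `beta` are hypothetical, and drawing training samples with replacement is an assumption, not necessarily the authors' exact procedure.

```python
import numpy as np


def boltzmann_select(pool, costs, beta, n_train, rng):
    """Draw n_train training samples from pool with probability proportional to exp(-beta * cost).

    Assumes minimization; beta is a hypothetical inverse temperature
    (larger beta puts more weight on the lowest-cost samples).
    """
    costs = np.asarray(costs, dtype=float)
    # Shift by the minimum cost before exponentiating to avoid overflow/underflow;
    # the shift cancels out after normalization.
    weights = np.exp(-beta * (costs - costs.min()))
    probs = weights / weights.sum()
    idx = rng.choice(len(pool), size=n_train, replace=True, p=probs)
    return pool[idx]


def top_k_select(pool, costs, k):
    """Elitist alternative (PROTES-style): keep the k lowest-cost samples."""
    order = np.argsort(costs)
    return pool[order[:k]]
```

With `beta = 0` the Boltzmann rule reduces to uniform sampling from the pool. As `beta` grows it approaches top-candidate selection, so `beta` trades exploration against exploitation at the selection step.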

Abstract

Tensor networks, a tool first employed in many-body quantum physics, now have a wide range of uses across the computational sciences, from numerical methods to machine learning. Methods integrating tensor networks into evolutionary optimization algorithms have appeared in the recent literature. In essence, these methods can be understood as replacing the traditional crossover operation of a genetic algorithm with a tensor-network-based generative model. We investigate these methods from the point of view that they are Estimation of Distribution Algorithms (EDAs). We find that the optimization performance of these methods is not related to the power of the generative model in a straightforward way: generative models that are better (in the sense that they better model the distribution from which their training data is drawn) do not necessarily improve the performance of the optimization algorithm they form part of. This raises the question of how best to incorporate powerful generative models into optimization routines. In light of this, we find that adding an explicit mutation operator to the output of the generative model often improves optimization performance.
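The EDA view described in the abstract (select parents, fit a generative model in place of crossover, sample children, then apply an explicit mutation operator) can be sketched as a short loop. To stay self-contained, this sketch swaps in a much simpler generative model than the paper's tensor networks: independent per-bit marginals (a UMDA-style product distribution, which is what a bond-dimension-1 MPS can represent). Top-candidate selection, the parameter names, and their defaults are all hypothetical choices for illustration.

```python
import numpy as np


def fit_product_model(parents, eps=1e-3):
    """Stand-in generative model: independent per-bit marginals (UMDA-style).

    NOT the paper's tensor network; clipping keeps every bit value reachable.
    """
    return np.clip(parents.mean(axis=0), eps, 1 - eps)


def sample_product_model(p, n, rng):
    """Sample n bit strings with bit i set to 1 with probability p[i]."""
    return (rng.random((n, p.size)) < p).astype(np.int8)


def bit_flip_mutation(samples, p_flip, rng):
    """Explicit mutation: flip each bit independently with probability p_flip."""
    flips = rng.random(samples.shape) < p_flip
    return samples ^ flips.astype(samples.dtype)


def tn_eda_sketch(cost_fn, n_bits, n_iters=30, pool_size=50, n_parents=20,
                  n_children=50, p_flip=0.01, seed=0):
    """Hypothetical EDA loop with mutation; minimizes cost_fn over bit strings."""
    rng = np.random.default_rng(seed)
    pool = rng.integers(0, 2, size=(pool_size, n_bits), dtype=np.int8)
    costs = np.array([cost_fn(x) for x in pool])
    for _ in range(n_iters):
        # 1. Selection: the lowest-cost candidates in the pool become parents.
        parents = pool[np.argsort(costs)[:n_parents]]
        # 2. Fit the generative model to the parents and sample children.
        model = fit_product_model(parents)
        children = sample_product_model(model, n_children, rng)
        # 3. Explicit mutation applied to the model's output.
        children = bit_flip_mutation(children, p_flip, rng)
        # 4. Add only new, unique children back to the pool.
        seen = {x.tobytes() for x in pool}
        new_rows = []
        for x in children:
            key = x.tobytes()
            if key not in seen:
                seen.add(key)
                new_rows.append(x)
        if new_rows:
            new_arr = np.array(new_rows)
            pool = np.vstack([pool, new_arr])
            costs = np.concatenate([costs, [cost_fn(x) for x in new_arr]])
    best = np.argmin(costs)
    return pool[best], costs[best]
```

For example, `tn_eda_sketch(lambda x: -int(x.sum()), n_bits=20)` maximizes the number of ones. Setting `p_flip=0` removes the explicit mutation step, which gives the kind of with/without-mutation comparison the abstract discusses.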

Paper Structure

This paper contains 25 sections, 6 equations, 10 figures, 1 algorithm.

Figures (10)

  • Figure 1: Tensor Network EDA. First, a selection procedure screens the top bit-string candidates from a pool (the population). Next, a Tensor Network Generative Model (here, a Born Machine) trains on those candidates (the parents) and outputs new ones resembling them (the children). Finally, a mutation operator randomly flips bits in the output samples. New, unique samples are then added back to the pool, and this series of steps is repeated over many iterations.
  • Figure 2: Born Machine and Bayesian Network. (Left): Born Machines are inspired by the probabilistic interpretation of quantum mechanics. Shown is an example of a Matrix Product State (MPS) Born Machine, where each probability is the squared modulus of an amplitude (the amplitude times its complex conjugate), and the amplitudes are described by an MPS. (Right): Bayesian Networks are directed acyclic graphs where nodes represent random variables and edges represent conditional dependencies. Shown is an example of a Bayesian Network with chain topology, i.e. a Markov chain.
  • Figure 3: GEO pipeline [original_geo]. A Tensor Network Born Machine generative model (an MPS in the original work) is trained to produce high-quality samples, using a population constructed from samples drawn from the model. At each iteration, a Boltzmann selection procedure selects training samples, weighted by quality, from all samples generated in all previous iterations. The tensor network can be swapped out for a different generative model.
  • Figure 4: PROTES pipeline [batsheva2024protes]. At each iteration, a positive Tensor Network (an MPS in the original work) is trained to produce high-quality samples using a pool of samples drawn from the previous iteration's model. Only the top candidates in the pool are selected as training data for the next iteration.
  • Figure 5: GEO with bit-flip noise. (Top Left): GEO performance on the equal-weighted portfolio optimization problem for various values of $p_\text{flip}$. Lines are medians over 40 independent runs; shaded regions span the 1st to 3rd quartiles. (Top Right and Bottom): At each iteration, a noiseless ($p_\text{flip}=0$) generative model is trained in parallel with the model used for GEO. Plotted is the KL divergence of the noisy model minus that of the noiseless model, both measured relative to the Boltzmann distribution from which that iteration's training data is drawn. Lines are medians and shaded regions span the 1st to 3rd quartiles over 40 independent runs. The KL divergence is nearly always higher for the noisy model, and the gap between the noisy and noiseless models grows with increasing noise (increasing $p_\text{flip}$).
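The MPS Born Machine in Figure 2 and the KL-divergence comparison in Figure 5 can be illustrated together on a toy problem. The sketch below contracts an MPS along the chain to get an amplitude psi(x), normalizes p(x) = |psi(x)|^2 / Z, and measures the KL divergence from a Boltzmann target. Several assumptions apply. The MPS is random and untrained. Z is computed by brute-force enumeration of all 2^n bit strings, so this only works for small n; practical implementations contract the MPS with its conjugate instead. The KL direction KL(target || model) is also assumed. All function names are hypothetical.

```python
import itertools

import numpy as np


def random_mps(n_bits, bond_dim, rng):
    """Hypothetical random complex MPS: tensors of shape (D_left, 2, D_right)."""
    tensors = []
    for i in range(n_bits):
        dl = 1 if i == 0 else bond_dim
        dr = 1 if i == n_bits - 1 else bond_dim
        tensors.append(rng.normal(size=(dl, 2, dr)) + 1j * rng.normal(size=(dl, 2, dr)))
    return tensors


def mps_amplitude(tensors, bits):
    """Contract the chain for one bit string x: psi(x) = A1[x1] A2[x2] ... An[xn]."""
    vec = np.ones((1,), dtype=complex)
    for t, b in zip(tensors, bits):
        vec = vec @ t[:, b, :]
    return vec[0]


def born_distribution(tensors):
    """p(x) = |psi(x)|^2 / Z by enumerating all bit strings (small n only)."""
    states = list(itertools.product([0, 1], repeat=len(tensors)))
    amps = np.array([mps_amplitude(tensors, s) for s in states])
    weights = np.abs(amps) ** 2  # equals psi(x) * conj(psi(x))
    return states, weights / weights.sum()


def boltzmann_distribution(states, cost_fn, beta):
    """Target q(x) proportional to exp(-beta * cost(x)), shifted for stability."""
    c = np.array([cost_fn(s) for s in states], dtype=float)
    w = np.exp(-beta * (c - c.min()))
    return w / w.sum()


def kl_divergence(q, p):
    """KL(q || p) = sum_x q(x) log(q(x) / p(x)), skipping terms with q(x) = 0."""
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))
```

For example, `born_distribution(random_mps(4, 2, np.random.default_rng(1)))` returns the 16 bit strings and their Born probabilities. Passing those states to `boltzmann_distribution` with a cost such as `sum` then gives a target against which `kl_divergence` can score the model.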