Sparsifying dimensionality reduction of PDE solution data with Bregman learning

Tjeerd Jan Heeringa, Christoph Brune, Mengwu Guo

TL;DR

The paper tackles the challenge of compressing PDE solution data with nonlinear projections while controlling model size. It introduces a multistep sparsification pipeline that trains autoencoders using linearized Bregman iterations with sparsity- and low-rank-inducing regularizers, followed by latent-SVD truncation and bias-propagation post-processing. Across 1D diffusion, 1D advection, and 2D reaction-diffusion, the approach achieves accuracy comparable to standard optimizers while reducing parameters by about 30% and shrinking the latent space by roughly 60%, with AdaBreg often delivering the best sparsity-accuracy trade-off. This yields practical, efficient reduced-order models for PDE data and provides a framework for principled latent-dimension control via sparsity and post-processing.

Abstract

Classical model reduction techniques project the governing equations onto a linear subspace of the original state space. More recent data-driven techniques use neural networks to enable nonlinear projections. Whilst those often enable stronger compression, they may have redundant parameters and lead to suboptimal latent dimensionality. To overcome these issues, we propose a multistep algorithm that induces sparsity in the encoder-decoder networks for effective reduction in the number of parameters and additional compression of the latent space. This algorithm starts by sparsely initializing a network and training it using linearized Bregman iterations. These iterations have been very successful in computer vision and compressed sensing tasks, but have not yet been used for reduced-order modelling. After the training, we further compress the latent space dimensionality by using a form of proper orthogonal decomposition. Last, we use a bias propagation technique to change the induced sparsity into an effective reduction of parameters. We apply this algorithm to three representative PDE models: 1D diffusion, 1D advection, and 2D reaction-diffusion. Compared to conventional training methods like Adam, the proposed method achieves similar accuracy with 30% fewer parameters and a significantly smaller latent space.
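
To make the training step more concrete, the following is a minimal sketch of linearized Bregman iterations with an $\ell_1$ regularizer on a toy least-squares problem. It is not the paper's implementation: the function names, the toy data, the full-batch gradient, and the hyperparameters `lam`, `delta`, and `tau` are assumptions chosen for illustration. The paper applies such iterations to autoencoder weights, with sparsity- and low-rank-inducing regularizers and variants such as AdaBreg.

```python
import numpy as np

def soft_threshold(v, lam):
    """Proximal operator of lam * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def linbreg(grad_fn, theta0, lam=1.0, delta=1.0, tau=0.1, n_iter=500):
    """Linearized Bregman iterations for J(theta) = lam * ||theta||_1.

    The auxiliary variable v accumulates gradient steps; theta is recovered
    from v by shrinkage, so an entry stays exactly zero while |v_i| <= lam.
    """
    theta = theta0.copy()
    # v is a subgradient of J + ||.||^2 / (2 * delta) at theta0.
    v = theta / delta + lam * np.sign(theta)
    for _ in range(n_iter):
        v = v - tau * grad_fn(theta)            # gradient step on v
        theta = delta * soft_threshold(v, lam)  # map back to the parameters
    return theta

# Toy problem (assumed for illustration): recover a sparse vector from A @ x = b.
rng = np.random.default_rng(0)
n, d = 50, 20
A = rng.normal(size=(n, d))
x_true = np.zeros(d)
x_true[[2, 7, 11]] = [3.0, -2.0, 4.0]
b = A @ x_true

def loss(theta):
    return 0.5 * np.mean((A @ theta - b) ** 2)

def grad(theta):
    return A.T @ (A @ theta - b) / n

# Start from the all-zero (fully sparse) parameter vector.
theta_hat = linbreg(grad, np.zeros(d), lam=1.0, delta=1.0, tau=0.1, n_iter=2000)
print("final loss:", loss(theta_hat))
print("non-negligible entries:", np.flatnonzero(np.abs(theta_hat) > 1e-6))
```

Because `theta` is obtained from `v` by shrinkage, a coordinate only becomes non-zero once its accumulated gradient exceeds `lam` in magnitude. This is the mechanism that lets training start from a sparse network and activate parameters only when they help reduce the loss.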

Paper Structure (18 sections, 37 equations, 20 figures, 4 tables, 1 algorithm)

Figures (20)

  • Figure 1: Bregman iterations with TV regularization applied to a noisy cat image. The TV regularization is computed using the PyProximal [ravasi_pyproximal_2024] implementation of the algorithm from [beck_fast_2009] with $\eta=0.5$, $300$ sub-iterations, and a relative tolerance of $1.0\times 10^{-8}$.
  • Figure 2: Example of Algorithm 1 applied to the small autoencoder. Blue nodes represent active neurons (output still depends on input), red nodes represent inactive neurons (output does not depend on input), and a line between two neurons indicates that the weight between them is non-zero. In (a), we see the autoencoder after lines 1 and 2 of Algorithm 1 have been executed. In (b), we see the autoencoder after lines 3 to 9 of Algorithm 1 have been executed. Many inactive neurons have been created, but by construction each of them still outputs a non-zero value, because the biases $b^\ell$ are strictly positive. In (c), we see the autoencoder after training. Roughly half of the inactive neurons have become active. In (d), we see the network after applying the latent SVD post-processing described in lines 20 to 25 of Algorithm 1. The difference between (c) and (d) is that half the neurons in the bottleneck have been removed. In (e), we see the network after applying the bias propagation post-processing described in lines 26 and 27. Visually, this looks like removing inactive neurons from all layers except the output layer. Note that in this particular example, there are still inactive neurons in the output layer. This is caused by the data all having the same values at the indices corresponding to these neurons, which implies that the training algorithm is also able to detect this kind of pattern in the data.
  • Figure 3: Numerical solution on the left for the diffusion equation with $\mu_{\text{diff}}=0.1$, and on the right the singular value decay of the snapshot matrix corresponding to the shown numerical solution.
  • Figure 4: Wandb sweep for SGD. Runs are started with the learning rate on the left, and the lowest training loss achieved is shown on the right. The lowest loss is achieved for $\eta\approx 2.0\times 10^{-5}$.
  • Figure 5: Wandb sweep for Adam. Runs are started with the learning rate on the left, and the lowest training loss achieved is shown on the right. The lowest loss is achieved for $\eta\approx 1.5\times 10^{-3}$.
  • ...and 15 more figures
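
The two post-processing steps described in the Figure 2 caption (latent SVD truncation and bias propagation) can be sketched as follows. This is an illustration under stated assumptions, not the paper's Algorithm 1: the networks are plain lists of `(W, b)` pairs, the activation is ReLU, the bottleneck layer is affine (no activation), and the rank is chosen with a hypothetical relative tolerance `tol`. All function names are made up for this example.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward(layers, X, act=relu):
    """Apply an MLP given as [(W, b), ...] to the columns of X.

    No activation is applied after the last layer in the list.
    """
    for i, (W, b) in enumerate(layers):
        X = W @ X + b[:, None]
        if i < len(layers) - 1:
            X = act(X)
    return X

def truncate_latent(encoder, decoder, X, tol=1e-10, act=relu):
    """Project the latent space onto its leading left singular vectors."""
    Z = forward(encoder, X, act)                   # (latent_dim, n_samples)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))                # assumed rank criterion
    U_r = U[:, :r]
    (W_e, b_e), (W_d, b_d) = encoder[-1], decoder[0]
    # Absorb U_r^T into the encoder output and U_r into the decoder input.
    new_encoder = encoder[:-1] + [(U_r.T @ W_e, U_r.T @ b_e)]
    new_decoder = [(W_d @ U_r, b_d)] + decoder[1:]
    return new_encoder, new_decoder

def propagate_bias(layers, act=relu):
    """Remove hidden neurons whose incoming weights are all zero.

    Such a neuron outputs the constant act(b_i); that constant is folded
    into the next layer's bias before the neuron is dropped.
    """
    layers = [(W.copy(), b.copy()) for W, b in layers]
    for l in range(len(layers) - 1):               # last layer is left untouched
        W, b = layers[l]
        W_next, b_next = layers[l + 1]
        dead = ~W.any(axis=1)                      # rows without non-zero weights
        b_next = b_next + W_next[:, dead] @ act(b[dead])
        layers[l] = (W[~dead], b[~dead])
        layers[l + 1] = (W_next[:, ~dead], b_next)
    return layers

def n_params(layers):
    return sum(W.size + b.size for W, b in layers)

# Toy autoencoder (assumed): 8 -> 6 -> 4 -> 6 -> 8 with a rank-2 latent layer.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 50))
B, C = rng.normal(size=(4, 2)), rng.normal(size=(2, 6))
W1 = rng.normal(size=(6, 8)); W1[[1, 4]] = 0.0     # two neurons without inputs
W3 = rng.normal(size=(6, 4)); W3[2] = 0.0          # one neuron without inputs
encoder = [(W1, rng.uniform(0.1, 1.0, 6)), (B @ C, B @ rng.normal(size=2))]
decoder = [(W3, rng.uniform(0.1, 1.0, 6)), (rng.normal(size=(8, 6)), np.zeros(8))]

recon_before = forward(decoder, forward(encoder, X))
enc_small, dec_small = truncate_latent(encoder, decoder, X)
enc_small, dec_small = propagate_bias(enc_small), propagate_bias(dec_small)
recon_after = forward(decoder=None or dec_small, X=forward(enc_small, X)) if False else forward(dec_small, forward(enc_small, X))

print("latent dimension:", encoder[-1][0].shape[0], "->", enc_small[-1][0].shape[0])
print("stored parameters:", n_params(encoder) + n_params(decoder),
      "->", n_params(enc_small) + n_params(dec_small))
```

In this toy setting both steps leave the reconstruction unchanged up to floating-point error, because the latent codes have exact rank 2 and the removed neurons are exactly constant. With a trained network, truncating at a looser tolerance would trade a small reconstruction error for a smaller latent space.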