Product Jacobi-Theta Boltzmann machines with score matching

Andrea Pasquale, Daniel Krefl, Stefano Carrazza, Frank Nielsen

TL;DR

The paper tackles high-dimensional density estimation by contrasting the Riemann-Theta Boltzmann machine (RTBM) with a restricted variant, the product Jacobi-Theta Boltzmann machine (pJTBM). It adopts score matching via the Fisher divergence to train these models without needing the partition function, and leverages a diagonal hidden-block Q to enable Jacobi-Theta factorization, yielding linear-scaling derivatives for the Fisher cost. Empirical results demonstrate that the pJTBM trains orders of magnitude faster than the full RTBM and can accommodate larger hidden layers while delivering comparable goodness-of-fit, albeit with some variance differences. The work highlights a scalable approach to exact-density representation using theta-function factorization and suggests promising directions for higher-dimensional density modelling with alternative optimization schemes.

Abstract

The estimation of probability density functions is a nontrivial task that in recent years has been tackled with machine learning techniques. Successful applications can be obtained using models inspired by the Boltzmann machine (BM) architecture. In this manuscript, the product Jacobi-Theta Boltzmann machine (pJTBM) is introduced as a restricted version of the Riemann-Theta Boltzmann machine (RTBM) with a diagonal hidden-sector connection matrix. We show that score matching, based on the Fisher divergence, can be used to fit probability densities with the pJTBM more efficiently than with the original RTBM.
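
To illustrate why score matching avoids the partition function, the following minimal sketch evaluates the empirical Hyvärinen objective, $\frac{1}{N}\sum_i \left[\tfrac{1}{2}\psi(x_i)^2 + \psi'(x_i)\right]$ with $\psi = \partial_x \log p$, for a one-dimensional Gaussian model. The Gaussian model, the function name `score_matching_loss`, and the synthetic data are assumptions chosen for illustration only; they are not the pJTBM or the authors' implementation.

```python
import numpy as np

def score_matching_loss(x, mu, sigma2):
    """Empirical score-matching objective for a 1D Gaussian N(mu, sigma2).

    Only the score psi = d/dx log p(x) and its derivative enter,
    so the normalisation constant of p is never needed.
    """
    psi = -(x - mu) / sigma2      # score of the Gaussian model
    dpsi = -1.0 / sigma2          # derivative of the score (constant in x)
    return np.mean(0.5 * psi**2 + dpsi)

# Synthetic data (hypothetical example, not from the paper)
rng = np.random.default_rng(0)
x = rng.normal(loc=1.5, scale=2.0, size=5000)

# For this model the objective is minimised by the sample mean and the
# (biased) sample variance, where it takes the value -1 / (2 * sigma2_hat).
mu_hat, sigma2_hat = x.mean(), x.var()
loss_at_fit = score_matching_loss(x, mu_hat, sigma2_hat)
```

Moving either parameter away from `mu_hat` or `sigma2_hat` increases the loss, which is the property a gradient-based or evolutionary optimizer would exploit when the minimizer is not available in closed form.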

Paper Structure (8 sections, 2 theorems, 16 equations, 3 figures, 1 table)

This paper contains 8 sections, 2 theorems, 16 equations, 3 figures, 1 table.

Key Result

Proposition 1

Suppose that $\Im(\tau) \ge 0.742$ and $0 \le \Im(z) \le \Im(\tau)/2$. Then, for $B \ge 1$, $\left|\frac{d}{dz}\theta(z,\tau) - U_B(z,\tau)\right| \le 3|q|^{(B-1)^2}$, where [...]

Proof. To prove the proposition, we bound the remainder of the series: [...] Numerically, it can be shown that for $\Im(\tau) \ge 0.742$ we have $\frac{8\pi |q|}{(1-|q|)^2} \le 3$, which proves the proposition.
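
The numerical step in the proof can be checked directly. The sketch below assumes the common nome convention $q = e^{i\pi\tau}$, so that $|q| = e^{-\pi\,\Im(\tau)}$; this convention is not stated in the text above and is an assumption, as is the helper name `remainder_constant`.

```python
import math

def remainder_constant(im_tau):
    """Evaluate 8*pi*|q| / (1 - |q|)^2, assuming |q| = exp(-pi * Im(tau))."""
    q_abs = math.exp(-math.pi * im_tau)
    return 8 * math.pi * q_abs / (1 - q_abs) ** 2

# The expression is increasing in |q|, and |q| decreases as Im(tau) grows,
# so checking the threshold value covers all Im(tau) >= 0.742.
c = remainder_constant(0.742)
print(c <= 3)                          # threshold satisfies the bound
print(remainder_constant(0.741) > 3)   # slightly smaller Im(tau) violates it
```

Under this convention the constant at $\Im(\tau) = 0.742$ sits just below 3, which suggests the threshold was chosen as the smallest three-decimal value for which the bound holds.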

Figures (3)

  • Figure 1: Average time to evaluate the RT function [openRT] for different dimensions $d$ of a diagonal matrix $\Omega$ using the factorized form (yellow curve) and the standard form (light-blue curve). Both axes are logarithmic. The matrix elements of $\Omega$ have been sampled uniformly from the imaginary unit interval, and we averaged over 10 independent runs. Exponential growth is marked with a gray dash-dotted line. A linear regression in log-log space implies that the average time for the factorized RT grows as $\propto d^{1.2}$ with $R^2 = 0.93$.
  • Figure 2: Contour plots for the pJTBM (left, red) and for the RTBM (right, blue) for the uranium dataset. The $N_h=4$ model with the lowest FF value over 10 independent runs is plotted.
  • Figure 3: Relative speed-up factor ($t_1 / t_2$) between the execution time of the algorithm of [deconinck2002computing] ($t_1$) and that of the naive algorithm from [theta] extended to the calculation of the derivatives ($t_2$), as described in the Appendix. The parameters of the RT function have been sampled uniformly from the imaginary unit interval, and we averaged over 10 independent runs.
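
The factorized form compared in Figure 1 can be illustrated with a small sketch. It assumes the standard convention $\theta(z|\Omega) = \sum_{n \in \mathbb{Z}^d} \exp\!\left(2\pi i\left(\tfrac{1}{2} n^T \Omega n + n^T z\right)\right)$, under which a diagonal $\Omega$ turns the $d$-dimensional sum into a product of $d$ one-dimensional Jacobi theta sums. The function names, the truncation `n_max`, and the test values are hypothetical; this is a naive truncated evaluation, not the algorithm benchmarked in the figures.

```python
import itertools
import numpy as np

def jacobi_theta(z, tau, n_max=4):
    """Truncated 1D sum over n of exp(i*pi*n^2*tau + 2*pi*i*n*z)."""
    n = np.arange(-n_max, n_max + 1)
    return np.sum(np.exp(1j * np.pi * n**2 * tau + 2j * np.pi * n * z))

def riemann_theta_naive(z, omega, n_max=4):
    """Truncated d-dimensional lattice sum: (2*n_max + 1)**d terms."""
    total = 0j
    for n in itertools.product(range(-n_max, n_max + 1), repeat=len(z)):
        n = np.array(n)
        total += np.exp(2j * np.pi * (0.5 * n @ omega @ n + n @ z))
    return total

def riemann_theta_factorized(z, omega_diag, n_max=4):
    """Product of 1D sums for diagonal Omega: d * (2*n_max + 1) terms."""
    return np.prod([jacobi_theta(zi, ti, n_max)
                    for zi, ti in zip(z, omega_diag)])

# Hypothetical test values: purely imaginary diagonal entries, real z
omega_diag = 1j * np.array([0.8, 1.1, 0.6])
z = np.array([0.1, -0.3, 0.25])

naive = riemann_theta_naive(z, np.diag(omega_diag))
factorized = riemann_theta_factorized(z, omega_diag)
print(np.isclose(naive, factorized))
```

Because the truncated lattice sum over a box is itself a product of finite sums, the two results agree to floating-point precision, while the term count drops from exponential to linear in $d$.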

Theorems & Definitions (2)

  • Proposition 1
  • Proposition 2