Sequential learning on a Tensor Network Born machine with Trainable Token Embedding

Wanda Hou, Miao Li, Yi-Zhuang You

TL;DR

The paper addresses modeling discrete sequential data with quantum-inspired Born machines built on matrix product states. It introduces trainable POVM embeddings via QR-based factorization to replace fixed one-hot token indices, expanding the effective operator space and enabling higher physical dimensions. Using an isometric MPS backbone, the model yields tractable log-likelihood and allows autoregressive sampling in arbitrary orders, with the probability expressed as $p_{\theta,\gamma}({\bm{x}}) = \langle \Psi_\theta | \bigotimes_i M_\gamma(x_i) | \Psi_\theta \rangle$. Empirical results on RNA sequences show that larger physical dimensions improve NLL, single-site probabilities, and local correlations, with the POVM-based model outperforming one-hot baselines and achieving competitive performance against GPT-2 on marginal statistics. The work highlights potential quantum hardware pathways, including mapping the MPS to quantum circuits for QCBM-style sampling, and outlines future directions to handle variable-length data and continuous extensions.

Abstract

Generative models aim to learn the probability distributions underlying data, enabling the generation of new, realistic samples. Quantum-inspired generative models, such as Born machines based on the matrix product state (MPS) framework, have demonstrated remarkable capabilities in unsupervised learning tasks. This study advances the Born machine paradigm by introducing trainable token embeddings through positive operator-valued measurements (POVMs), replacing the traditional approach of static tensor indices. Key technical innovations include encoding tokens as quantum measurement operators with trainable parameters and leveraging QR decomposition to adjust the physical dimensions of the MPS. This approach maximizes the utilization of operator space and enhances the model's expressiveness. Empirical results on RNA data demonstrate that the proposed method significantly reduces negative log-likelihood (NLL) compared to one-hot embeddings, with higher physical dimensions further enhancing single-site probabilities and multi-site correlations. The model also outperforms GPT-2 in single-site estimation and achieves competitive correlation modeling, showcasing the potential of trainable POVM embeddings for complex data correlations in quantum-inspired sequence modeling.
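
To make the Born-rule probability quoted in the TL;DR concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes one plausible QR-based construction: a tall weight matrix is orthonormalized by QR, its rows are split into one block $K_x$ per token, and $M(x) = K_x^\dagger K_x$, so the operators are positive and sum to the identity. The helper names (`povm_from_qr`, `random_isometric_mps`, `born_probability`), the `rank` parameter, and all sizes are hypothetical choices for illustration.

```python
import itertools

import numpy as np


def povm_from_qr(weights, vocab_size):
    """Build one POVM element per token from a tall (vocab_size * rank, p) matrix.

    Assumed construction: QR gives Q with Q^dagger Q = I_p; splitting the rows
    of Q into per-token blocks K_x yields M(x) = K_x^dagger K_x, which is
    positive semidefinite and satisfies sum_x M(x) = I_p.
    """
    q, _ = np.linalg.qr(weights)                  # q has orthonormal columns
    phys_dim = weights.shape[1]
    kraus = q.reshape(vocab_size, -1, phys_dim)   # (token, rank, p)
    return np.einsum("xrs,xrt->xst", kraus.conj(), kraus)


def random_isometric_mps(n_sites, phys_dim, bond_dim, rng):
    """Random left-isometric MPS with open boundaries, normalized by construction."""
    tensors, d_left = [], 1
    for i in range(n_sites):
        last = i == n_sites - 1
        d_right = 1 if last else min(bond_dim, d_left * phys_dim)
        q, _ = np.linalg.qr(rng.normal(size=(d_left * phys_dim, d_right)))
        tensors.append(q.reshape(d_left, phys_dim, d_right))
        d_left = d_right
    return tensors


def born_probability(tensors, povm, tokens):
    """Evaluate <Psi| (x)_i M(x_i) |Psi> by sweeping an environment left to right."""
    env = np.ones((1, 1))
    for a, x in zip(tensors, tokens):
        # bra tensor (l,s,r), environment (l,k), operator (s,t), ket tensor (k,t,q)
        env = np.einsum("lsr,lk,st,ktq->rq", a.conj(), env, povm[x], a)
    return float(env[0, 0].real)


rng = np.random.default_rng(0)
vocab_size, phys_dim, rank, n_sites, bond_dim = 4, 4, 2, 3, 3

povm = povm_from_qr(rng.normal(size=(vocab_size * rank, phys_dim)), vocab_size)
mps = random_isometric_mps(n_sites, phys_dim, bond_dim, rng)

# Enumerate every token sequence to check that the probabilities are normalized.
probs = {
    x: born_probability(mps, povm, x)
    for x in itertools.product(range(vocab_size), repeat=n_sites)
}
total = sum(probs.values())
print(f"sum of p(x) over {len(probs)} sequences: {total:.6f}")
```

Because the MPS is isometric and the POVM elements sum to the identity, the probabilities are normalized without computing a partition function, which is what makes the log-likelihood tractable in this setting. In a trainable version, `weights` and the pre-QR MPS matrices would be the parameters updated by the optimizer.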

Paper Structure (5 sections, 8 equations, 4 figures, 2 algorithms)

Figures (4)

  • Figure 1: Tensor network representation of probability model $p_{\theta,\gamma}({\bm{x}})$. The MPS tensors $A_{\theta i}$ are blue circles, and the measurement operators $M_\gamma(x_i)$ are orange boxes. The isometric structure is indicated by arrows on the tensor legs, projecting from the larger Hilbert space to the smaller sub-space.
  • Figure 2: Each horizontal line represents one RNA sequence, with the sequences stacked vertically. The four colors represent the four types of nucleotides. (a) Samples generated by the Born machine with POVM embedding. (b) Samples generated by GPT-2. (c) Real data from the test set.
  • Figure 3: The NLL objective is minimized using the Adam optimizer, with a learning rate that is adjustable separately for the bond and physical dimensions. The converged NLL decreases as the bond dimension and the physical dimension are increased jointly. The green dashed line marks the converged NLL of a GPT-2 model of similar parameter size.
  • Figure 4: Comparison of (a-e) the single-site probability $p(x_i)$ (upper row) and (f-j) the two-site correlation $c(x_i,y_j)$ (lower row) from the model against those from the dataset. GPT-2: (a)&(f). One-hot embedding (baseline): (b)&(g). Trainable POVM embedding (ours) with physical dimension: (c)&(h) $p=4$, (d)&(i) $p=8$, (e)&(j) $p=16$. The dashed line is the $y=x$ reference line.
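
The statistics compared in Figure 4 can be estimated from any set of equal-length token sequences, whether drawn from a model or taken from the dataset. The sketch below is illustrative only: the captions do not define the two-site correlation, so it assumes the connected form $c = p(x_i, y_j) - p(x_i)\,p(y_j)$, and the function name `site_statistics` and the random stand-in samples are hypothetical.

```python
import numpy as np


def site_statistics(samples, vocab_size):
    """Empirical single-site probabilities and connected two-site correlations.

    samples: integer array of shape (n_samples, length) with token ids.
    Returns p1 with shape (length, vocab) and corr with shape
    (length, vocab, length, vocab), where
    corr[i, a, j, b] = p(x_i = a, x_j = b) - p(x_i = a) * p(x_j = b).
    The connected form of the correlation is an assumption of this sketch.
    """
    samples = np.asarray(samples)
    one_hot = np.eye(vocab_size)[samples]            # (n_samples, length, vocab)
    p1 = one_hot.mean(axis=0)
    p2 = np.einsum("nia,njb->iajb", one_hot, one_hot) / len(samples)
    corr = p2 - np.einsum("ia,jb->iajb", p1, p1)
    return p1, corr


# Stand-in data: random sequences over a 4-letter alphabet (e.g. nucleotides).
rng = np.random.default_rng(0)
samples = rng.integers(0, 4, size=(200, 6))
p1, corr = site_statistics(samples, vocab_size=4)
print(p1.shape, corr.shape)
```

Plotting the model's `p1` and `corr` entries against the dataset's gives scatter plots of the kind the caption describes, where points on the $y=x$ line indicate agreement.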