Transfer Operators from Batches of Unpaired Points via Entropic Transport Kernels

Florian Beier, Hancheng Bi, Clément Sarrazin, Bernhard Schmitzer, Gabriele Steidl

TL;DR

This work addresses learning the joint distribution of $(X,Y)$ from batches where within-batch pairings are unknown. It derives a maximum-likelihood permutation functional and an efficient approximate surrogate, and proves Γ-convergence so the true density $p$ is recoverable as the number of batches grows. A nonparametric hypothesis class built from entropic optimal transport kernels enforces probabilistic transfer-density constraints and stabilizes the learning of transfer operators from data, with a discrete problem solved by a modified EMML algorithm that preserves convergence. Numerical experiments on dynamical-systems scenarios and particle colocalization demonstrate the method’s ability to infer smoothed transfer operators and reveal coherent structures from unpaired observations. The approach provides a scalable framework for transferring structural information in systems where pairings are imperfect or unavailable, with potential applications in physics, imaging, and data-driven dynamical analysis.
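For readers unfamiliar with EMML, the following is a minimal sketch of the classical, unmodified iteration for minimizing $\mathrm{KL}(b \mid Ax)$ over $x \geq 0$. It is not the modified variant described here, which additionally handles transition probability constraints. The matrix `A`, the data `b`, and the function names are hypothetical and chosen only for illustration; all entries of `A` are assumed strictly positive.

```python
import numpy as np

def kl_divergence(b, c):
    """Generalized KL divergence: sum(b*log(b/c)) - sum(b) + sum(c)."""
    mask = b > 0
    return float(np.sum(b[mask] * np.log(b[mask] / c[mask])) - b.sum() + c.sum())

def emml(A, b, n_iter=200, x0=None):
    """Classical EMML multiplicative update (assumes A > 0 entrywise, b >= 0)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    col_sums = A.sum(axis=0)  # s_j = sum_i A_ij
    x = np.ones(A.shape[1]) if x0 is None else np.asarray(x0, dtype=float).copy()
    history = []
    for _ in range(n_iter):
        Ax = A @ x
        history.append(kl_divergence(b, Ax))  # objective before the update
        # x_j <- (x_j / s_j) * sum_i A_ij * b_i / (A x)_i
        x = x / col_sums * (A.T @ (b / Ax))
    return x, history

# Toy problem (illustrative values only): consistent data b = A x_true.
rng = np.random.default_rng(0)
A = rng.uniform(0.1, 1.1, size=(30, 10))
x_true = rng.uniform(0.5, 2.0, size=10)
b = A @ x_true
x_hat, history = emml(A, b)
```

The update is multiplicative, so a positive starting point stays positive, and the recorded objective values in `history` should not increase from one iteration to the next.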

Abstract

In this paper, we are concerned with estimating the joint probability of random variables $X$ and $Y$, given $N$ independent observation blocks $(\boldsymbol{x}^i,\boldsymbol{y}^i)$, $i=1,\ldots,N$, each of $M$ samples $(\boldsymbol{x}^i,\boldsymbol{y}^i) = \bigl((x^i_j, y^i_{σ^i(j)}) \bigr)_{j=1}^M$, where $σ^i$ denotes an unknown permutation of i.i.d. sampled pairs $(x^i_j,y_j^i)$, $j=1,\ldots,M$. This means that the internal ordering of the $M$ samples within an observation block is not known. We derive a maximum-likelihood inference functional, propose a computationally tractable approximation and analyze their properties. In particular, we prove a $Γ$-convergence result showing that we can recover the true density from empirical approximations as the number $N$ of blocks goes to infinity. Using entropic optimal transport kernels, we model a class of hypothesis spaces of density functions over which the inference functional can be minimized. This hypothesis class is particularly suited for approximate inference of transfer operators from data. We solve the resulting discrete minimization problem by a modification of the EMML algorithm to take additional transition probability constraints into account and prove the convergence of this algorithm. Proof-of-concept examples demonstrate the potential of our method.
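The observation model can be made concrete with a small simulation. The sketch below draws $N$ blocks of $M$ i.i.d. pairs and shuffles the $y$-samples within each block, so that the observer sees $(x^i_j, y^i_{σ^i(j)})$. The toy pair distribution (a noisy shift on the unit circle), the uniform choice of permutation, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def noisy_shift_pairs(m, rng, noise=0.05):
    """Hypothetical toy model: Y = X + 0.25 + Gaussian noise, wrapped to [0, 1)."""
    x = rng.uniform(0.0, 1.0, size=m)
    y = (x + 0.25 + noise * rng.standard_normal(m)) % 1.0
    return x, y

def sample_unpaired_blocks(sample_pairs, n_blocks, block_size, rng):
    """Draw n_blocks blocks of block_size paired samples, then hide the pairing."""
    xs, ys, sigmas = [], [], []
    for _ in range(n_blocks):
        x, y = sample_pairs(block_size, rng)   # i.i.d. pairs (x_j, y_j)
        sigma = rng.permutation(block_size)    # assumed uniformly random
        xs.append(x)
        ys.append(y[sigma])                    # position j holds y_{sigma(j)}
        sigmas.append(sigma)
    # sigmas are returned for debugging only; an estimator must not use them
    return np.stack(xs), np.stack(ys), np.stack(sigmas)

rng = np.random.default_rng(0)
x_blocks, y_blocks, sigmas = sample_unpaired_blocks(
    noisy_shift_pairs, n_blocks=100, block_size=20, rng=rng
)
```

Only `x_blocks` and `y_blocks` correspond to the observed data; within a row, the set of $y$-values is known but not which $x$ each one belongs to.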

Paper Structure (28 sections, 24 theorems, 159 equations, 11 figures, 2 algorithms)

This paper contains 28 sections, 24 theorems, 159 equations, 11 figures, 2 algorithms.

Key Result

Proposition 2

The law of the random variable $\boldsymbol{Z} \in (\mathbb X \times \mathbb Y)^M$ in Z is given by

Figures (11)

  • Figure 1: Illustration of the transport density $q = k^\varepsilon_{\mu^N\tilde{\mu}}.\xi.k^\varepsilon_{\nu^N\tilde{\nu}}$.
  • Figure 2: $N=10^4$ samples from $(X,Y)$ according to \eqref{eq:PiTorus} for three different values of $\sigma$.
  • Figure 3: Left: optimal $\hat{q}$ for $\varepsilon=0.0025$ and varying $N$. Right: $\|\hat{q}-p\|_{L^2(\mu \otimes \nu)}$ for $\varepsilon \in \{0.001,0.0025,0.01\}$ and varying $N$. Plot of the mean and standard deviation obtained from 100 simulations. We expect that the curve for $\varepsilon=0.001$ will eventually go below the curve for $\varepsilon=0.0025$ as $N$ increases further. The dashed line shows $\|p-1\|_{L^2(\mu \otimes \nu)}$. In all cases $M=20$, $\sigma=0.05$.
  • Figure 4: Left: optimal $\hat{q}$ for $\sigma=0.05$ and varying $\varepsilon$. Right: $\|\hat{q}-p\|_{L^2(\mu \otimes \nu)}$ for $\sigma \in \{0.01,0.025,0.05\}$ and varying $\varepsilon$. Shown are mean and standard deviation obtained from 100 simulations. The dashed line indicates $\|p-1\|_{L^2(\mu \otimes \nu)}$ for reference. In all cases $M=N=20$.
  • Figure 5: Left: optimal $\hat{q}$ for $N=70$, $\sigma=0.025$, $\varepsilon=0.0025$ and varying $M$. Right: $\|\hat{q}-p\|_{L^2(\mu \otimes \nu)}$ for $N=20$, $\sigma \in \{0.01,0.025,0.05\}$, $\varepsilon =0.01$ and varying $M$. Plot of the mean and standard deviation obtained from 100 simulations. The dashed line indicates $\|p-1\|_{L^2(\mu \otimes \nu)}$. The dotted line shows the error $\|p-\tilde{q}\|_{L^2(\mu \otimes \nu)}$, where $\tilde{q}$ is the minimizer of $J_{M=1}$ over $Q$, i.e. the best possible approximation of $p$ by $Q$ in the sense of $\mathop{\mathrm{KL}}\nolimits$ and without the added difficulty of unobserved pairings.
  • ...and 6 more figures

Theorems & Definitions (57)

  • Proposition 2
  • proof
  • Definition 3: Permutation inference functional
  • Lemma 4
  • proof
  • Proposition 5
  • proof
  • Lemma 6
  • proof
  • Definition 7: Approximate inference functional
  • ...and 47 more