Transfer Operators from Batches of Unpaired Points via Entropic Transport Kernels

Florian Beier; Hancheng Bi; Clément Sarrazin; Bernhard Schmitzer; Gabriele Steidl

TL;DR

This work addresses learning the joint distribution of $(X,Y)$ from batches of samples in which the within-batch pairings are unknown. It derives a maximum-likelihood permutation functional and a computationally tractable approximation, and proves a Γ-convergence result showing that the true density $p$ can be recovered as the number of batches grows. A nonparametric hypothesis class built from entropic optimal transport kernels enforces the constraints of a probabilistic transfer density and is well suited to learning transfer operators from data. The resulting discrete problem is solved by a modified EMML algorithm that accounts for additional transition probability constraints and is proved to converge. Numerical experiments on dynamical-systems scenarios and particle colocalization demonstrate the method’s ability to infer smoothed transfer operators and reveal coherent structures from unpaired observations. The approach provides a framework for inferring structural information in systems where pairings are imperfect or unavailable, with potential applications in physics, imaging, and data-driven dynamical analysis.

Abstract

In this paper, we are concerned with estimating the joint probability of random variables $X$ and $Y$, given $N$ independent observation blocks $(\boldsymbol{x}^i,\boldsymbol{y}^i)$, $i=1,\ldots,N$, each consisting of $M$ samples $(\boldsymbol{x}^i,\boldsymbol{y}^i) = \bigl((x^i_j, y^i_{\sigma^i(j)}) \bigr)_{j=1}^M$, where $\sigma^i$ denotes an unknown permutation of i.i.d. sampled pairs $(x^i_j,y_j^i)$, $j=1,\ldots,M$. This means that the internal ordering of the $M$ samples within an observation block is not known. We derive a maximum-likelihood inference functional, propose a computationally tractable approximation, and analyze their properties. In particular, we prove a $\Gamma$-convergence result showing that we can recover the true density from empirical approximations as the number $N$ of blocks goes to infinity. Using entropic optimal transport kernels, we model a class of hypothesis spaces of density functions over which the inference functional can be minimized. This hypothesis class is particularly suited for approximate inference of transfer operators from data. We solve the resulting discrete minimization problem by a modification of the EMML algorithm that takes additional transition probability constraints into account, and we prove the convergence of this algorithm. Proof-of-concept examples demonstrate the potential of our method.
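To make the observation model concrete, the following is a minimal sketch of how such unpaired blocks could be simulated. The joint law used here (X uniform on [0,1), Y equal to X plus Gaussian noise, wrapped to [0,1)) is an assumption chosen only for illustration, and the function name `sample_unpaired_blocks` and the parameter `noise_std` are hypothetical; none of them are taken from the paper.

```python
import random


def sample_unpaired_blocks(n_blocks, block_size, noise_std=0.05, seed=0):
    """Simulate N blocks of M samples whose within-block pairing is hidden.

    The toy joint law is an illustrative assumption, not the paper's setup:
    X ~ Uniform[0, 1), Y = (X + Gaussian noise) mod 1.
    """
    rng = random.Random(seed)
    blocks = []
    for _ in range(n_blocks):
        # Draw M i.i.d. pairs (x_j, y_j) from the toy joint law.
        xs = [rng.random() for _ in range(block_size)]
        ys_paired = [(x + rng.gauss(0.0, noise_std)) % 1.0 for x in xs]

        # Draw the permutation of the block; an observer would not see it.
        perm = list(range(block_size))
        rng.shuffle(perm)

        # Observed y-values: position j holds y_{perm(j)}.
        ys = [ys_paired[perm[j]] for j in range(block_size)]

        blocks.append({"x": xs, "y": ys, "y_paired": ys_paired, "perm": perm})
    return blocks


if __name__ == "__main__":
    data = sample_unpaired_blocks(n_blocks=3, block_size=4, seed=1)
    for i, block in enumerate(data):
        # Only block["x"] and block["y"] would be available for inference.
        print(i, block["x"], block["y"])
```

In this sketch, an estimator would only receive the `x` and `y` lists of each block. The `y_paired` and `perm` entries are kept solely so that the simulation can be checked against the ground truth.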

Paper Structure (28 sections, 24 theorems, 159 equations, 11 figures, 2 algorithms)

Introduction
Analysis of dynamical systems.
Particle colocalization.
Outline of the paper.
Notation
Inference functionals
Modeling and permutation functional
Approximate inference functional
Basic properties of inference functionals
Non-parametric estimation with entropic transport kernels
Entropic optimal transport
Non-parametric class of hypothesis densities
Explicit discrete functional
Matrix-vector representation.
Choice of X and Y.
...and 13 more sections

Key Result

Proposition 2

The law of the random variable $\boldsymbol{Z} \in (\mathbb X \times \mathbb Y)^M$ in Z is given by

Figures (11)

Figure 1: Illustration of the transport density $q = k^\varepsilon_{\mu^N\tilde{\mu}}.\xi.k^\varepsilon_{\nu^N\tilde{\nu}}$.
Figure 2: $N=10^4$ samples from $(X,Y)$ according to \eqref{eq:PiTorus} for three different values of $\sigma$.
Figure 3: Left: optimal $\hat{q}$ for $\varepsilon=0.0025$ and varying $N$. Right: $\|\hat{q}-p\|_{L^2(\mu \otimes \nu)}$ for $\varepsilon \in \{0.001,0.0025,0.01\}$ and varying $N$. Plot of the mean and standard deviation obtained from 100 simulations. We expect that the curve for $\varepsilon=0.001$ will eventually go below the curve for $\varepsilon=0.0025$ as $N$ increases further. The dashed line shows $\|p-1\|_{L^2(\mu \otimes \nu)}$. In all cases $M=20$, $\sigma=0.05$.
Figure 4: Left: optimal $\hat{q}$ for $\sigma=0.05$ and varying $\varepsilon$. Right: $\|\hat{q}-p\|_{L^2(\mu \otimes \nu)}$ for $\sigma \in \{0.01,0.025,0.05\}$ and varying $\varepsilon$. Shown are mean and standard deviation obtained from 100 simulations. The dashed line indicates $\|p-1\|_{L^2(\mu \otimes \nu)}$ for reference. In all cases $M=N=20$.
Figure 5: Left: optimal $\hat{q}$ for $N=70$, $\sigma=0.025$, $\varepsilon=0.0025$ and varying $M$. Right: $\|\hat{q}-p\|_{L^2(\mu \otimes \nu)}$ for $N=20$, $\sigma \in \{0.01,0.025,0.05\}$, $\varepsilon =0.01$ and varying $M$. Plot of the mean and standard deviation obtained from 100 simulations. The dashed line indicates $\|p-1\|_{L^2(\mu \otimes \nu)}$. The dotted line shows the error $\|p-\tilde{q}\|_{L^2(\mu \otimes \nu)}$, where $\tilde{q}$ is the minimizer of $J_{M=1}$ over $Q$, i.e. the best possible approximation of $p$ by $Q$ in the sense of $\mathrm{KL}$ and without the added difficulty of unobserved pairings.
...and 6 more figures

Theorems & Definitions (57)

Proposition 2
proof
Definition 3: Permutation inference functional
Lemma 4
proof
Proposition 5
proof
Lemma 6
proof
Definition 7: Approximate inference functional
...and 47 more
