
Self-interacting processes via Doob conditioning

Francesco Coghi, Juan P. Garrahan

TL;DR

This work shows that self-interacting processes can be understood as Doob-optimal transforms of underlying Markov processes conditioned on trajectory-wise occupation measures. By recasting conditioning as a nonlocal, multi-time constraint and employing a tensor-network formalism, the authors show that the conditioned dynamics are themselves self-interacting processes, formally realized as Doob dynamics that are Markovian in an extended state space. They illustrate the framework with random walk bridges, excursions, and forced excursions, deriving the time-dependent forces and value-function recursions that implement the conditioning. The approach provides a unifying perspective on memory effects in stochastic processes, opens avenues for connections to reinforcement learning and open quantum dynamics, and offers practical tools for constructing conditioned ensembles efficiently.
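As a minimal sketch of the value-function recursion mentioned above, the Python code below implements the standard Doob transform of a discrete-time Markov chain conditioned on its final state, the simplest case of the conditioning discussed in the paper. It assumes a row-stochastic convention in which `P[x][y]` is the probability of jumping from `x` to `y`, which may differ from the paper's operator conventions. The helper names `value_functions` and `doob_transitions` are hypothetical, and exact `Fraction` arithmetic is used only so that normalisation can be checked exactly.

```python
from fractions import Fraction


def value_functions(P, target, T):
    """Backward recursion for V_t(x) = Prob(X_T = target | X_t = x).

    Assumes P is row-stochastic: P[x][y] is the probability of x -> y.
    Returns a list V with V[t][x] for t = 0..T.
    """
    n = len(P)
    V = [None] * (T + 1)
    # Final condition: indicator of the target state.
    V[T] = [Fraction(1) if x == target else Fraction(0) for x in range(n)]
    for t in range(T - 1, -1, -1):
        # V_t(x) = sum_y P(x -> y) V_{t+1}(y)
        V[t] = [sum(P[x][y] * V[t + 1][y] for y in range(n)) for x in range(n)]
    return V


def doob_transitions(P, V, t):
    """Doob transition matrix for the step t -> t+1.

    Q_t(x -> y) = P(x -> y) V_{t+1}(y) / V_t(x), defined where V_t(x) > 0.
    Rows of states from which the condition cannot be met are returned as None.
    """
    n = len(P)
    Q = []
    for x in range(n):
        if V[t][x] == 0:
            Q.append(None)
        else:
            Q.append([P[x][y] * V[t + 1][y] / V[t][x] for y in range(n)])
    return Q


# Toy three-state chain (hypothetical), conditioned to be in state 2 at time T = 4.
h = Fraction(1, 2)
P = [[h, h, 0],
     [0, h, h],
     [h, 0, h]]
V = value_functions(P, target=2, T=4)
Q0 = doob_transitions(P, V, 0)
```

For this toy chain `V[0][0]` equals 3/8, the probability that the unconditioned chain started in state 0 is in state 2 after four steps. Each Doob row sums to one because the value functions satisfy the backward recursion, which plays the role of the normalisation (right canonical) condition in matrix form.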

Abstract

We connect self-interacting processes, that is, stochastic processes where transitions depend on the time spent by a trajectory in each configuration, to Doob conditioning. In this way we demonstrate that Markov processes with constrained occupation measures are realised optimally by self-interacting dynamics. We use a tensor network framework to guide our derivations. We illustrate our general results with new perspectives on well-known examples of self-interacting processes, such as random walk bridges, excursions, and forced excursions.
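To make the link between constrained occupation measures and self-interaction concrete, here is a minimal Python sketch (not the authors' code) of the simplest case: a fair coin conditioned so that heads and tails occur equally often in an even number T of tosses, i.e. a random walk bridge. Under this assumed setup, the conditioned (Doob) probability of the next toss being heads depends only on how many heads and tails have been seen so far, so the optimal dynamics is self-interacting in the sense of the abstract. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb


def p_heads(h, k, T):
    """Doob probability that the next toss is heads, given h heads and k tails so far.

    Conditioning a fair coin on exactly T/2 heads in T tosses (T even) makes all
    admissible completions equally likely, so the next toss is heads with
    probability (heads still needed) / (tosses remaining).
    """
    remaining = T - h - k
    return Fraction(T // 2 - h, remaining)


def path_probability(seq, T):
    """Probability of a full toss sequence (1 = heads, 0 = tails) under the Doob dynamics."""
    prob, h, k = Fraction(1), 0, 0
    for s in seq:
        ph = p_heads(h, k, T)
        prob *= ph if s == 1 else 1 - ph
        h, k = h + s, k + (1 - s)  # update the occupation counts
    return prob


T = 6
# All sequences satisfying the occupation constraint (equal numbers of heads and tails).
bridges = [s for s in product((0, 1), repeat=T) if sum(s) == T // 2]
# Conditioning the fair coin makes every admissible sequence equally likely.
expected = Fraction(1, comb(T, T // 2))
probs = {path_probability(s, T) for s in bridges}
```

Here `probs` contains the single value `expected` (1/20 for T = 6). In random-walk language, with position x = h - k and n = T - h - k steps left, `p_heads` is the familiar bridge rule P(up) = (n - x)/(2n).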

Paper Structure

This paper contains 19 sections, 45 equations, 4 figures.

Figures (4)

  • Figure 1: Tensor network representation of a hidden Markov process. (a) Rank-3 tensor corresponding to the transition operator, Eq. \ref{eq:M}. The horizontal legs have the dimension of the space of hidden states $\Xi$ (the "bond dimension"). The vertical leg has the dimension of the signal space $\Gamma$. Time flows from left to right. (b) Normalisation, Eq. \ref{eq:sumM}. The T-shaped operators represent the "flat" states in either $\Xi$ or $\Gamma$. Joined legs indicate contraction over the corresponding indices. A TN satisfying this condition is said to be in right canonical form. (c) Tensor "train" encoding the trajectories of the process up to time $T$. Assigning specific values to the legs of the tensor gives the probability of that specific trajectory. Tensor network representation of a classical Markov process. (d) Rank-3 delta tensor, which vanishes unless all legs take the same value. (e) For a Markov process the signal is given by the states, and the elementary tensor of panel (a) reduces to the one shown, corresponding to Eq. \ref{eq:pxy}. (f) Normalisation condition (or right canonical form of the TN) given by Eq. \ref{eq:norm}. (g) TN for a trajectory of the Markov process.
  • Figure 2: Local-in-time conditioning. (a) The l.h.s. represents a conditioned dynamics. The rank-2 tensors $W$ (yellow circles) are diagonal operators that impose conditions at each time by multiplying the probability of a trajectory by a factor that depends on the transition observed at that time. For hard conditioning, cf. Sec. III.A.1, the weighting functions are projectors taking values 0 or 1, while for soft conditioning, cf. Sec. III.A.2, they introduce non-negative factors. The overall reweighting of the trajectory probability is the product of all these factors. In the r.h.s. we have inserted identities. The same TN is obtained if we define the elementary tensors in terms of the operators collected inside the dotted box. This illustrates the "gauge symmetry" of the TN, which we can exploit to obtain the Doob dynamics of the conditioned process. Specifically, we aim to find value functions $V_t$ such that the operators in the box are in right canonical form. This corresponds to fixing the gauge of the TN. (b) Identity used in the r.h.s. of the previous panel, written in terms of diagonal rank-2 operators corresponding to the value functions. Basic Doob transform. (c) TN of a Markov process with a condition on its final state. (d) Exploiting the gauge symmetry of the TN, see panel (a), we obtain the transition operators of the Doob process that optimally generates conditioned trajectories, see Eq. \ref{eq:TransitionMatrixDoob}. (e) Graphical representation of Eqs. (\ref{eq:BKEDoob}) and (\ref{eq:FinalConditionDoob}). Value functions that satisfy these equations bring the tensors in the previous panel into right canonical form. Multi-time conditioning and generalised Doob transform. (f) TN representing conditioning that depends on the states visited in the past. The conditioning operators, $w_t$, are diagonal tensors that receive information on the past states in the trajectory and also pass it forward. (g) The corresponding Doob dynamics, Eq. \ref{eq:TransitionMatrixDoobGeneral}, is Markovian only in the extended space $\Xi \times \Gamma$.
  • Figure 3: Fair coin and conditionings. (a) The elementary tensor for the coin transitions, Eq. \ref{eq:pCoin}, is the product of flat states. (b) The MPS representing the dynamics of the coin becomes a product operator, as no information needs to be carried along the time direction: every transition is independent of the previous one, and the Markov chain is an i.i.d. process. (c) TN for the conditioning of Example A, Eq. \ref{eq:CondBridge}. (d) TN for the conditionings of Examples B and C, Eqs. (\ref{eq:CondExcursion}) and (\ref{eq:FavourHeight}).
  • Figure 4: Dynamical phase diagram of Example C. We compute the order parameter, Eq. \ref{eq:area}, by sampling the conditioned trajectory ensemble with the optimal Doob dynamics: we obtain the value functions by solving Eq. \ref{eq:BackwardC} recursively in a numerically exact way (which limits the largest $T$ that we can simulate); from the value functions we obtain the transition probabilities, Eq. \ref{eq:C}, for all configurations and times, which we use to run the dynamics. (a) Average area (scaled by the maximum area $T^2/4$) as a function of $\beta$ for three values of $\alpha$, at $T=100$. (b) Average area as a function of $\alpha$ and $\beta$ for $T=60$. The dashed red line shows the estimate of the crossover between the large- and small-area regimes. (c) Same for $T=100$.
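The sampling recipe described in the caption of Figure 4 (solve a backward recursion for the value functions exactly, turn them into time-dependent transition probabilities, then run the dynamics) can be sketched in Python for a hard-constrained excursion. This sketch assumes that an excursion is a fair ±1 random walk of even length T that starts and ends at 0 and never goes below 0; the soft-conditioned recursion with parameters $\alpha$ and $\beta$ used for Example C in the paper is not reproduced here, and all function names are hypothetical. The area is scaled by the maximum $T^2/4$ quoted in the caption, which is attained by the tent-shaped path.

```python
import random
from fractions import Fraction


def excursion_values(T):
    """V[t][x]: probability that a fair +-1 walk at height x at time t
    stays >= 0 and is back at 0 at time T (exact backward recursion).
    Heights are stored for x = 0..T+1; x = T+1 is never reached and stays 0."""
    V = [[Fraction(0)] * (T + 2) for _ in range(T + 1)]
    V[T][0] = Fraction(1)  # final condition: back at the origin
    for t in range(T - 1, -1, -1):
        for x in range(T + 1):
            # A down step from height 0 would go below 0, so it gets weight 0.
            down = V[t + 1][x - 1] if x >= 1 else Fraction(0)
            V[t][x] = (V[t + 1][x + 1] + down) / 2
    return V


def sample_excursion(V, T, rng):
    """Sample one path with the Doob dynamics P(up | x, t) = V[t+1][x+1] / (2 V[t][x])."""
    x, path = 0, [0]
    for t in range(T):
        p_up = V[t + 1][x + 1] / (2 * V[t][x])
        x += 1 if rng.random() < p_up else -1
        path.append(x)
    return path


def scaled_area(path, T):
    """Area under the path (sum of heights) divided by its maximum T^2/4."""
    return sum(path) / (T * T / 4)


T = 20
V = excursion_values(T)
rng = random.Random(0)
paths = [sample_excursion(V, T, rng) for _ in range(1000)]
mean_area = sum(scaled_area(p, T) for p in paths) / len(paths)
```

With this hard constraint `V[0][0]` equals the Catalan number $C_{T/2}$ divided by $2^T$, and every sampled path is a valid excursion by construction, so no trajectories are rejected.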