
A distance function for stochastic matrices

Antony R. Lee, Peter Tino, Iain Bruce Styles

TL;DR

This work develops a principled information-geometry-based framework for comparing Markov chains by introducing a Bhattacharyya-angle distance on Markov-sequence traces and a corresponding stochastic-matrix distance. It shows that, for ergodic chains, the sequence-space and matrix-space distances converge to consistent measures governed by stationary distributions, and it provides closed-form results for diagonalisable and primitive matrices. The approach yields tractable, true-metric distances that are computable from the stochastic matrices themselves, enabling robust comparisons across healthcare process models and other applications. The findings open avenues for principled model comparison in settings where initial conditions should be decoupled from the chain structure, with potential integrations into clustering, monitoring, and ML loss functions for categorical models.
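The Bhattacharyya angle between two distributions over Markov sequences can be evaluated without enumerating every sequence. The following is a minimal sketch. It assumes both chains share a finite state space and are given by initial distributions `p0`, `q0` and row-stochastic matrices `P`, `Q`; all function names are hypothetical. A sequence's probability factorises over its transitions, so the Bhattacharyya coefficient over length-n sequences equals sqrt(p0 ∘ q0)ᵀ M^(n−1) 1, where M is the elementwise square root of P ∘ Q. The angle is the arccosine of that coefficient. The brute-force enumeration is included only as a cross-check for small n.

```python
import itertools

import numpy as np


def sequence_bc(p0, P, q0, Q, n):
    """Bhattacharyya coefficient between the laws of length-n sequences of two Markov chains.

    p(x) q(x) factorises as p0[x1] q0[x1] * prod_t P[x_t, x_t+1] Q[x_t, x_t+1], so the sum of
    sqrt(p(x) q(x)) over all sequences is sqrt(p0 * q0) @ M^(n-1) @ 1 with M = sqrt(P * Q).
    """
    M = np.sqrt(P * Q)        # elementwise square-root product of the transition matrices
    v = np.sqrt(p0 * q0)      # contribution of the first symbol
    for _ in range(n - 1):
        v = v @ M             # extend every partial sequence by one transition at once
    return float(v.sum())


def sequence_angle(p0, P, q0, Q, n):
    """Bhattacharyya angle arccos(BC) between the two length-n sequence distributions."""
    bc = sequence_bc(p0, P, q0, Q, n)
    return float(np.arccos(np.clip(bc, 0.0, 1.0)))  # clip guards against rounding above 1


def brute_force_bc(p0, P, q0, Q, n):
    """Cross-check for small n: enumerate every sequence explicitly."""
    total = 0.0
    for seq in itertools.product(range(len(p0)), repeat=n):
        p, q = p0[seq[0]], q0[seq[0]]
        for a, b in zip(seq, seq[1:]):
            p *= P[a, b]
            q *= Q[a, b]
        total += np.sqrt(p * q)
    return float(total)


P = np.array([[0.9, 0.1], [0.2, 0.8]])
Q = np.array([[0.7, 0.3], [0.4, 0.6]])
p0 = np.array([0.5, 0.5])
q0 = np.array([0.3, 0.7])
for n in (1, 2, 5, 20):
    print(n, sequence_angle(p0, P, q0, Q, n))
```

Every row of M sums to at most 1, by Cauchy-Schwarz. The coefficient therefore cannot increase with n, and the angle between the two runs can only grow as longer traces are compared.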

Abstract

Motivated by information geometry, a distance function on the space of stochastic matrices is advocated. Starting with sequences of Markov chains, the Bhattacharyya angle is proposed as the natural tool for comparing both short- and long-term Markov chain runs. Bounds on the convergence of the distance and mixing times are derived. Guided by the desire to compare different Markov chain models, especially in the setting of healthcare processes, a new distance function on the space of stochastic matrices is presented. It is a true distance measure that has a closed form and is efficient to evaluate numerically. For ergodic Markov chains, it is shown that the Bhattacharyya angle on Markov sequences and the new stochastic matrix distance lead to the same distance between models.
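For long runs, a matrix spectrum governs the per-step behaviour of the sequence Bhattacharyya coefficient. The sketch below is illustrative and rests on one assumption: P and Q have strictly positive entries, so the square-root product matrix M = sqrt(P ∘ Q) is primitive. Under that assumption, Perron-Frobenius theory gives BC_n ≈ C ρ(M)^(n−1) for large n, and so -(1/n) log BC_n tends to -log ρ(M) whatever the (positive) initial distributions. This is a standard spectral argument offered for intuition. It is not claimed to reproduce the exact statement of the paper's Theorem 1, and `finite_rate` and `limiting_rate` are hypothetical names.

```python
import numpy as np


def finite_rate(p0, P, q0, Q, n):
    """Per-step Bhattacharyya distance -(1/n) log BC_n for length-n sequences.

    BC_n = sqrt(p0 * q0) @ M^(n-1) @ 1 with M = sqrt(P * Q); the vector is renormalised at
    each step and the log normalisers are accumulated so that large n does not underflow.
    """
    M = np.sqrt(P * Q)
    v = np.sqrt(p0 * q0)
    log_bc = np.log(v.sum())
    v = v / v.sum()
    for _ in range(n - 1):
        v = v @ M
        s = v.sum()
        log_bc += np.log(s)
        v = v / s
    return float(-log_bc / n)


def limiting_rate(P, Q):
    """-log rho(M), with rho the spectral radius (Perron root) of M = sqrt(P * Q)."""
    rho = np.max(np.abs(np.linalg.eigvals(np.sqrt(P * Q))))
    return float(-np.log(rho))


P = np.array([[0.9, 0.1], [0.2, 0.8]])
Q = np.array([[0.7, 0.3], [0.4, 0.6]])
inits = [(np.array([0.5, 0.5]), np.array([0.3, 0.7])),
         (np.array([0.99, 0.01]), np.array([0.01, 0.99]))]
for p0, q0 in inits:
    # both initial pairs should approach the same limit as n grows
    print(finite_rate(p0, P, q0, Q, 5000), limiting_rate(P, Q))
```

The limit does not depend on `p0` or `q0`, which illustrates why long-run comparisons can be decoupled from initial conditions.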

Paper Structure

This paper contains 6 sections, 3 theorems, 34 equations, 2 figures, 1 algorithm.

Key Result

Theorem 1

The Bhattacharyya rate between two Markov chain models which satisfy type 1 or type 2 requirements is

Figures (2)

  • Figure 1: Ternary plot of the four base Dirichlet distributions used to generate clusters of stochastic matrices. The top-left density plot serves as a reference distribution. The parameters $\boldsymbol{\alpha}_{i}(t)$ approach $\boldsymbol{\alpha}_{0}$ linearly, so that all distributions eventually coincide and can no longer be distinguished.
  • Figure 2: Comparison of simulations for the four selected distance functions. The vertical axis shows the average (adjusted) Rand score attained by clustering with each distance function. The horizontal axis shows the step number (i.e. $t\in\lbrace 0,1/T,\ldots,1\rbrace$), running from the initial distributions at $t=0$ to the point where all distributions coincide at $t=1$.
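The construction behind Figure 1 can be mimicked with a short simulation. The sketch makes several assumptions:
- There are three states, matching the ternary plot.
- Each row of a 3×3 stochastic matrix is drawn independently from the cluster's Dirichlet distribution.
- The first base distribution serves as the reference $\boldsymbol{\alpha}_{0}$.
- The linear path is $\boldsymbol{\alpha}_{i}(t) = (1-t)\,\boldsymbol{\alpha}_{i} + t\,\boldsymbol{\alpha}_{0}$.

The concentration values and function names are hypothetical and are not the parameters used in the paper.

```python
import numpy as np


def interpolated_alpha(alpha_i, alpha_0, t):
    """Linear path alpha_i(t) = (1 - t) * alpha_i + t * alpha_0 for t in [0, 1]."""
    return (1.0 - t) * np.asarray(alpha_i, dtype=float) + t * np.asarray(alpha_0, dtype=float)


def make_clusters(base_alphas, t, per_cluster, rng):
    """Sample per_cluster stochastic matrices for each base Dirichlet at interpolation step t.

    Assumption: every row of a k x k matrix is an independent Dirichlet(alpha_i(t)) draw.
    """
    alpha_0 = base_alphas[0]  # hypothetical choice: the first base distribution is the reference
    matrices, labels = [], []
    for label, alpha_i in enumerate(base_alphas):
        alpha_t = interpolated_alpha(alpha_i, alpha_0, t)
        k = len(alpha_t)
        for _ in range(per_cluster):
            matrices.append(rng.dirichlet(alpha_t, size=k))  # k rows, each summing to 1
            labels.append(label)
    return np.stack(matrices), np.array(labels)


base_alphas = [np.array([2.0, 2.0, 2.0]),   # hypothetical reference alpha_0
               np.array([8.0, 1.0, 1.0]),
               np.array([1.0, 8.0, 1.0]),
               np.array([1.0, 1.0, 8.0])]
rng = np.random.default_rng(0)
T = 10
for step in range(T + 1):
    t = step / T
    X, y = make_clusters(base_alphas, t, per_cluster=25, rng=rng)
    # here: compute pairwise distances on X, cluster them, and compare the clusters with y
```

The evaluation pipeline would then run at each step $t$:
1. Compute pairwise distances between the sampled matrices with each candidate distance function.
2. Cluster the matrices.
3. Score the clustering against `labels`, for example with scikit-learn's `adjusted_rand_score`.

Repeating this over all steps would give curves of the kind summarised in Figure 2.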

Theorems & Definitions (6)

  • Theorem 1: Bhattacharyya rate
  • proof
  • Theorem 2: Stochastic Matrix Distance
  • Theorem 3: Ergodic distance
  • proof
  • proof