Topological model selection: a case-study in tumour-induced angiogenesis

Robert A McDonald, Helen M Byrne, Heather A Harrington, Thomas Thorne, Bernadette J Stolz

TL;DR

This paper tackles parameter inference and model selection for complex, likelihood-intractable spatio-temporal models of tumour-induced angiogenesis by fusing Topological Data Analysis (TDA) with Approximate Bayesian Computation (ABC) and Random Forests (RFs). It presents a three-step pipeline that (i) identifies informative topological and spatial summary statistics via RFs, (ii) performs ABC-SMC to infer parameters for each candidate model, and (iii) uses RF-based model choice to estimate posterior model probabilities. Using three discrete EC-trajectory models (AC, SL, PS) and synthetic data, the approach infers four parameters per model and accurately selects the correct model in test cases, with topological summaries (EPH, persistence images) proving highly informative. The framework is designed to extend to time-evolving vascular remodeling and other spatio-temporal systems, offering a principled way to compare diverse modeling paradigms against data.

Abstract

Comparing mathematical models offers a means to evaluate competing scientific theories. However, exact methods of model calibration are not applicable to many probabilistic models which simulate high-dimensional spatio-temporal data. Approximate Bayesian Computation is a widely-used method for parameter inference and model selection in such scenarios, and it may be combined with Topological Data Analysis to study models which simulate data with fine spatial structure. We develop a flexible pipeline for parameter inference and model selection in spatio-temporal models. Our pipeline identifies topological summary statistics which quantify spatio-temporal data and uses them to approximate parameter and model posterior distributions. We validate our pipeline on models of tumour-induced angiogenesis, inferring four parameters in three established models and identifying the correct model in synthetic test-cases.
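As a rough illustration of the idea behind Approximate Bayesian Computation, the sketch below runs plain rejection ABC on a toy problem. It is not the paper's pipeline: the paper uses ABC-SMC with topological summary statistics selected by random forests, whereas this example swaps in the simpler rejection variant, a hypothetical one-dimensional random-walk simulator, and a single hand-picked summary statistic. All function names, the prior, and the tolerance `eps` are assumptions made for the example.

```python
import random


def simulate(drift, n_steps, rng):
    """Toy stand-in simulator (hypothetical): 1D walk stepping +1 with probability `drift`."""
    pos = 0
    for _ in range(n_steps):
        pos += 1 if rng.random() < drift else -1
    return pos


def summary(drift, n_reps, rng, n_steps=50):
    """Summary statistic: mean final position over `n_reps` repeated simulations."""
    return sum(simulate(drift, n_steps, rng) for _ in range(n_reps)) / n_reps


def abc_rejection(observed, n_draws=2000, n_reps=10, eps=2.0, seed=0):
    """Keep prior draws whose simulated summary lies within `eps` of the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, 1.0)  # uniform prior on the drift (an assumption)
        if abs(summary(theta, n_reps, rng) - observed) <= eps:
            accepted.append(theta)
    return accepted


# Synthetic test-case: data generated at a known parameter value.
true_drift = 0.7
observed = summary(true_drift, 10, random.Random(1))
posterior = abc_rejection(observed)
posterior_mean = sum(posterior) / len(posterior)
print(f"accepted {len(posterior)} draws, posterior mean {posterior_mean:.3f}")
```

The accepted draws approximate the parameter posterior without ever evaluating a likelihood, which is why this family of methods suits simulators whose likelihoods are intractable. ABC-SMC refines the same idea by shrinking the tolerance over a sequence of populations.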

Paper Structure

This paper contains 35 sections, 20 equations, 14 figures, 2 tables, 1 algorithm.

Figures (14)

  • Figure 1: In the Anderson-Chaplain (AC) model, a tip endothelial cell (EC) makes one of five possible moves on a square lattice in each time-step according to probabilities $P_0, P_1, P_2, P_3, P_4$. A chemotaxis parameter $\chi$ biases movement probabilities in the direction of increasing VEGF concentration, and a haptotaxis parameter $\rho$ biases moves in the direction of increasing fibronectin. In the Stokes-Lauffenberger (SL) model, tip ECs move in any direction (off-lattice) with velocities modelled by a two-dimensional stochastic differential equation. Parameters $\kappa$ and $\sigma$ determine how strongly an EC's current velocity $w$ is affected by the VEGF gradient $c$ and random variation $r$, respectively. The Plank-Sleeman (PS) model assigns a constant speed to each tip EC and, at each time-step, rotates the angle that the velocity vector makes with the vertical. The probability $\hat{\tau}_n^+ + \hat{\tau}_n^-$ that a tip EC turns by $\hat{\phi}$ is determined by a turning rate parameter $D_r$. A chemotaxis parameter $d_c$ biases turns that re-orient the EC's direction towards the tumour. In all models, a tip EC may bifurcate into two ECs, which thereafter move independently, if its age exceeds the minimum age for branching parameter $a_\text{br}$ and the VEGF concentration at its location exceeds the VEGF threshold for branching parameter $c_\text{br}$. We show how many spatially-averaged and topological summary statistics, computed in either the $x$ or $y$ co-ordinate direction, appear among the $100$ most important summary statistics for the inference of each parameter.
  • Figure 2: A persistence diagram (PD) and extended persistence diagram (EPD) for a simple blood vessel computed using the vertical sweeping-plane filtration. The PD points quantify the size and location of the small lower branch ($\blacksquare$), and the locations of the component ($\blacksquare$) and loop ($\bullet$). The EPD points quantify the location and size of all topological features quantified by the PD, in addition to the small upper branch ($\bullet$). The branches () are not detected by PH or EPH with this sweeping-plane filtration.
  • Figure 3: We infer the minimum age for branching ($a_\text{br}$) and VEGF threshold for branching ($c_\text{br}$) in each model, as well as chemotaxis and haptotaxis parameters ($\chi$ and $\rho$) in the AC model, chemotaxis and randomness parameters ($\kappa$ and $\sigma$) in the SL model, and chemotaxis and turning rate parameters ($d_c$ and $D_r$) in the PS model. We simulate each model $10$ times at known parameter values to generate two synthetic test-cases for each model, and show the final time-step of one such simulation. We then use steps 1-2 of section 3 to approximate the parameter posterior $p(\Theta | \mathcal{D}^*)$ in each test-case. We project the approximate ABC-SMC posterior to each parameter pair and plot the resulting distributions (fitting a Gaussian kernel to the parameter values accepted in the final population of the ABC-SMC algorithm), along with the true parameter which generated the test-case.
  • Figure 4: We approximate the model posterior $p(m|\mathcal{D}^*)$ using the six test-cases from Figure 3, highlighting one example of $\mathcal{D}^*$. For each test-case, we show one example of data simulated using an inferred parameter from each model's approximate parameter posterior. Each 'prediction' shows an example of that model's approximation of the true data generation process.
  • Figure 5: Schematic showing tip EC movement rules in the Anderson-Chaplain (AC), Stokes-Lauffenberger (SL) and Plank-Sleeman (PS) models. In the AC model, tip ECs move on a grid according to probabilities $\hat{P}_j = P_j / (P_0 + P_1 + P_2 + P_3 + P_4)$ for $j=0, 1, 2, 3, 4$. Higher values of the chemotaxis and haptotaxis parameters bias these probabilities towards moves in the direction of increasing concentrations of VEGF and fibronectin respectively. The SL model updates the velocity $v_i^t$ of the EC at location $s^t$ as a weighted sum of the current velocity $w$, randomness $r$ and chemotaxis $c$. Randomness and chemotaxis parameters regulate the weight of the corresponding terms when updating the velocity. The PS model rotates the movement angle $\phi$ between an EC's velocity vector and the horizontal by $\hat{\phi}$ with transition probabilities $\hat{\tau}_n^+$ and $\hat{\tau}_n^-$, and ECs move a fixed distance $\hat{s}$ in the new direction at each time-step. A turning rate parameter regulates how often the EC's angle of movement updates, and a turning bias parameter changes how likely such an update is to favor the direction of increasing VEGF concentration. All models use the same rules for branching: a tip EC at location $s^t$ bifurcates into two tip ECs that move independently when $t$ is greater than the minimum age for branching parameter $a_\text{br}$ and the concentration of VEGF at $s^t$ is greater than the VEGF threshold for branching parameter $c_\text{br}$. If a movement rule would cause an EC to move into a location already occupied by an EC, that EC instead anastomoses and is considered for no further movement. Created in https://BioRender.com
  • ...and 9 more figures
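
The AC move rule and the shared branching rule described in the Figure 5 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the unnormalised weights $P_0, \dots, P_4$ are taken as given inputs (the captions do not say how they are computed from $\chi$ and $\rho$), and the sampled move is returned as its index $j$ because the captions do not specify which lattice direction each index denotes.

```python
import random


def normalise(weights):
    """Return P_hat_j = P_j / (P_0 + ... + P_4) for the five unnormalised move weights."""
    if len(weights) != 5:
        raise ValueError("expected five move weights P_0..P_4")
    total = sum(weights)
    if total <= 0:
        raise ValueError("move weights must have a positive sum")
    return [w / total for w in weights]


def sample_move(weights, rng):
    """Sample a move index j in {0, ..., 4} with probability P_hat_j."""
    probs = normalise(weights)
    return rng.choices(range(5), weights=probs, k=1)[0]


def may_branch(age, vegf, a_br, c_br):
    """Branching rule shared by all three models: the tip EC's age must exceed a_br
    and the local VEGF concentration must exceed c_br."""
    return age > a_br and vegf > c_br


rng = random.Random(0)
weights = [1.0, 1.0, 2.0, 0.0, 0.0]  # illustrative values only
print(normalise(weights))
print(sample_move(weights, rng))
print(may_branch(age=12, vegf=0.4, a_br=10, c_br=0.3))
```

Keeping normalisation separate from sampling makes it easy to check that the five probabilities sum to one before any move is drawn.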