Active inference and deep generative modeling for cognitive ultrasound

Ruud JG van Sloun

TL;DR

This article puts forth the idea that ultrasound (US) imaging systems can be recast as information-seeking agents that engage in reciprocal interactions with their anatomical environment, and equips these systems with a mechanism to actively reduce uncertainty and maximize diagnostic value across a sequence of experiments.

Abstract

Ultrasound (US) has the unique potential to offer access to medical imaging to anyone, everywhere. Devices have become ultra-portable and cost-effective, akin to the stethoscope. Nevertheless, US image quality and diagnostic efficacy are still highly operator- and patient-dependent. In difficult-to-image patients, image quality is often insufficient for reliable diagnosis. In this paper, we put forth that US imaging systems can be recast as information-seeking agents that engage in reciprocal interactions with their anatomical environment. Such agents autonomously adapt their transmit-receive sequences to fully personalize imaging and actively maximize information gain in situ. To that end, we show that the sequence of pulse-echo experiments that a US system performs can be interpreted as a perception-action loop: the action is the data acquisition (probing tissue with acoustic waves and recording reflections at the detection array), and perception is the inference of the anatomical and/or functional state, potentially including associated diagnostic quantities. We then equip systems with a mechanism to actively reduce uncertainty and maximize diagnostic value across a sequence of experiments, treating action and perception jointly using Bayesian inference given generative models of the environment and action-conditional pulse-echo observations. Since the representation capacity of the generative models dictates both the quality of inferred anatomical states and the effectiveness of inferred sequences of future imaging actions, we leverage the recent advances in deep generative modelling that are currently disrupting many fields and society at large. Finally, we show some examples of cognitive, closed-loop US systems that perform active beamsteering and adaptive scanline selection, based on deep generative models that track anatomical belief states.

Paper Structure

This paper contains 25 sections, 27 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: At time point $t$, an agent equipped with a generative model $p$ selects an action (1), which manifests in an excitation of the environment (2). The excitation "changes" the environment (e.g. it introduces compressional waves). This in turn results in a new sensory state $\hat{y}_t$. Confronted with the updated sensory data $\hat{y}_{0:t}$, the agent then revisits its beliefs about the environment (including future states it may take, and observations that may follow from that), and computes a new (approximate) posterior $q$ (3). The ultrasound probe contains the active and sensory states, and acts as a Markov blanket that separates the agent from its environment; they only interact via the active and sensory states. The distinction between $x_e$, the environmental states, and $x$, the internal states, is to make explicit that the agent's model is in general an approximate model of the true physical environment.
  • Figure 2: Real-world high-dimensional data such as ultrasound images lie on a low-dimensional manifold embedded in that high-dimensional space. This manifold is typically very intricate and non-smooth in the high-dimensional data space, and images that lie on it are highly structured. Deep generative learning allows modeling of such highly structured distributions and sampling of novel datapoints that lie on these low-dimensional manifolds. This is enabled by transforming samples from a tractable distribution (such as an isotropic Gaussian), $z^i\in\mathbb{R}^{N_z}$, into samples from the true data distribution, $x^i\in\mathbb{R}^{N^2}$. There are many ways of achieving this, e.g. via a conditional distribution $p_\theta(x|z)$ trained using variational inference, game theory (adversarial models), or just maximum likelihood for specific invertible models (normalizing flows). Alternatively, iterative sampling methods learn to estimate the gradients of the log-density of the data distribution at multiple noise scales that successively corrupt the data distribution into a tractable isotropic normal. These gradients then allow for reversing this process using reverse diffusion or Langevin dynamics.
  • Figure 3: Example: Active beamsteering using sequential Monte Carlo [federici2024active]. (a) Doppler target tracking using cognitive ultrasound. (1) The agent selects the action (beamsteering angle $\theta_t^{\textrm{tx}}$) that has the highest expected information gain given generative predictions $q(\mathbf{x}_t,\mathbf{y}_t|a'_t,\hat{\mathbf{y}}_{0:t-1})$. (2) This action prompts a new Doppler observation $\hat{\mathbf{y}}_t$, which in turn triggers an update of the posterior $q(\mathbf{x}_{0:t}|\hat{\mathbf{y}}_{0:t})$, implemented using a sequential Monte Carlo method. (3) Finally, the posterior mean fetal heart location at that timestep is communicated to a heart rate estimation module alongside the received Doppler data. (b) Real-time lab setup mimicking the scenario described in (a), using a phased array transducer mounted on a translation stage that transmits a focused beam controlled by the agent to track a "beating" chicken heart. (c) Positional tracking and heart rate estimation (red) from adaptively steered focused beams, against ground truth (blue).
  • Figure 4: Example: Multipath haze suppression using diffusion models. (a) Example input $\mathbf{y}$ and dehazed output $\hat{\mathbf{x}}$ (top) along with automatic left-ventricular (LV) segmentations by EchoNet [ouyang2019echonet] (bottom). Note how the LV area is underestimated on the hazy, cluttered input data, and how this is improved after dehazing. (b) Tradeoff between tissue contrast (gCNR) and lateral resolution (FWHM), showing how dehazing by generative diffusion models (ours) greatly improves the gCNR while compromising much less on resolution than denoising methods such as BM3D. It also compares favourably against a discriminative neural network using supervised training on phantom data (NCSNv2). Image adapted from [stevens2024dehazing].
  • Figure 5: Example: Active subsampling using temporal diffusion models. (a) Overview of the perception-action loop for ultrasound scanline selection. At timestep $t$, the agent acquires $k$ scanlines that maximize expected information gain. It then combines them with the past $k(T-1)$ scanlines acquired at timesteps $t-T+1:t-1$, and performs perceptual inference via diffusion posterior sampling to yield samples ${\mathbf x}_t^{(i)}$. The most likely sample is selected as the final reconstruction. Next, the posterior samples are used to estimate expected information gain at timestep $t+1$, and the action $a^*_{t+1}$ that maximizes it is used to acquire the next scanlines. (b) Mean absolute reconstruction error (MAE) of a cognitive agent (Max information Sampling) vs a random agent (Random Sampling). Both use the conditional temporal diffusion model for inference. Each blue dot is a hold-out test sequence from the CAMUS dataset. Points above the red dashed line are sequences for which cognitive imaging outperforms random scanline selection.
  • ...and 1 more figure
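
The perception-action loop described in the Figure 3 caption (pick the action with the highest expected information gain, observe, then update the posterior with sequential Monte Carlo) can be illustrated with a toy sketch. This is not the implementation from the paper: the 1D target position, the Gaussian beam response `beam_response`, the constants `BEAM_WIDTH` and `NOISE_STD`, and the moment-matched approximation of expected information gain are all assumptions introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: a 1D target position and a Gaussian-shaped beam.
BEAM_WIDTH = 0.15                      # assumed beam width (arbitrary units)
NOISE_STD = 0.05                       # assumed observation noise std
ACTIONS = np.linspace(-1.0, 1.0, 41)   # candidate steering positions


def beam_response(x, a):
    """Noise-free echo amplitude for a target at x when steering to a."""
    return np.exp(-0.5 * ((x - a) / BEAM_WIDTH) ** 2)


def expected_information_gain(particles, weights, actions):
    """Approximate I(x; y | a) per action from a weighted particle belief.

    Assumption: the predictive distribution of y is moment-matched to a
    Gaussian, giving EIG ~= 0.5 * log(1 + Var_x[response] / noise variance).
    """
    mu = beam_response(particles[None, :], actions[:, None])  # (actions, particles)
    mean = mu @ weights
    var = ((mu - mean[:, None]) ** 2) @ weights
    return 0.5 * np.log1p(var / NOISE_STD**2)


def smc_update(particles, weights, action, y_obs, drift_std=0.02):
    """One SMC step: reweight by the likelihood, resample, then diffuse."""
    residual = y_obs - beam_response(particles, action)
    log_w = np.log(weights) - 0.5 * (residual / NOISE_STD) ** 2
    w = np.exp(log_w - log_w.max())    # subtract max for numerical stability
    w /= w.sum()
    idx = rng.choice(particles.size, size=particles.size, p=w)
    moved = particles[idx] + rng.normal(0.0, drift_std, particles.size)
    return moved, np.full(particles.size, 1.0 / particles.size)


def run_tracking(true_x=0.4, n_particles=2000, n_steps=25):
    """Closed loop: select action, simulate observation, update belief."""
    particles = rng.uniform(-1.0, 1.0, n_particles)
    weights = np.full(n_particles, 1.0 / n_particles)
    for _ in range(n_steps):
        eig = expected_information_gain(particles, weights, ACTIONS)
        action = ACTIONS[np.argmax(eig)]                      # (1) act
        y_obs = beam_response(true_x, action) + rng.normal(0.0, NOISE_STD)
        particles, weights = smc_update(particles, weights, action, y_obs)  # (2) perceive
    return float(particles.mean()), float(particles.std())   # (3) posterior summary


if __name__ == "__main__":
    mean, std = run_tracking()
    print(f"posterior mean {mean:.3f}, posterior std {std:.3f}")
```

Because the noise level is fixed in this sketch, maximizing expected information gain reduces to steering where the belief disagrees most about the predicted echo. A real system would replace the hand-written `beam_response` with a learned generative model and a richer state.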