Low-Rank-Modulated Functa: Exploring the Latent Space of Implicit Neural Representations for Interpretable Ultrasound Video Analysis

Julia Wolleb, Cristiana Baloescu, Alicia Durrer, Hemant D. Tagare, Xenophon Papademetris

Abstract

Implicit neural representations (INRs) have emerged as a powerful framework for continuous image representation learning. In Functa-based approaches, each image is encoded as a latent modulation vector that conditions a shared INR, enabling strong reconstruction performance. However, the structure and interpretability of the corresponding latent spaces remain largely unexplored. In this work, we investigate the latent space of Functa-based models for ultrasound videos and propose Low-Rank-Modulated Functa (LRM-Functa), a novel architecture that enforces a low-rank adaptation of modulation vectors in the time-resolved latent space. When applied to cardiac ultrasound, the resulting latent space exhibits clearly structured periodic trajectories, facilitating visualization and interpretability of temporal patterns. The latent space can be traversed to sample novel frames, revealing smooth transitions along the cardiac cycle and enabling direct readout of end-diastolic (ED) and end-systolic (ES) frames without additional model training. We show that LRM-Functa outperforms prior methods in unsupervised ED and ES frame detection, while compressing each video frame to a rank as low as k=2 and maintaining competitive downstream performance on ejection fraction prediction. Evaluations on out-of-distribution frame selection in a cardiac point-of-care dataset, as well as on lung ultrasound for B-line classification, demonstrate the generalizability of our approach. Overall, LRM-Functa provides a compact, interpretable, and generalizable framework for ultrasound video analysis. The code is available at https://github.com/JuliaWolleb/LRM_Functa.


Figures (4)

  • Figure 1: Overview of our LRM-Functa framework. (A) Our approach builds on VidFuncta, where each ultrasound video is represented by a video-level modulation vector $v$ and time-resolved modulation vectors $\phi=\{\phi_t\}_{t=1}^T$. (B) Pairwise cosine similarity of $\phi$ over time reveals a clear low-rank pattern. (C) Motivated by this observation, we constrain the latent modulation space to a low-rank subspace $\mathcal{B}_{\beta}$. (D/E) This structured latent space enables efficient compression-reconstruction and cardiac phase analysis.
  • Figure 2: Proposed architecture for low-rank adaptation of the modulation vectors. Each video $\mathcal{V}$ is encoded into a video-level representation $v$ and time-resolved frame representations $\phi_t$, which are projected onto a rank-$k$ subspace $\mathcal{B}_{\beta}$ to produce modulation vectors $m_t = v + \mathcal{B}_{\beta} \phi_t$. These vectors shift the activations of a shared MLP $M_\theta$, trained to reconstruct pixel intensities $\hat{z}$ at each coordinate $(x,y)$, shown as a red cross. (A code sketch of this modulation step follows the figure list.)
  • Figure 3: For $k = 2$, we visualize the raw trajectories $\phi_t$ over time. LRM-Functa$_o$ exhibits spiral-like trajectories, whereas LRM-Functa$_b$ collapses the trajectory to a line. Projecting $\phi_t$ onto the principal motion direction $p$ and filtering yields the 1D signal $s_{filt}$, which enables direct, interpretable identification of ED and ES frames. (See the second sketch after this list.)
  • Figure 5: Mean SSIM and PSNR scores on the EchoNet-Dynamic test set for all compared methods across varying latent dimensions $k$ (x-axis shown on a log2 scale). For small $k$, our LRM-Functa variants are the only methods to produce stable reconstructions, while for higher ranks, LRM-Functa$_{o}$ is among the top-performing methods.
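
To make the modulation step of Figure 2 concrete, the following is a minimal sketch of a shift-modulated coordinate MLP, assuming a PyTorch implementation; the layer sizes, sine activations, and the way $m_t$ is turned into per-layer shifts are illustrative choices rather than the released code.

```python
# Minimal sketch of the shift-modulated INR in Figure 2 (PyTorch).
# Layer sizes, the sine activation, and the mapping from m_t to per-layer
# shifts are illustrative assumptions, not the released implementation.
import torch
import torch.nn as nn

class LowRankModulatedINR(nn.Module):
    def __init__(self, latent_dim=512, rank=2, hidden=256, depth=4):
        super().__init__()
        # Shared basis B_beta: lifts the rank-k frame code phi_t into modulation space.
        self.basis = nn.Linear(rank, latent_dim, bias=False)
        # Coordinate MLP M_theta operating on (x, y) inputs.
        self.layers = nn.ModuleList(
            [nn.Linear(2 if i == 0 else hidden, hidden) for i in range(depth)]
        )
        # Turns the modulation vector m_t into one additive shift per hidden layer.
        self.to_shifts = nn.Linear(latent_dim, hidden * depth)
        self.out = nn.Linear(hidden, 1)  # predicted pixel intensity z_hat

    def forward(self, coords, v, phi_t):
        # m_t = v + B_beta phi_t: video-level code plus low-rank frame code.
        m_t = v + self.basis(phi_t)
        shifts = self.to_shifts(m_t).chunk(len(self.layers), dim=-1)
        h = coords  # (num_pixels, 2) coordinates (x, y)
        for layer, shift in zip(self.layers, shifts):
            h = torch.sin(layer(h) + shift)  # shift-modulated activation
        return self.out(h)
```

In this formulation, all per-frame information is carried by the rank-$k$ code $\phi_t$, which is what allows each frame to be compressed to $k=2$ as stated in the abstract.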
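
The readout described in Figure 3 can likewise be sketched in a few lines, assuming NumPy/SciPy: project the trajectory $\{\phi_t\}$ onto its principal motion direction $p$, low-pass filter the resulting 1D signal to obtain $s_{filt}$, and take its extrema as candidate ED and ES frames. The PCA-based choice of $p$ and the filter settings are assumptions, not the authors' exact procedure.

```python
# Hedged sketch of the ED/ES readout in Figure 3 (NumPy/SciPy). The PCA-based
# choice of the principal motion direction p, the Butterworth low-pass filter,
# and its cutoff are assumptions made for illustration.
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def ed_es_from_trajectory(phi, fps=50.0, cutoff_hz=3.0):
    """phi: (T, k) time-resolved modulation vectors of one video."""
    centered = phi - phi.mean(axis=0)
    # Principal motion direction p = first right singular vector of the trajectory.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    s = centered @ vt[0]                      # 1D projection onto p
    b, a = butter(2, cutoff_hz / (fps / 2))   # low-pass filter -> s_filt
    s_filt = filtfilt(b, a, s)
    peaks, _ = find_peaks(s_filt)             # one cardiac extreme (e.g., ED)
    troughs, _ = find_peaks(-s_filt)          # the opposite extreme (e.g., ES)
    return peaks, troughs, s_filt
```

Since the sign of $p$ is arbitrary, which set of extrema corresponds to ED and which to ES still has to be resolved, e.g., from the reconstructed frames.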