Extraction and Recovery of Spatio-Temporal Structure in Latent Dynamics Alignment with Diffusion Models

Yule Wang; Zijing Wu; Chengrui Li; Anqi Wu

TL;DR

ERDiff addresses neural distribution shift by extracting the spatio-temporal structure of source latent dynamics with a diffusion model and guiding a maximum-likelihood alignment in the target domain to recover source-consistent latent trajectories. It combines a diffusion-denoiser trained on latent dynamics with a VAE to capture global structure, and a tractable alignment objective that leverages a diffusion-based prior and Sinkhorn regularization. Empirical results on synthetic data and two neural datasets (NHP M1 and rat CA1) show ERDiff preserves latent structure and improves decoding across cross-day and inter-subject scenarios, outperforming metric-based and adversarial baselines. The approach offers a scalable, unsupervised test-time adaptation framework for robust neural decoding and potentially broader time-series domain adaptation.

Abstract

In the study of behavior-related brain computation, raw neural signals must be aligned to counteract the drastic domain shift among them. A foundational framework in neuroscience posits that trial-based neural population activity relies on low-dimensional latent dynamics, so focusing on these latent dynamics greatly facilitates alignment. Despite progress in this field, existing methods ignore the intrinsic spatio-temporal structure during the alignment phase, which usually leads to poorly structured latent dynamics and degraded overall performance. To tackle this problem, we propose an alignment method, ERDiff, which leverages the expressivity of the diffusion model to preserve the spatio-temporal structure of latent dynamics. Specifically, the latent dynamics structures of the source domain are first extracted by a diffusion model. Then, under the guidance of this diffusion model, these structures are recovered through a maximum likelihood alignment procedure in the target domain. We first demonstrate the effectiveness of our proposed method on a synthetic dataset. Then, when applied to neural recordings from the non-human primate motor cortex, under both cross-day and inter-subject settings, our method consistently preserves the spatio-temporal structure of latent dynamics and outperforms existing approaches in alignment goodness-of-fit and neural decoding performance.

Paper Structure (22 sections, 23 equations, 8 figures, 4 tables, 1 algorithm)

Introduction
Preliminary
Methodology
Maximum likelihood alignment
Spatio-temporal structure extraction and source domain learning
Spatio-temporal structure recovery and distribution alignment
Experiments
Synthetic dataset
Neural datasets
Discussion
Methodology details
DM architecture details
VAE and DM cooperative source domain learning details
Detailed derivation of maximum likelihood alignment
Relationship between KL-Divergence and DSM Loss
...and 7 more sections

Figures (8)

Figure 1: Empirical study. Latent dynamics (3D visualization) of the source domain and the aligned target domain by JSDM on a primary motor cortex dataset.
Figure 2: A schematic overview of spatio-temporal structure extraction and recovery in ERDiff. (A) The architecture of the DM for spatio-temporal structure extraction. (B) A schematic of structure recovery. The left panel presents the extracted spatio-temporal structure of the source-domain latent dynamics; the right panel illustrates the structure-aware maximum likelihood alignment guidance in ERDiff.
Figure 3: Experimental results on the synthetic dataset. (A) Performance comparison on trial-average negative log-likelihood (NLL) and KL Divergence (KLD). $\downarrow$ means the lower the better. ERDiff achieves the second-lowest NLL and the lowest KLD. (B) True continuous Bernoulli dynamics in the source domain compared to the latent dynamics aligned by ERDiff and JSDM in the target domain (blue dots denote the fixed points). ERDiff preserves the spatio-temporal structure of latent dynamics much better.
Figure 4: Motor cortex dataset and experimental results. (A) Illustration of the center-out reaching task of non-human primates. (B) The 3D visualization of trial-averaged latent dynamics corresponding to each reaching direction in the source domain. (C) The 3D visualization of trial-averaged latent dynamics corresponding to each reaching direction aligned by ERDiff, DAF, and JSDM given the target distribution from cross-day and inter-subject settings. We observe that ERDiff preserves the spatio-temporal structure of latent dynamics well.
Figure 5: (A) True source-domain trial velocities and behavior decoding trajectories inferred from a ridge regression model given the latent dynamics aligned by ERDiff and JSDM, respectively. We can observe that ERDiff not only preserves the spatio-temporal structure but also decodes the direction more accurately. (B) We compare the decoding performance of ERDiff, DAF, and JSDM with a decrease in the sampling density of trials on the target domain. We can observe that ERDiff maintains a relatively high accuracy under low sampling densities.
...and 3 more figures
