Optimal Transport for Latent Integration with An Application to Heterogeneous Neuronal Activity Data

Yubai Yuan; Babak Shahbaba; Norbert Fortin; Keiland Cooper; Qing Nie; Annie Qu

TL;DR

The paper tackles cross-subject heterogeneity in high-dimensional neural data by introducing Integrated Latent Alignment (ILA), which combines autoencoder-based latent embeddings with optimal-transport alignment to produce a common latent space across subjects. It uses $GW$ and $FGW$ distances to align geometric structures and covariate–outcome relations, enabling both cross-subject supervised learning and temporal integration of longitudinal data. In simulations and a real rodent electrophysiology study, ILA improves classification performance and reveals shared neural coding trajectories, even with a small number of subjects. The approach offers a principled, generalizable tool for heterogeneous data integration, with implications for neuroscience and other domains with cross-subject variability.
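
For reference, the $GW$ and $FGW$ distances named above are usually the Gromov-Wasserstein and fused Gromov-Wasserstein objectives. The standard squared-loss forms are given below. The notation is an assumption for illustration and may differ from the paper's own equations: $C^{(1)} \in \mathbb{R}^{n \times n}$ and $C^{(2)} \in \mathbb{R}^{m \times m}$ are within-subject dissimilarity matrices, $p$ and $q$ are sample weights, $M$ is a cross-subject cost on shared features such as outcomes, and $\alpha \in [0, 1]$ trades off the two terms.

$$
\Pi(p, q) = \{\, T \in \mathbb{R}_{\ge 0}^{n \times m} : T \mathbf{1}_m = p,\ T^\top \mathbf{1}_n = q \,\}
$$

$$
GW\big(C^{(1)}, C^{(2)}, p, q\big) = \min_{T \in \Pi(p, q)} \sum_{i,k=1}^{n} \sum_{j,l=1}^{m} \big| C^{(1)}_{ik} - C^{(2)}_{jl} \big|^2 \, T_{ij} \, T_{kl}
$$

$$
FGW_\alpha\big(C^{(1)}, C^{(2)}, M, p, q\big) = \min_{T \in \Pi(p, q)} \; (1 - \alpha) \sum_{i=1}^{n} \sum_{j=1}^{m} M_{ij} \, T_{ij} \; + \; \alpha \sum_{i,k=1}^{n} \sum_{j,l=1}^{m} \big| C^{(1)}_{ik} - C^{(2)}_{jl} \big|^2 \, T_{ij} \, T_{kl}
$$

Because $GW$ compares only within-subject dissimilarities, the two subjects' data need not live in the same space or share any sample correspondence.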

Abstract

Detecting dynamic patterns of task-specific responses shared across heterogeneous datasets is an essential and challenging problem in many scientific applications in medical science and neuroscience. In our motivating example of rodent electrophysiological data, identifying the dynamical patterns in neuronal activity associated with ongoing cognitive demands and behavior is key to uncovering the neural mechanisms of memory. One of the greatest challenges in investigating a cross-subject biological process is that systematic heterogeneity across individuals can significantly undermine the power of existing machine learning methods to identify the underlying biological dynamics. In addition, many technically challenging neurobiological experiments are conducted on only a handful of subjects, with rich longitudinal data available for each subject. The low sample sizes of such experiments can further reduce the power to detect common dynamic patterns among subjects. In this paper, we propose a novel heterogeneous data integration framework based on optimal transport to extract shared patterns in complex biological processes. The key advantage of the proposed method is that it increases discriminating power in identifying common patterns: by aligning the extracted latent spatiotemporal information across subjects, it reduces heterogeneity unrelated to the signal. Our approach is effective even with a small number of subjects, and does not require auxiliary matching information for the alignment. In particular, our method can align longitudinal data across heterogeneous subjects in a common latent space to capture the dynamics of shared patterns while utilizing temporal dependency within subjects.
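
The idea of aligning subjects without auxiliary matching information can be illustrated with a small Gromov-Wasserstein sketch. This is not the authors' implementation: the entropic regularization, the function names (`pairwise_dist`, `sinkhorn`, `entropic_gw`), the toy data, and the barycentric mapping step are all assumptions chosen for illustration. The two point clouds stand in for the latent codes of two subjects and differ by an unknown rotation and an unknown reordering of samples.

```python
import numpy as np


def pairwise_dist(X):
    """Euclidean distance matrix within one subject's latent point cloud."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def sinkhorn(p, q, cost, eps, n_iter=500):
    """Entropic optimal-transport plan between weights p and q."""
    K = np.exp(-(cost - cost.min()) / eps)  # shift the cost for stability
    u = np.ones_like(p)
    for _ in range(n_iter):
        v = q / (K.T @ u)
        u = p / (K @ v)  # last update enforces the row marginals
    return u[:, None] * K * v[None, :]


def entropic_gw(C1, C2, p, q, eps=0.05, n_outer=50):
    """Entropic Gromov-Wasserstein coupling with squared loss (a sketch)."""
    # Terms of the squared-loss objective that do not depend on T.
    const = ((C1 ** 2) @ p)[:, None] + ((C2 ** 2) @ q)[None, :]
    T = np.outer(p, q)  # start from the independent coupling
    for _ in range(n_outer):
        # Linearized GW cost at the current T (up to a constant factor).
        grad = const - 2.0 * C1 @ T @ C2.T
        T = sinkhorn(p, q, grad, eps)
    cost = float(np.sum((const - 2.0 * C1 @ T @ C2.T) * T))
    return T, cost


# Toy "subjects": subject 2 is a rotated, shuffled copy of subject 1.
rng = np.random.default_rng(0)
n = 30
X1 = rng.normal(size=(n, 2))
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
perm = rng.permutation(n)
X2 = X1[perm] @ R.T

# Only within-subject distances enter the alignment (scaled to [0, 1]).
C1 = pairwise_dist(X1)
C2 = pairwise_dist(X2)
C1 = C1 / C1.max()
C2 = C2 / C2.max()
p = np.full(n, 1.0 / n)
q = np.full(n, 1.0 / n)

T, gw_cost = entropic_gw(C1, C2, p, q)

# Barycentric projection: express each subject-2 sample as a weighted
# average of subject-1 samples, placing both in subject 1's coordinates.
X2_mapped = (T.T @ X1) / T.sum(axis=0)[:, None]
```

The coupling `T` is computed from `C1` and `C2` alone, so neither the rotation nor the permutation is given to the solver. The GW objective is non-convex, so a sketch like this can settle in a local optimum. In practice one would compare several initializations or regularization strengths.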

Paper Structure (11 sections, 19 equations, 5 figures, 3 tables)

Introduction
Background and Notations
Methodology
Integrated supervised learning on multiple heterogeneous datasets
Temporal integration of multiple heterogeneous datasets
Simulation Study
Simulations of data integration for classification
Simulations for temporal integration for multiple datasets
Simulations for Synthetic Neuronal Spike Data
Real Data Application
Discussion

Figures (5)

Figure 1: Neural activity was recorded from hippocampal region CA1 as animals performed a complex non-spatial sequence memory task. (a) The task involves repeated presentations of sequences of non-spatial events (odor stimuli) to subjects (rodents). Using an automated odor delivery system (top left), sequences of five odors were presented in the same odor port. (b) Example ensemble activity from a representative subject during one sequence presentation.
Figure 2: The proposed integrated latent alignment framework (ILA). In stage 1, we utilize an autoencoder to compress the electrophysiological data for each subject. In stage 2, we use optimal transport to align the compressed data from different subjects in latent spaces. In the last stage, a supervised learning algorithm is implemented in the aligned latent space to extract common patterns.
Figure 3: Four-odor classification accuracy based on the three rodents' latent neural features extracted by the autoencoder within the time interval $250 \sim 500$ milliseconds. The blue line shows the accuracy of KNN trained with the individual learning method at each time point. The red line shows the accuracy of KNN trained with the proposed data integration method in Section 3.1.
Figure 4: Four-odor classification accuracy within the time interval $250 \sim 500$ milliseconds based on the rodents' electrophysiological data. The blue line shows the accuracy of KNN trained with the individual learning method at each time point. The red line shows the accuracy of KNN trained with the proposed temporal integration method in Section 3.2.
Figure 5: Odor-specific neuronal coding and dynamics in the aligned latent space. (a) Differentiation of the neural activity across odor presentations on the aligned latent dimensions (odors A-D; data aggregated across all 5 subjects). Each dot indicates the location of the neural activity for a 100-ms bin from a given subject, with color denoting the odor label of the corresponding trial. (b) Neural activity trajectory (mean across subjects) on the latent dimensions during each odor trial type. (c) Neural activity trajectories for each subject, showing comparable patterns across subjects. Ellipses are consistent across panels and represent the spread of the latent coding for each odor type, specifically the square roots of the two largest singular values of the covariance of the latent coding within each cluster.
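
The ellipse construction described in the Figure 5 caption can be sketched as follows. The function name `ellipse_axes` and the four toy points are hypothetical. The caption does not state whether the authors apply any further scaling to the ellipses, so the sketch returns only the square-rooted singular values and their directions.

```python
import numpy as np


def ellipse_axes(Z):
    """Spread of one cluster of latent codes Z with shape (n_samples, d), d >= 2.

    Returns the square roots of the two largest singular values of the
    sample covariance, plus the corresponding directions as columns.
    """
    cov = np.cov(Z, rowvar=False)   # d x d sample covariance
    U, s, _ = np.linalg.svd(cov)    # singular values sorted descending
    return np.sqrt(s[:2]), U[:, :2]


# Hypothetical latent codes for one odor cluster (not data from the paper).
Z = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
axes, directions = ellipse_axes(Z)
```

A covariance matrix is symmetric positive semi-definite, so its singular values coincide with its eigenvalues, and the returned lengths are the standard deviations along the cluster's two principal directions.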
