Data-Efficient Learning of Anomalous Diffusion with Wavelet Representations: Enabling Direct Learning from Experimental Trajectories

Gongyi Wang; Yu Zhang; Zihan Huang

TL;DR

The paper tackles data scarcity in anomalous-diffusion analysis by introducing a wavelet-based trajectory representation that maps each trajectory to a multi-channel wavelet modulus scalogram. Combined with vision models, this representation enables efficient learning directly from experimental data and reduces reliance on large simulated datasets. On simulated benchmarks and on real single-particle tracking (SPT) data from beads in F-actin networks, the approach yields superior diffusion-exponent regression and diffusion-model and mesh-size classification, and it outperforms simulation-trained baselines while using thousands rather than millions of training trajectories. The authors also identify interpretable scale fingerprints in the wavelet spectra that reflect the underlying diffusion mechanisms, suggesting avenues for trajectory segmentation and unsupervised discovery in complex transport systems.
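For reference, the two quantities the TL;DR relies on can be written with standard textbook definitions. These are not taken from the paper, whose normalization, scale grid, and wavelet choices may differ: the diffusion exponent $\alpha$ is defined through the scaling of the mean squared displacement (estimated in practice by the time-averaged MSD of a trajectory with $T$ positions at lag $\Delta$), and the modulus scalogram of a trajectory $x(t)$ for a wavelet $\psi$ is the magnitude of its continuous wavelet transform at scale $a>0$ and position $b$.

```latex
\begin{aligned}
\langle x^2(t) \rangle &\propto t^{\alpha},
  \qquad 0<\alpha<1 \ \text{(subdiffusion)},\quad \alpha=1 \ \text{(normal)},\quad \alpha>1 \ \text{(superdiffusion)},\\
\overline{\delta^2(\Delta)} &= \frac{1}{T-\Delta}\sum_{t=1}^{T-\Delta}\left(x_{t+\Delta}-x_t\right)^2,\\
W_\psi x(a,b) &= \frac{1}{\sqrt{a}}\int x(t)\,\psi^{*}\!\left(\frac{t-b}{a}\right)\mathrm{d}t,
  \qquad S_\psi(a,b) = \left|W_\psi x(a,b)\right|,\\
\mathbf{S} &= \left[S_{\psi_1},\dots,S_{\psi_K}\right] \in \mathbb{R}^{K\times N_a\times N_b}.
\end{aligned}
```

Stacking the scalograms of $K$ wavelet families along a channel axis gives the multi-channel representation; the abstract describes $K=6$ complementary families.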

Abstract

Machine learning (ML) has become a versatile tool for analyzing anomalous diffusion trajectories, yet most existing pipelines are trained on large collections of simulated data. In contrast, experimental trajectories, such as those from single-particle tracking (SPT), are typically scarce and may differ substantially from the idealized models used for simulation, so ML methods trained on simulations can degrade or even break down when applied to real data. To address this mismatch, we introduce a wavelet-based representation of anomalous diffusion that enables data-efficient learning directly from experimental recordings. This representation is constructed by applying six complementary wavelet families to each trajectory and combining the resulting wavelet modulus scalograms into a multi-channel representation. We first evaluate the wavelet representation on simulated trajectories from the andi-datasets benchmark, where it clearly outperforms both feature-based and trajectory-based methods with as few as 1000 training trajectories and still retains an advantage on large training sets. We then use this representation to learn directly from experimental SPT trajectories of fluorescent beads diffusing in F-actin networks, where it remains superior to existing alternatives for both diffusion-exponent regression and mesh-size classification. In particular, when predicting the diffusion exponents of experimental trajectories, a model trained on 1200 experimental tracks using the wavelet representation achieves significantly lower errors than state-of-the-art deep learning models trained purely on $10^6$ simulated trajectories. We associate this data efficiency with distinct scale fingerprints that emerge in the wavelet spectra and disentangle the underlying diffusion mechanisms.
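The construction described in the abstract (several wavelet families applied to one trajectory, with the modulus scalograms combined into a multi-channel input) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the six families, the scale grid, or whether positions or increments are transformed, so the three stand-in families (`mexican_hat`, `morlet`, `gaussian_d1`), the geometric scale grid, the use of standardized increments, and all function names here are hypothetical.

```python
import numpy as np

# Hypothetical wavelet families: the abstract uses six complementary
# families but does not name them here, so three common ones stand in.
# Each takes integer lags t and a scale s.
def mexican_hat(t, s):
    # Negative second derivative of a Gaussian (unnormalized), real-valued.
    x = t / s
    return (1.0 - x**2) * np.exp(-0.5 * x**2)

def morlet(t, s, w0=6.0):
    # Complex Morlet: oscillation under a Gaussian envelope; the small
    # admissibility correction term is omitted for w0 = 6.
    x = t / s
    return np.exp(1j * w0 * x) * np.exp(-0.5 * x**2)

def gaussian_d1(t, s):
    # First derivative of a Gaussian (up to sign); responds to local jumps.
    x = t / s
    return -x * np.exp(-0.5 * x**2)

FAMILIES = {"mexh": mexican_hat, "morl": morlet, "gaus1": gaussian_d1}

def cwt_modulus(signal, scales, wavelet):
    """|W(a, b)| for each scale a and position b, by direct correlation."""
    n = len(signal)
    out = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        # Truncate the wavelet at ~5 scales, but never longer than the
        # signal, so np.convolve(mode="same") returns exactly n samples.
        half = min(int(5 * s), (n - 1) // 2)
        t = np.arange(-half, half + 1)
        psi = wavelet(t, s) / np.sqrt(s)  # 1/sqrt(a) normalization
        # Correlation with psi equals convolution with its reversed conjugate.
        coeffs = np.convolve(signal, np.conj(psi[::-1]), mode="same")
        out[i] = np.abs(coeffs)
    return out

def wavelet_representation(traj, scales, families=FAMILIES):
    """Stack one modulus scalogram per family: (n_families, n_scales, T - 1)."""
    # Assumption: transform the increments, standardized per trajectory.
    inc = np.diff(np.asarray(traj, dtype=float))
    inc = (inc - inc.mean()) / (inc.std() + 1e-12)
    return np.stack([cwt_modulus(inc, scales, w) for w in families.values()])

# Usage on a toy Brownian trajectory (alpha = 1) of 200 positions.
rng = np.random.default_rng(0)
traj = np.cumsum(rng.normal(size=200))
scales = np.geomspace(1, 32, 16)
img = wavelet_representation(traj, scales)
print(img.shape)  # (3, 16, 199)
```

Stacking the families along the first axis yields a tensor shaped like a multi-channel image, the form a standard vision backbone expects; moving toward six channels only requires adding entries to `FAMILIES`. At the largest scales the truncated wavelets and the zero padding implied by `mode="same"` introduce edge effects, which a production pipeline would handle more carefully.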

