A Kolmogorov metric embedding for live cell microscopy signaling patterns
Layton Aho, Mark Winter, Marc DeCarlo, Agne Frismantiene, Yannick Blum, Paolo Armando Gagliardi, Olivier Pertz, Andrew R. Cohen
TL;DR
This work presents a metric embedding framework for high-dimensional live-cell microscopy movies based on Kolmogorov complexity, using the normalized information distance to embed spatiotemporal patterns into a reproducing kernel Hilbert space. Central to the approach is the cell signaling structure function, which uses metric image filters to capture nuclear intensity relative to the surrounding cytoplasm at each cell centroid; the filtered movies are then passed through a lossless compression pipeline (via FLIF) that defines pairwise movie distances. The authors demonstrate the method across multiple biological contexts, including ERK signaling in 2-D monolayers, stem cell colonies, optogenetically manipulated 3-D spheroids, and synthetic datasets, showing that distances in the embedding reflect pattern differences between movies and that signaling dynamics can be related to cellular velocity. The framework is unsupervised and requires no training data. Open-source software and data are available for further exploration and for downstream learning in the embedding space.
Abstract
We present a metric embedding that captures spatiotemporal patterns of cell signaling dynamics in 5-D $(x, y, z, \text{channel}, \text{time})$ live cell microscopy movies. The embedding uses a metric distance called the normalized information distance (NID) based on Kolmogorov complexity theory, an absolute measure of information content between digital objects. The NID uses statistics of lossless compression to compute a theoretically optimal metric distance between pairs of 5-D movies, requiring no a priori knowledge of expected pattern dynamics and no training data. The cell signaling structure function (SSF) is defined using a class of metric 3-D image filters that compute, at each spatiotemporal cell centroid, the voxel intensity configuration of the nucleus with respect to the surrounding cytoplasm, or a functional output such as velocity. The only parameter is the expected cell radius (in $\mu$m). The SSF can optionally be combined with segmentation and tracking algorithms. The resulting lossless compression pipeline represents each 5-D input movie as a single point in a metric embedding space. The utility of a metric embedding is that the Euclidean distance between any two points in the embedding space optimally approximates the pattern difference, as measured by the NID, between the corresponding pair of 5-D movies. This holds throughout the embedding space, not only at points corresponding to input images. Examples are shown for synthetic data, for 2-D+time movies of ERK and AKT signaling under different oncogenic mutations in human epithelial (MCF10A) cells, for 3-D MCF10A spheroids under optogenetic manipulation of ERK, and for ERK dynamics during colony differentiation in human stem cells.
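As a rough illustration of how lossless compression statistics can stand in for the NID, the sketch below computes the normalized compression distance (NCD), a standard computable approximation from the Kolmogorov complexity literature: $\mathrm{NCD}(x,y) = \frac{C(xy) - \min(C(x), C(y))}{\max(C(x), C(y))}$, where $C(\cdot)$ is a compressed size in bytes. It is a minimal sketch under stated assumptions, not the authors' pipeline: Python's `lzma` module replaces FLIF, random byte strings replace SSF-filtered movies, and the names `compressed_size`, `ncd`, and `movie_a`/`movie_b`/`movie_c` are hypothetical.

```python
import lzma
import random


def compressed_size(data: bytes) -> int:
    """Size in bytes of the losslessly compressed input.

    Assumption: lzma is used here as a generic stand-in compressor;
    the paper's pipeline uses FLIF on image data instead.
    """
    return len(lzma.compress(data, preset=9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C is the compressed size and xy is the concatenation.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Toy "movies" (hypothetical data): A is random bytes, B is A with a few
# bytes flipped, and C is an unrelated random byte string.
rng = random.Random(0)
movie_a = bytes(rng.getrandbits(8) for _ in range(4096))

altered = bytearray(movie_a)
for i in range(0, len(altered), 512):
    altered[i] ^= 0xFF  # flip one byte every 512 positions
movie_b = bytes(altered)

movie_c = bytes(rng.getrandbits(8) for _ in range(4096))

print("NCD(A, B) =", ncd(movie_a, movie_b))  # near-duplicate pair
print("NCD(A, C) =", ncd(movie_a, movie_c))  # unrelated pair
```

The near-duplicate pair should score much closer to 0 than the unrelated pair, because the compressor can describe B almost entirely in terms of A. Applying such a function to every pair of inputs yields a distance matrix, which is the kind of input a metric embedding step would then work from.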
