A Kolmogorov metric embedding for live cell microscopy signaling patterns

Layton Aho, Mark Winter, Marc DeCarlo, Agne Frismantiene, Yannick Blum, Paolo Armando Gagliardi, Olivier Pertz, Andrew R. Cohen

TL;DR

This work presents a metric embedding framework for high-dimensional live-cell microscopy movies based on Kolmogorov complexity, using the normalized information distance to embed spatiotemporal patterns into a reproducing kernel Hilbert space. Central to the approach is the cell signaling structure function, which maps nuclear versus cytoplasmic intensity at cell centroids into a metric, enabling a lossless compression pipeline (via FLIF) to define pairwise movie distances. The authors demonstrate the method across multiple biological contexts, including ERK signaling in 2-D monolayers, stem cell colonies, optogenetically manipulated 3-D spheroids, and synthetic datasets, showing that the embedding preserves meaningful differences and relates signaling dynamics to cellular velocity. The framework is unsupervised and training-data free, with open-source software and data available for further exploration and downstream learning in the embedding space, offering a flexible tool for pattern discovery in complex imaging data.

Abstract

We present a metric embedding that captures spatiotemporal patterns of cell signaling dynamics in 5-D $(x, y, z, \text{channel}, \text{time})$ live cell microscopy movies. The embedding uses a metric distance called the normalized information distance (NID) based on Kolmogorov complexity theory, an absolute measure of information content between digital objects. The NID uses statistics of lossless compression to compute a theoretically optimal metric distance between pairs of 5-D movies, requiring no a priori knowledge of expected pattern dynamics, and no training data. The cell signaling structure function (SSF) is defined using a class of metric 3-D image filters that compute, at each spatiotemporal cell centroid, the voxel intensity configuration of the nucleus with respect to the surrounding cytoplasm, or a functional output, e.g., velocity. The only parameter is the expected cell radius ($\mu\text{m}$). The SSF can be optionally combined with segmentation and tracking algorithms. The resulting lossless compression pipeline represents each 5-D input movie as a single point in a metric embedding space. The utility of a metric embedding follows from the Euclidean distance between any two points in the embedding space optimally approximating the pattern difference, as measured by the NID, between the corresponding pair of 5-D movies. This is true throughout the embedding space, not only at points corresponding to input images. Examples are shown for synthetic data, for 2-D+time movies of ERK and AKT signaling under different oncogenic mutations in human epithelial (MCF10A) cells, for 3-D MCF10A spheroids under optogenetic manipulation of ERK, and for ERK dynamics during colony differentiation in human stem cells.
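As a rough illustration of how compression statistics can stand in for the NID, the sketch below computes the normalized compression distance in its standard textbook form, $\mathrm{NCD}(x, y) = \frac{C(xy) - \min\{C(x), C(y)\}}{\max\{C(x), C(y)\}}$, where $C(\cdot)$ is the compressed size in bytes. It is not the authors' pipeline: `zlib` is swapped in for the FLIF compressor, and the helper names (`compressed_size`, `ncd`, `toy_movie`, `perturb`) and the random byte strings standing in for SSF output images are all assumptions made for this example.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input (assumed stand-in for FLIF)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation of both objects
    return (cxy - min(cx, cy)) / max(cx, cy)


def toy_movie(seed: int, n_bytes: int = 8192) -> bytes:
    """Hypothetical 'movie': pseudo-random bytes, nearly incompressible alone."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n_bytes))


def perturb(movie: bytes, n_changes: int, seed: int) -> bytes:
    """Copy of `movie` with bytes inverted at randomly drawn positions."""
    rng = random.Random(seed)
    out = bytearray(movie)
    for _ in range(n_changes):
        out[rng.randrange(len(out))] ^= 0xFF
    return bytes(out)


movie_a = toy_movie(seed=1)
movie_b = perturb(movie_a, n_changes=20, seed=2)  # near-copy of movie_a
movie_c = toy_movie(seed=3)                       # unrelated content

d_near = ncd(movie_a, movie_b)
d_far = ncd(movie_a, movie_c)
print(f"near-copy: {d_near:.3f}, unrelated: {d_far:.3f}")
```

The near-copy pair should score much closer to 0 than the unrelated pair, because the compressor can describe `movie_b` almost entirely by reference to `movie_a`. Note that `zlib` only looks back 32 KiB, so this stand-in is unsuitable for real movies; a compressor designed for image volumes, such as the FLIF compressor named in the paper, would be needed in practice.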

Paper Structure (21 sections, 7 equations, 11 figures)

Figures (11)

  • Figure 1: The normalized information distance is a metric embedding kernel for spatiotemporal patterns of cell signaling. Start with $N$ 2-D or 3-D multichannel time-lapse live cell microscopy movies. Threshold and denoise the movies, optionally segmenting, tracking and lineaging. Metric structure-enhancing image filters compute cell signaling state, or functional outputs such as velocity. The cell signaling state is quantified by the intensity of the nuclear pixels w.r.t. the surrounding cytoplasm. 3-D image compression is then used as a normalized pairwise distance metric between structure-enhanced movies. The resulting distance matrix defines an optimal embedding based on visual differences among the input movies as captured by the normalized compression distance (NCD). Each input movie is represented as a single point in the embedding space. Importantly, all points in the embedding space, not just the ones corresponding to the input movies, optimally represent the pattern characteristics of a corresponding "true" input image. Supervised or unsupervised learning algorithms on the kernel embedding space are less complex and more generalizable (Manton2015). No training data or other a priori knowledge is required, although it can be utilized if available.
  • Figure 1: 2-D projections of 3-D (2-D+time) SSF outputs for movies showing ERK signaling in colonies of human stem cells for 10 differentiated (a) and 10 self-renewing (b) movies. The vertical axis represents the spatial dimension, and is obtained by taking a maximum intensity projection along the second spatial axis. The horizontal axis in each panel represents time. These quantitative visualizations show ERK patterns throughout the process of colony development. The 2-D visualization here is lower dimensional than the 3-D SSF image that is a lossless representation input to the compression distance for embedding. The full dataset can be seen here: https://leverjs.net/ssfCluster/HSC.
  • Figure 1: Animated version of Figure \ref{fig:figure1}. A time-lapse movie showing ERK-KTR signaling in a monolayer of human breast epithelial cells (MCF10A) from the PIK3CA_H1047R mutation, with cellular activation indicated by dark nuclei against bright cytoplasm clearly propagating across the image (left panel). The 3-D SSF metric image filter output (center panel) is the input to the FLIF 3-D compression used with the normalized compression distance to define the RKHS embedding, shown here with the current image frame overlaid in gray. The 2-D projection of the 3-D SSF output image (right panel) facilitates human visualization, with the current timepoint indicated by the red line and the signaling patterns clearly visible as diagonal yellow stripes of activation across the monolayer. The movie can be seen here: https://bioimage.coe.drexel.edu/media/ssfClustering/movie1.mp4.
  • Figure 2: A Kolmogorov embedding of live cell and tissue microscopy movies enables improved visualization and quantification of spatiotemporal patterns of cell signaling. Time-lapse microscopy captured 147 movies from 6 different conditions, known to differ in ERK signaling dynamics, in 2-D monolayers of human breast epithelial (MCF10A) cells (A). In (B), we show the output of the cell signaling structure function (SSF) from the beginning of the movie to the image frame shown in (A), overlaid in gray. Each colored dot along the gray image represents the cell signaling activation at that location and time. A maximum intensity projection of the 2-D+time SSF output (B) to a 1-D+time image (C) allows visualization of the waves of ERK signaling activation propagating throughout the tissue as diagonal yellow stripes. The red line in (C) indicates the time of the image frame in (A, B). The Kolmogorov embedding (D) uses the normalized information distance with 3-D image compression to define a pattern space that reflects everywhere the similarity characteristics visually observable in the input movies. The cluster structure function (CSF) reported in (D) is an unsupervised measure of how meaningfully the embedding represents the six different ground truth classes, reported as [mean, standard deviation] of the per-cluster optimality deficiency for ERK signaling in the 6-D RKHS. See also Supplementary Movie \ref{Movie1}.
  • Figure 2: 2-D projections of 3-D+time ERK (a) and velocity (b) SSF output images for 3-D+time movies of mammary acini, spheroids of human female mammary epithelial MCF10A cells. The original 3-D+time movies are processed with a 3-D SSF via a maximum intensity projection along the $Z$ spatial axis to form the input to the compression. For the 2-D rendering shown here, the spatial component is again projected via maximum intensity along the $Y$ axis. Each movie is labeled by the time (hours) before optogenetic excitation, the time (hours) that optogenetic excitation lasts (pulses every 30 minutes), and the age of the organoid (days). The dashed vertical lines indicate the beginning and end of the optogenetic excitation. The 2-D projections shown here are useful for human visualization; the 3-D SSF output images are input to the FLIF compression algorithm to compute the pairwise NCD that generates the reproducing kernel Hilbert space embedding (\ref{Fig.opto}). The optogenetic excitation dataset can be seen here: https://leverjs.net/ssfCluster/optoGenetic.
  • ...and 6 more figures
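
The maximum intensity projections mentioned in the captions above, which reduce a 2-D+time SSF output to a 1-D+time image with space on the vertical axis and time on the horizontal axis, can be sketched in a few lines. This is only a toy illustration under assumed conventions: the movie is a nested list indexed as `movie[t][y][x]`, the projection is taken along `y`, and the names `project_space_time` and `traveling_wave` are hypothetical, not taken from the authors' software.

```python
from typing import List

Movie = List[List[List[int]]]  # assumed layout: movie[t][y][x]


def project_space_time(movie: Movie) -> List[List[int]]:
    """Maximum intensity projection along y; result is indexed as image[x][t]."""
    n_t = len(movie)
    n_x = len(movie[0][0])
    return [
        [max(row[x] for row in movie[t]) for t in range(n_t)]
        for x in range(n_x)
    ]


def traveling_wave(n_t: int = 6, n_y: int = 4, n_x: int = 6) -> Movie:
    """Toy movie: one bright column that advances one pixel in x per frame."""
    return [
        [[255 if x == t else 0 for x in range(n_x)] for _y in range(n_y)]
        for t in range(n_t)
    ]


space_time_image = project_space_time(traveling_wave())

# Text rendering: rows are positions along x, columns are time points.
for row in space_time_image:
    print("".join("#" if value else "." for value in row))
```

In this toy case the moving column appears as a single diagonal line of `#` characters, the same kind of diagonal stripe that the captions describe for waves of ERK activation. As the captions note, such projections are for human viewing only; the full SSF output, not the projection, is what feeds the compression distance.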

Theorems & Definitions (1)

  • proof