Predicting Future States with Spatial Point Processes in Single Molecule Resolution Spatial Transcriptomics

Biraaj Rout, Priyanshi Borad, Parisa Boodaghi Malidarreh, Mohammad Sadegh Nasr, Jillur Rahman Saurav, Kelli Fenelon, Jai Prakash Veerla, Jacob M. Luber, Theodora Koromila

TL;DR

Predicting how gene expression patterns change in space and time during Drosophila embryogenesis is challenging due to the dynamic, subcellular organization of transcripts. The authors propose a pipeline that uses XGBoost to forecast future distributions of cells expressing sogD by integrating temporally aligned spatial point-process features, including Ripley’s K-function, from the preceding developmental stage. They provide an end-to-end workflow from high-resolution live imaging and segmentation to grid-based counting and evaluation, reporting ablation studies and comparisons between sogD and sogD_ΔSu(H) datasets. The work delivers an RNA-velocity–like capability for spatial transcriptomics, enabling stage-level predictions and offering a framework for understanding how regulatory inputs shape spatiotemporal gene expression in whole-embryo imaging.

Abstract

In this paper, we introduce a pipeline based on XGBoost to predict the future distribution of cells expressing the Sog-D gene (active cells) along both the anterior-posterior (AP) and dorsal-ventral (DV) axes of the Drosophila embryo during embryogenesis. This method provides insight into how cells and living organisms control gene expression, using super-resolution whole-embryo spatial transcriptomics imaging at subcellular, single-molecule resolution. An XGBoost model predicts the active-cell distribution of the next stage from that of the previous one. To achieve this, we leveraged temporally resolved spatial point processes, combining Ripley's K-function with each cell's state at each stage of embryogenesis, and found average predictive accuracy of the active-cell distribution. This tool is analogous to RNA velocity for spatially resolved developmental biology: from one data point, we can predict future spatially resolved gene expression using features from the spatial point processes.
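
As a rough illustration of the kind of spatial point-process feature the abstract refers to, the following Python sketch computes a naive estimate of Ripley's K-function for a set of 2D points. It is not the authors' implementation: the function name `ripley_k`, the omission of edge correction, and the use of a single scalar `area` for the observation window are all assumptions made for this example.

```python
import math


def ripley_k(points, radii, area):
    """Naive Ripley's K estimate (no edge correction; an illustrative assumption).

    K(r) = area / (n * (n - 1)) * number of ordered pairs (i, j), i != j,
    whose distance is at most r.
    """
    n = len(points)
    if n < 2:
        return [0.0 for _ in radii]
    # Compute each unordered pairwise distance once.
    dists = [
        math.dist(points[i], points[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    scale = area / (n * (n - 1))
    # Every unordered pair corresponds to two ordered pairs.
    return [scale * 2 * sum(d <= r for d in dists) for r in radii]


# Two hypothetical points 0.5 apart in a unit-area window.
example = ripley_k([(0.0, 0.0), (0.5, 0.0)], radii=[0.4, 0.6], area=1.0)
print(example)  # -> [0.0, 1.0]
```

Under complete spatial randomness K(r) is approximately πr², so values above that suggest clustering and values below suggest regular spacing. In a pipeline like the one described, K values at a few radii could serve as per-stage features alongside each cell's state.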

Paper Structure (15 sections, 1 equation, 8 figures, 1 table)

Figures (8)

  • Figure 1: Computational analysis of super-resolution live imaging compares nuclei activity and predicts stages. (A) Super-resolution live imaging set-up of hand-dechorionated Drosophila embryos of MCP-GFP Nup-RFP (*.MCP-GFP) × sogD_ΔSu(H)-MS2. (B) Implemented pipeline, starting with Cellpose 2.2.3 for segmentation, followed by active cell detection, data tabulation and feature selection, training, and testing. These steps collectively aim to predict the distribution of active cells for the next stage. (C) The MCP-GFP-MS2 system tracks transcription via GFP-tagged MCP binding to MS2 loops (Stage A: NC13; double-dot "..": NC14A, NC14B; Stage n-1: NC14C), and nuclei activity in live imaging snapshots is compared with Cellpose-generated images.
  • Figure 2: Testing of optimal sampling parameters. (A–C) Depictions of three distinct grid configurations, labeled A, B, and C, corresponding to grid sizes of 32×32, 26×26, and 8×8, respectively. (D) presents the error plot associated with each grid configuration (A–C), facilitating the identification of the optimal grid size based on the lowest error value.
  • Figure 3: The distribution of active cells achieving the best accuracy, based on mean absolute error (MAE) values, is shown for the four stages of NC 14 (A–D). In panels A–D, green rectangles indicate the frames from the previous stage used to predict the blue frames of the current stage. The features from the previous-stage frames were averaged to predict the average number of active cells in each grid cell for the current stage. For each stage, the right-hand plot shows the predicted and actual distributions of active cells along the DV axis, represented by dashed blue and red lines, respectively. In these plots, the grid numbers along the DV axis run from 0 to 16, the average number of active cells per grid cell runs from 0 to 50, and the embryo width along the DV axis spans from 0 to 100.
  • Figure 4: (A) Distribution of active cells along the DV axis for the case dataset, where the red line represents the actual distribution and the dashed blue line the predicted distribution. (B) Bootstrap distributions of AP-MAE, DV-MAE, and mean-MAE, presented from left to right. (C) Actual DV distribution for the case and control datasets, shown in light and dark red, respectively, to illustrate changes in width over time. (D) Predicted DV distribution for the case and control datasets, shown in dashed light and dark blue, respectively.
  • Supplementary Figure 1: Detailed information about our pipeline. (A) Ripley's K-function is used as a feature. (B) XGBoost is the machine learning method used. (C) The output of the pipeline, showing the distribution of active cells.
  • ...and 3 more figures
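
The grid-based counting and MAE evaluation mentioned in the captions for Figures 2 to 4 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`grid_counts`, `dv_profile`, `ap_profile`, `mae`) are hypothetical, the x coordinate is assumed to run along the AP axis and y along the DV axis, and all points are assumed to lie inside the embryo bounding box.

```python
def grid_counts(points, width, height, n_cols, n_rows):
    """Count points per cell of an n_rows x n_cols grid over [0, width] x [0, height]."""
    counts = [[0] * n_cols for _ in range(n_rows)]
    for x, y in points:
        # Clamp so that points on the far edge fall into the last cell.
        col = min(int(x / width * n_cols), n_cols - 1)
        row = min(int(y / height * n_rows), n_rows - 1)
        counts[row][col] += 1
    return counts


def dv_profile(counts):
    """Average count per grid cell for each row (assumed DV position)."""
    return [sum(row) / len(row) for row in counts]


def ap_profile(counts):
    """Average count per grid cell for each column (assumed AP position)."""
    n_rows = len(counts)
    return [sum(col) / n_rows for col in zip(*counts)]


def mae(actual, predicted):
    """Mean absolute error between two equal-length profiles."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Three hypothetical active-cell centroids on a unit square, 2 x 2 grid.
counts = grid_counts([(0.1, 0.1), (0.9, 0.1), (0.9, 0.9)], 1.0, 1.0, 2, 2)
print(counts)                               # -> [[1, 1], [0, 1]]
print(mae(dv_profile(counts), [0.5, 0.5]))  # -> 0.25
```

Comparing such an error across candidate grid sizes is one plausible way to pick a grid configuration, in the spirit of the error plot described for Figure 2.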