Cell as Point: One-Stage Framework for Efficient Cell Tracking

Yaxuan Song, Jianan Fan, Heng Huang, Mei Chen, Weidong Cai

TL;DR

This work introduces CAP, an end-to-end one-stage framework that treats each cell as a point to enable simultaneous tracking and lineage reasoning without segmentation or detection. It integrates a transformer-based cell-joint tracking module with 4D correlation volumes and iterative updates, enhanced by adaptive event-guided sampling to balance mitosis events and rolling-as-window inference for long sequences. CAP achieves strong tracking accuracy with substantially reduced inference time across multiple benchmarks (DeepCell and ISBI CTC) and demonstrates robust lineage reconstruction in crowded scenes. The approach reduces annotation requirements, improves efficiency, and offers practical impact for high-throughput cell-tracking tasks and downstream biological analysis.

Abstract

Conventional multi-stage cell tracking approaches rely heavily on detection or segmentation in each frame as a prerequisite, requiring substantial resources for high-quality segmentation masks and increasing the overall prediction time. To address these limitations, we propose CAP, a novel end-to-end one-stage framework that reimagines cell tracking by treating Cell as Point. Unlike traditional methods, CAP eliminates the need for explicit detection or segmentation, instead jointly tracking cells for sequences in one stage by leveraging the inherent correlations among their trajectories. This simplification reduces both labeling requirements and pipeline complexity. However, directly processing the entire sequence in one stage poses challenges related to data imbalance in capturing cell division events and long sequence inference. To solve these challenges, CAP introduces two key innovations: (1) adaptive event-guided (AEG) sampling, which prioritizes cell division events to mitigate the occurrence imbalance of cell events, and (2) the rolling-as-window (RAW) inference strategy, which ensures continuous and stable tracking of newly emerging cells over extended sequences. By removing the dependency on segmentation-based preprocessing while addressing the challenges of imbalanced occurrence of cell events and long-sequence tracking, CAP demonstrates promising cell tracking performance and is 8 to 32 times more efficient than existing methods. The code and model checkpoints will be available soon.
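The abstract names the two strategies but does not give their details, so the following is only a minimal sketch of the general ideas, not the authors' implementation. It assumes that AEG sampling can be approximated by drawing training windows with probability proportional to the number of division events they contain, and that RAW inference can be approximated by sliding overlapping windows over the sequence. All names here (`window_weights`, `sample_window_start`, `rolling_windows`, `boost`, `stride`) are hypothetical.

```python
import random


def window_weights(num_frames, window, division_frames, boost=5.0):
    """Weight each candidate window start by the division events it covers.

    Assumption: a window's weight is 1 plus `boost` per division frame inside
    it. The paper's actual AEG weighting is not given in the abstract.
    """
    if window < 1 or num_frames < window:
        raise ValueError("window must satisfy 1 <= window <= num_frames")
    divisions = set(division_frames)
    weights = []
    for start in range(num_frames - window + 1):
        hits = sum(1 for t in range(start, start + window) if t in divisions)
        weights.append(1.0 + boost * hits)
    return weights


def sample_window_start(num_frames, window, division_frames, boost=5.0, rng=random):
    """Draw one window start, favouring windows that contain divisions."""
    weights = window_weights(num_frames, window, division_frames, boost)
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def rolling_windows(num_frames, window, stride):
    """Yield half-open (start, end) windows that together cover every frame.

    Assumption: consecutive windows overlap by `window - stride` frames, so
    cells that appear late in one window are seen again early in the next.
    """
    if not 1 <= stride <= window:
        raise ValueError("stride must satisfy 1 <= stride <= window")
    if num_frames <= window:
        yield (0, num_frames)
        return
    start = 0
    while start + window < num_frames:
        yield (start, start + window)
        start += stride
    # Final window is aligned to the end so no trailing frames are skipped.
    yield (num_frames - window, num_frames)
```

For example, `list(rolling_windows(10, 4, 2))` yields `[(0, 4), (2, 6), (4, 8), (6, 10)]`, and with a single division at frame 4 in a 6-frame sequence, windows of length 3 that include frame 4 are six times as likely to be sampled as those that do not.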

Paper Structure

This paper contains 31 sections, 6 equations, 8 figures, 6 tables, 1 algorithm.

Figures (8)

  • Figure 1: Our proposed CAP is an end-to-end trained framework that leverages the idea of Cell as Point to track cells efficiently. As (a) illustrates, whereas previous work requires segmentation (SEG) or detection (DET) as a prerequisite for final tracking (TRA), CAP tracks all cells within the sequence frames in one stage. (b) shows that CAP reduces the inference time (to $2.9$s) by approximately $\mathbf{8}$ to $\mathbf{32}$ times compared to previous works while maintaining a high tracking performance of $0.93$. (c) shows the cell tracking result predicted by CAP.
  • Figure 2: Cell Point Trajectory and Visibility. M, D1, and D2 denote the mother cell, daughter cell #1, and daughter cell #2. $(x,y)$ in $L_t$ represents the location coordinates, and $0$ or $1$ in $V_t$ represents a non-existing or existing cell, respectively. There are three valid trajectory and visibility states: (a) M has not divided; (b) M has divided, and only D1 occurs in the $t$-th frame; (c) M has divided, and both D1 and D2 occur in the $t$-th frame.
  • Figure 3: Overview of the CAP framework. The training sequence $T_s$ is sampled from the entire frame sequence using the AEG strategy. For each $T_s$, the cell point trajectories and visibilities are iteratively refined, starting from their initialization. In iterations $1,\dots,M$, the cell point trajectory $\hat{L}$ and the tracking feature $F$ are updated, and the cell point visibility $\hat{V}$ is computed at the final ($M$-th) update. This figure illustrates a single iteration of the optimization process.
  • Figure 4: Visualization of the DeepCell Dataset. The datasets HeLa, PC-3, 3T3, and RAW264 have different cell densities, sizes, and luminances.
  • Figure 5: Comparison of Quality and Requirement of Data Annotation. Fluo-C2DL-Huh7 does NOT have segmentation ST. Whereas previous works use all types of mask information shown in the figure together with the original images, CAP uses only the Tracking GT masks with the original images (in the orange block).
  • ...and 3 more figures
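
The three valid states described in the Figure 2 caption can be written down as a small lookup. This is only an illustrative sketch: the slot ordering `(M, D1, D2)`, the state labels, and the assumption that M's visibility becomes $0$ once it has divided are inferred from the caption rather than taken from the authors' code, and `classify_visibility` is a hypothetical name.

```python
# Visibility triples for the slots (M, D1, D2) at frame t; 1 = cell exists.
# Assumption: after division the mother slot is marked as non-existing.
VALID_STATES = {
    (1, 0, 0): "undivided",              # (a) M has not divided
    (0, 1, 0): "divided_one_daughter",   # (b) only D1 occurs in frame t
    (0, 1, 1): "divided_two_daughters",  # (c) both D1 and D2 occur in frame t
}


def classify_visibility(visibility):
    """Return the state label for a visibility triple, or None if invalid."""
    return VALID_STATES.get(tuple(visibility))
```

Under these assumptions, a triple such as `(1, 1, 0)`, where the mother and a daughter are visible at once, is not among the valid states and maps to `None`.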