Towards Effective and Efficient Context-aware Nucleus Detection in Histopathology Whole Slide Images
Zhongyi Shui, Ruizhe Guo, Honglin Li, Yuxuan Sun, Yunlong Zhang, Chenglu Zhu, Jiatong Cai, Pingyi Chen, Yanzhou Su, Lin Yang
TL;DR
This work tackles efficient context-aware nucleus detection in gigapixel histopathology WSIs. Instead of cropping costly large field-of-view (FoV) regions, it aggregates contextual cues from the surrounding patches already seen during sliding-window inference. A shared encoder processes the ROI and its surrounding patches at the same magnification, and surrounding features are extracted without gradients. A grid-pooling step reduces the number of context tokens, self-attention fuses these tokens, and cross-attention then injects the fused context into the ROI representations. On the OCELOT dataset, the method achieves notable gains over state-of-the-art baselines in nucleus detection and segmentation while running about 3.26× faster than previous approaches. The work also introduces OCELOT-seg, a dedicated benchmark for context-aware nucleus segmentation. Together, these results point to rapid, accurate, and scalable context-aware nucleus analysis on gigapixel WSIs in clinical histopathology.
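The two-stage fusion described above (self-attention over context tokens, then cross-attention from ROI tokens) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses single-head scaled dot-product attention with identity query/key/value projections (a real model would learn them), adds a residual connection, and treats the context tokens as precomputed inputs, mirroring gradient-free surrounding feature extraction. The names `attention` and `fuse_context` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention.

    Assumption: identity Q/K/V projections; a trained model would learn them.
    """
    d = len(keys[0])
    dim_v = len(values[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output is a convex combination of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)])
    return outputs


def fuse_context(roi_tokens: List[Vector], context_tokens: List[Vector]) -> List[Vector]:
    """Hypothetical two-stage fusion: self-attention over context, then cross-attention."""
    # Stage 1: context tokens exchange information via self-attention.
    fused_ctx = attention(context_tokens, context_tokens, context_tokens)
    # Stage 2: ROI tokens act as queries over the fused context (cross-attention).
    injected = attention(roi_tokens, fused_ctx, fused_ctx)
    # Residual connection (assumption) preserves the original ROI features.
    return [[r + c for r, c in zip(rt, it)] for rt, it in zip(roi_tokens, injected)]


roi = [[1.0, 0.0], [0.0, 1.0]]
context = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]
fused = fuse_context(roi, context)
print(len(fused), len(fused[0]))  # prints: 2 2 (one fused vector per ROI token)
```

A quick sanity check: if all context tokens are identical, every attention weight is uniform, so each ROI token simply receives that shared token added to it.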
Abstract
Nucleus detection in histopathology whole slide images (WSIs) is crucial for a broad spectrum of clinical applications. Current approaches for nucleus detection in gigapixel WSIs use a sliding-window methodology, which overlooks broader contextual information (e.g., tissue structure) and can easily lead to inaccurate predictions. To address this problem, recent studies additionally crop a large Field-of-View (FoV) region around each sliding window to extract contextual features. However, such methods substantially increase inference latency. In this paper, we propose an effective and efficient context-aware nucleus detection algorithm. Specifically, instead of leveraging large FoV regions, we aggregate contextual cues from the off-the-shelf features of previously visited sliding windows, i.e., features already computed during inference. This design greatly reduces computational overhead. Moreover, compared with large FoV regions at a low magnification, the sliding-window patches have a higher magnification and provide finer-grained tissue details, thereby enhancing detection accuracy. To further improve efficiency, we propose a grid pooling technique that compresses the dense feature map of each patch into a few contextual tokens. Finally, we craft OCELOT-seg, the first benchmark dedicated to context-aware nucleus instance segmentation. Code, dataset, and model checkpoints will be available at https://github.com/windygoo/PathContext.
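The reuse of features from previously visited windows and the grid pooling step can be illustrated with a small plain-Python sketch. Several details are assumptions not stated in the abstract: windows are visited in raster order on a regular grid, each window's dense feature map is given as an H × W × C nested list, grid pooling is plain average pooling over a `grid` × `grid` partition (with H and W divisible by `grid`), and context is gathered from already-visited windows within a hypothetical `radius`. `grid_pool`, `ContextCache`, and `constant_map` are illustrative names, not the authors' API.

```python
from typing import Dict, List, Tuple

Token = List[float]
FeatureMap = List[List[Token]]  # H x W grid of C-dimensional feature vectors
Position = Tuple[int, int]      # (row, col) index of a sliding window


def grid_pool(fmap: FeatureMap, grid: int) -> List[Token]:
    """Average-pool an H x W x C map into grid * grid tokens (assumes H, W divisible by grid)."""
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    cell_h, cell_w = h // grid, w // grid
    tokens = []
    for gi in range(grid):
        for gj in range(grid):
            acc = [0.0] * c
            for y in range(gi * cell_h, (gi + 1) * cell_h):
                for x in range(gj * cell_w, (gj + 1) * cell_w):
                    for k in range(c):
                        acc[k] += fmap[y][x][k]
            n = cell_h * cell_w
            tokens.append([a / n for a in acc])
    return tokens


class ContextCache:
    """Hypothetical store of pooled tokens from sliding windows already processed."""

    def __init__(self, grid: int) -> None:
        self.grid = grid
        self.tokens: Dict[Position, List[Token]] = {}

    def store(self, pos: Position, fmap: FeatureMap) -> None:
        # Compress the dense map once; only the few pooled tokens are kept.
        self.tokens[pos] = grid_pool(fmap, self.grid)

    def neighbors(self, pos: Position, radius: int = 1) -> List[Token]:
        """Collect cached tokens of visited windows within `radius` of pos (excluding pos)."""
        r0, c0 = pos
        out: List[Token] = []
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                key = (r0 + dr, c0 + dc)
                if key != pos and key in self.tokens:
                    out.extend(self.tokens[key])
        return out


def constant_map(value: float, size: int = 4, channels: int = 2) -> FeatureMap:
    """Toy feature map whose every entry equals `value`."""
    return [[[value] * channels for _ in range(size)] for _ in range(size)]


cache = ContextCache(grid=2)
context_at_center: List[Token] = []
for r in range(3):          # raster scan over a 3 x 3 grid of windows
    for c in range(3):
        ctx = cache.neighbors((r, c))  # context comes only from earlier windows
        if (r, c) == (1, 1):
            context_at_center = ctx
        cache.store((r, c), constant_map(float(3 * r + c)))

print(len(context_at_center))  # prints: 16 (4 visited neighbors x 4 tokens each)
```

Because each window is reduced to `grid * grid` tokens, the attention cost over context grows with the number of cached neighbors rather than with the full resolution of their feature maps. Under the raster-order assumption, windows below and to the right of the current one have not been visited yet, so they contribute no context in this sketch.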
