SaccadeDet: A Novel Dual-Stage Architecture for Rapid and Accurate Detection in Gigapixel Images

Wenxi Li; Ruxin Zhang; Haozhe Lin; Yuchen Guo; Chao Ma; Xiaokang Yang

TL;DR

Gigapixel images pose severe speed and accuracy challenges because of their vast background areas and extreme variation in object scale. SaccadeDet introduces a dual-stage approach: it first applies multi-scale density regression to a downsampled gigapixel image to locate Regions of Interest, then runs a scale-normalized gaze stage in which a megapixel-level detector processes standardized patches. The method achieves up to 8x faster inference than prior gigapixel detectors on PANDA while maintaining high detection accuracy, and it extends to Whole Slide Imaging with substantial efficiency gains. This offers a practical, scalable solution for fast, accurate gigapixel-level detection in medical and surveillance settings.
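
The saccade-stage patch selection summarized above can be sketched as follows. This is a minimal single-scale sketch under stated assumptions, not the authors' implementation: it assumes a density map has already been predicted on the downsampled image, uses a fixed square grid, and keeps cells whose estimated object count exceeds a hypothetical threshold. The name `select_patches` and its parameters are illustrative. It relies on the standard density-regression property that summing the map over a region estimates the number of objects in that region.

```python
import numpy as np

def select_patches(density, cell, scale, threshold):
    """Pick high-density grid cells from a low-resolution density map.

    Assumption: a single density map stands in for the paper's
    multi-scale density regression output.
    density:   2-D array predicted on the downsampled image
    cell:      grid cell size in density-map pixels (hypothetical)
    scale:     downsampling factor between the full image and the map
    threshold: minimum estimated object count for a cell to be kept
    Returns (x0, y0, x1, y1) boxes in full-resolution pixel coordinates.
    """
    h, w = density.shape
    boxes = []
    for y in range(0, h, cell):
        for x in range(0, w, cell):
            # Summing density over a cell estimates its object count.
            count = density[y:y + cell, x:x + cell].sum()
            if count >= threshold:
                x1, y1 = min(x + cell, w), min(y + cell, h)
                # Map the cell back to gigapixel coordinates.
                boxes.append((x * scale, y * scale, x1 * scale, y1 * scale))
    return boxes

# Hypothetical 8 x 8 density map from a 100x-downsampled image.
density = np.zeros((8, 8))
density[1, 1] = 2.0  # about two objects in the top-left cell
density[6, 5] = 0.3  # a weak response in the bottom-right cell
print(select_patches(density, cell=4, scale=100, threshold=0.5))
# [(0, 0, 400, 400)]
```

Cells below the threshold (here, the bottom-right cell and the two empty cells) are never passed to the detector, which is the mechanism behind discarding most of the background.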

Abstract

The advancement of deep learning in object detection has focused predominantly on megapixel images, leaving a critical gap in the efficient processing of gigapixel images. These super-high-resolution images present unique challenges due to their immense size and computational demands. To address this, we introduce 'SaccadeDet', an innovative architecture for gigapixel-level object detection inspired by the saccadic movements of the human eye. The cornerstone of SaccadeDet is its ability to strategically select and process image regions, dramatically reducing the computational load. This is achieved through a two-stage process: the 'saccade' stage, which identifies regions of probable interest, and the 'gaze' stage, which refines detection in these targeted areas. Evaluated on the PANDA dataset, our approach achieves an 8x speedup over state-of-the-art methods, and its application to Whole Slide Imaging demonstrates significant potential for gigapixel-level pathology analysis.
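
To make the cost argument concrete, the sketch below counts megapixel-detector forward passes for exhaustive tiling versus running the detector only on a selected subset of tiles. The function `detector_passes`, the image size, the tile size, and the kept fraction are all hypothetical, and the count ignores the cost of the low-resolution saccade pass itself. It illustrates why detector cost scales with the number of selected regions; it does not reproduce SaccadeDet's measured speedup.

```python
import math

def detector_passes(width, height, tile, keep_fraction=1.0):
    """Count detector forward passes needed to cover an image with square tiles.

    keep_fraction = 1.0 models exhaustive sliding-window tiling; a smaller
    value models running the detector only on tiles kept by a
    saccade-like selector (hypothetical parameter).
    """
    tiles = math.ceil(width / tile) * math.ceil(height / tile)
    return math.ceil(tiles * keep_fraction)

# Hypothetical 32768 x 32768 image (about 1.07 gigapixels), 1024-pixel tiles.
print(detector_passes(32768, 32768, 1024))       # 1024 (exhaustive)
print(detector_passes(32768, 32768, 1024, 0.1))  # 103 (10% of tiles kept)
```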

Paper Structure (17 sections, 3 equations, 5 figures, 6 tables)

Introduction
Related Work
Proposed Method
Preliminaries
Overall Architecture
Saccade by Multi-scale Density Estimation
Scale-aware Mean Squared Error Loss
Patch Generation
Gaze with Scale Normalization
Experiments
Baseline
Module Analysis
Comparisons with the state-of-the-art methods
Ablation Studies
Application to Whole Slide Imaging
...and 2 more sections

Figures (5)

Figure 1: Two distinct characteristics of gigapixel images. (a) Wide-field gigapixel images exhibit a higher background ratio than megapixel images. (b) Objects within a single image can differ in scale by more than 100×, demonstrating extreme size disparities.
Figure 2: Comparison of low- and high-resolution inputs for density regression and object detection; zoomed-in views distinguish the low-resolution from the high-resolution images. Density regression (CSRNet li2018csrnet) excels at coarse-grained localization on low-resolution images, whereas object detection (RTMDet lyu2022rtmdet) achieves superior fine-grained detection on high-resolution images.
Figure 3: An overview of the SaccadeDet architecture. (1) A low-resolution image is passed to the multi-scale density regression module to generate density maps. The density maps are then divided into a grid, and the object density of each cell is calculated; high-density patches are cropped for the gaze stage. (2) The multi-scale patches are rescaled to a common scale, and a megapixel-level detector processes them to output bounding boxes for each patch. Finally, all results are merged to produce the bounding boxes for the gigapixel-level image.
Figure 4: Visualization of the saccade-stage procedure, showing the density map, the object density of each patch, and the selected patches. Masks mark the discarded patches. Most of the background is discarded, indicating that this stage supplies more focused patches to the gaze stage.
Figure 5: Comparison of a current color-based preprocessing method and SaccadeDet for generating regions of interest. SaccadeDet extracts coarse-grained cancer metastasis regions, whereas color-based methods can only extract tissue regions.
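
The gaze stage described in Figure 3 (rescale patches to a common scale, detect, then merge) can be sketched as below. Several details are assumptions rather than the paper's method: each patch is normalized by resizing its longer side to a hypothetical `target` size, detector outputs are replaced by made-up boxes, and the merge step uses plain greedy non-maximum suppression, a common choice for combining overlapping patch detections that stands in for the merge procedure this summary does not specify. The helper names `normalize_scale`, `to_global`, `box_area`, and `nms` are hypothetical.

```python
import numpy as np

def normalize_scale(patch_box, target):
    """Resize factor mapping a patch's longer side to `target` pixels.

    Assumption: scale normalization is done per patch by its side length.
    """
    x0, y0, x1, y1 = patch_box
    return target / max(x1 - x0, y1 - y0)

def to_global(boxes, patch_box, factor):
    """Map (x0, y0, x1, y1) boxes from a rescaled patch to full-image pixels."""
    ox, oy = patch_box[0], patch_box[1]
    local = np.asarray(boxes, dtype=float) / factor  # undo the patch resize
    return local + np.array([ox, oy, ox, oy], dtype=float)  # shift by patch origin

def box_area(b):
    """Area of one box (shape (4,)) or many boxes (shape (k, 4))."""
    return (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])

def nms(boxes, scores, iou_thr=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes.

    Assumption: NMS stands in for the unspecified merge step.
    """
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(int(i))
        xx0 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy0 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx1 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy1 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx1 - xx0, 0, None) * np.clip(yy1 - yy0, 0, None)
        iou = inter / (box_area(boxes[i]) + box_area(boxes[rest]) - inter)
        order = rest[iou <= iou_thr]  # drop boxes overlapping the kept one
    return keep

# Hypothetical example: two overlapping patches of different sizes.
patch_a = (0, 0, 2000, 2000)    # large patch, downscaled to 1000 px
patch_b = (1000, 0, 1500, 500)  # small patch, upscaled to 1000 px
fa = normalize_scale(patch_a, 1000)  # 0.5
fb = normalize_scale(patch_b, 1000)  # 2.0
# Made-up detector outputs, in each rescaled patch's own pixel frame.
da = to_global([[600, 100, 700, 200]], patch_a, fa)  # [[1200, 200, 1400, 400]]
db = to_global([[400, 400, 790, 800]], patch_b, fb)  # [[1200, 200, 1395, 400]]
all_boxes = np.vstack([da, db])
scores = np.array([0.7, 0.9])
kept = nms(all_boxes, scores)  # [1]: the lower-scoring duplicate is suppressed
```

Because both patches are mapped back to the same gigapixel coordinate frame before merging, the same object seen at two different patch scales collapses to a single detection.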