Advances in Kidney Biopsy Lesion Assessment through Dense Instance Segmentation

Zhan Xiong; Junling He; Pieter Valkema; Tri Q. Nguyen; Maarten Naesens; Jesper Kers; Fons J. Verbeek

TL;DR

This work tackles the challenge of automatic, dense instance-level lesion quantification in kidney biopsies by introducing DiffRegFormer, a unified framework that combines diffusion-based bounding-box proposals, region-focused transformers, and a modular lesion classifier to handle densely packed, multi-class, multi-scale objects within ROIs. The approach achieves state-of-the-art performance on Jones-stained WSIs with AP of 52.1% for detection and 46.8% for segmentation, and high lesion-classification precision (89.2%) with reasonable recall (64.6%), while demonstrating domain transfer to PAS-stained tissue without fine-tuning. A key contribution is the class-wise balanced sampling and separate feature streams for bbox and mask decoders, which stabilize training and enhance learning for rare lesions, enabling scalable extension with additional lesion heads. The work advances practical renal pathology tooling by enabling end-to-end ROI-level analysis, modular lesion heads, and potential cross-stain applicability, thus reducing annotation burden and inter-observer variability in clinical workflows.

Abstract

Renal biopsies are the gold standard for the diagnosis of kidney diseases. Lesion scores made by renal pathologists are semi-quantitative and exhibit high inter-observer variability. Automating lesion classification within segmented anatomical structures can provide decision support in quantification analysis, thereby reducing inter-observer variability. Nevertheless, classifying lesions in regions-of-interest (ROIs) is clinically challenging due to (a) a large number of densely packed anatomical objects, (b) class imbalance across different compartments (at least 3), (c) significant variation in the size and shape of anatomical objects, and (d) the presence of multi-label lesions per anatomical structure. Existing models cannot address these complexities in an efficient and generic manner. This paper presents an analysis of a generalized solution for datasets from various sources (pathology departments) with different types of lesions. Our approach uses two sub-networks: dense instance segmentation and lesion classification. We introduce DiffRegFormer, an end-to-end dense instance segmentation sub-network designed for multi-class, multi-scale objects within ROIs. Combining diffusion models, transformers, and RCNNs, DiffRegFormer is a computationally efficient framework that can recognize over 500 objects across three anatomical classes (glomeruli, tubuli, and arteries) within ROIs. In a dataset of 303 ROIs from 148 Jones' silver-stained renal Whole Slide Images (WSIs), our approach outperforms previous methods, achieving an Average Precision of 52.1% (detection) and 46.8% (segmentation). Moreover, our lesion classification sub-network achieves 89.2% precision and 64.6% recall on 21,889 object patches from the 303 ROIs. Lastly, our model demonstrates direct domain transfer to PAS-stained renal WSIs without fine-tuning.
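To make the reported lesion-classification metrics concrete, the sketch below computes precision and recall for multi-label predictions over object patches. It is an illustration only: the function name `precision_recall` is hypothetical, and micro-averaging over all (patch, label) pairs is an assumption, since the abstract does not state which averaging scheme produced the 89.2% and 64.6% figures.

```python
def precision_recall(y_true, y_pred):
    """Micro-averaged precision and recall for multi-label binary predictions.

    y_true, y_pred: lists with one inner list per object patch; each inner
    list holds one 0/1 entry per lesion label.
    Assumption: micro-averaging; the source does not specify the scheme.
    """
    tp = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        for t, p in zip(truth, pred):
            if p and t:
                tp += 1  # lesion predicted and present
            elif p and not t:
                fp += 1  # lesion predicted but absent
            elif t and not p:
                fn += 1  # lesion present but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy example: three patches, two lesion labels each (made-up values).
truth = [[1, 0], [1, 1], [0, 0]]
preds = [[1, 0], [0, 1], [0, 1]]
precision, recall = precision_recall(truth, preds)
```

A precision well above recall, as in the abstract, means that predicted lesions are usually correct while a noticeable share of true lesions goes undetected.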

Paper Structure (25 sections, 2 equations, 15 figures, 9 tables, 2 algorithms)

Introduction
Related Work
Methods
Preliminaries
Dense Instance Segmentation Model
Encoder
Bbox-decoder
Sampling
Mask decoder
Lesion classifier
Implementation Details
Training
Inference
Results
Datasets
...and 10 more sections

Figures (15)

Figure 1: A manually annotated ROI of a kidney biopsy, showing glomeruli, tubuli, and arteries. Clinical renal biopsies pose three primary challenges: (1) a large number of objects closely touching each other; (2) significant variation in size and shape among instances; (3) a heavily biased class distribution.
Figure 2: Comparison between regional segments and instance segments. (a) cropped instance maps from bounding boxes tightly surrounding each object; (b) ground-truth bounding boxes and instance segments within one image; (c) separated entire instance map per object.
Figure 3: Our DiffRegFormer is a one-stage anchor-free method. Instead of using pre-defined anchors, we impose Gaussian noise on ground-truth boxes and generate a fixed-size set of random bounding boxes (bboxes). With feature maps extracted from the encoder, the bbox-decoder iteratively learns to denoise and predicts class-wise candidate boxes. Because the candidates are numerous and unevenly distributed, we propose a sampling module that discards negative samples while maintaining a balanced proportion of positive samples for fast convergence. Finally, the mask-decoder generates the final instance masks from the selected positive samples. For simplicity, the illustration shows only one anatomical object per class (artery, tubule, glomerulus).
Figure 4: The bounding box (bbox)-decoder takes multi-scale feature maps and a set of random boxes as input, and iteratively outputs class and box predictions. (a) The module comprises an initialization head for dynamic queries (orange rectangle) and multiple box refinement heads (blue rectangles). (b) In the orange rectangle in (a), the initial dynamic queries are generated via a RoIAlign pooling operator and a feed-forward network (FFN). (c) Each box refinement module (one blue rectangle in (a)) takes the previous stage's dynamic queries and proposal boxes as input, generating predictions and refined dynamic queries for the next stage. Abbreviations: dynamic queries: DQ, feed-forward network: FFN, regional features: RF.
Figure 5: The mask-decoder takes multi-scale feature maps and a set of positive boxes as input and predicts instance masks. The dynamic queries interact with regional features using cross-attention and only highlight pixels residing in proposal boxes. Final instance masks are generated from the enhanced regional feature maps. Abbreviations: dynamic queries: DQ, feed-forward network: FFN, regional features: RF.
...and 10 more figures
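
The Figure 3 caption describes two steps that a short sketch may help make concrete: building a fixed-size set of noisy boxes from the ground truth, and keeping a balanced set of positive candidates. The code below is a minimal illustration under stated assumptions, not the authors' implementation. The names `noisy_boxes` and `balanced_sample`, the `alpha_bar` parameter, the normalized (cx, cy, w, h) box format, and the per-class cap are all hypothetical. The noising step uses the generic diffusion forward formula x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps.

```python
import math
import random


def noisy_boxes(gt_boxes, num_boxes, alpha_bar, rng):
    """Pad ground-truth boxes to a fixed-size set, then add Gaussian noise.

    gt_boxes: list of (cx, cy, w, h) tuples normalized to [0, 1] (assumed format).
    alpha_bar: assumed cumulative signal level in (0, 1]; 1.0 means no noise.
    """
    # Extra ground-truth boxes beyond num_boxes are dropped in this sketch.
    boxes = [tuple(b) for b in gt_boxes[:num_boxes]]
    # Fill the remaining slots with random boxes so the set size is fixed.
    while len(boxes) < num_boxes:
        boxes.append(tuple(rng.random() for _ in range(4)))
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    noised = []
    for box in boxes:
        vals = [signal * v + noise * rng.gauss(0.0, 1.0) for v in box]
        # Clamp so every coordinate stays inside the normalized image range.
        noised.append(tuple(min(1.0, max(0.0, v)) for v in vals))
    return noised


def balanced_sample(candidates, per_class, rng):
    """Discard negatives and keep at most `per_class` positives per class.

    candidates: list of (class_name, is_positive) pairs.
    Returns the sorted indices of the kept candidates.
    Assumption: a simple per-class cap stands in for the paper's sampling rule.
    """
    by_class = {}
    for idx, (cls, is_positive) in enumerate(candidates):
        if is_positive:
            by_class.setdefault(cls, []).append(idx)
    kept = []
    for idxs in by_class.values():
        rng.shuffle(idxs)  # random choice among positives of the same class
        kept.extend(idxs[:per_class])
    return sorted(kept)


rng = random.Random(0)
boxes = noisy_boxes([(0.5, 0.5, 0.2, 0.2)], num_boxes=4, alpha_bar=0.9, rng=rng)
cands = [("tubule", True)] * 5 + [("glomerulus", True), ("artery", False)]
kept = balanced_sample(cands, per_class=2, rng=rng)
```

Capping each class prevents the abundant tubule candidates from dominating the positives passed to the mask-decoder, which is the intuition behind the balanced sampling described in the caption.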
