Lesion-Aware Cross-Phase Attention Network for Renal Tumor Subtype Classification on Multi-Phase CT Scans

Kwang-Hyun Uhm, Seung-Won Jung, Sung-Hoo Hong, Sung-Jea Ko

TL;DR

Renal tumor subtype classification from multi-phase CT is challenged by variable enhancement patterns across phases and variability in radiologist assessments. The authors propose LACPANet, a lesion-aware cross-phase attention network that uses 3D inter-phase attention and a multi-scale attention scheme to model phase relationships at multiple lesion scales, guided by a lesion segmentation network. The method achieves state-of-the-art accuracy, outperforming both semi-automated and fully-automated baselines across multiple metrics and demonstrating robustness through ablations and attention visualizations. This approach has practical implications for non-invasive, reliable preoperative tumor subtyping using standard multi-phase CT protocols.

Abstract

Multi-phase computed tomography (CT) has been widely used for the preoperative diagnosis of kidney cancer due to its non-invasive nature and ability to characterize renal lesions. However, since enhancement patterns of renal lesions across CT phases are different even for the same lesion type, the visual assessment by radiologists suffers from inter-observer variability in clinical practice. Although deep learning-based approaches have been recently explored for differential diagnosis of kidney cancer, they do not explicitly model the relationships between CT phases in the network design, limiting the diagnostic performance. In this paper, we propose a novel lesion-aware cross-phase attention network (LACPANet) that can effectively capture temporal dependencies of renal lesions across CT phases to accurately classify the lesions into five major pathological subtypes from time-series multi-phase CT images. We introduce a 3D inter-phase lesion-aware attention mechanism to learn effective 3D lesion features that are used to estimate attention weights describing the inter-phase relations of the enhancement patterns. We also present a multi-scale attention scheme to capture and aggregate temporal patterns of lesion features at different spatial scales for further improvement. Extensive experiments on multi-phase CT scans of kidney cancer patients from the collected dataset demonstrate that our LACPANet outperforms state-of-the-art approaches in diagnostic accuracy.

Paper Structure

This paper contains 18 sections, 7 equations, 5 figures, 7 tables.

Figures (5)

  • Figure 1: Sample of five subtypes of renal tumors in multi-phase CT scans. In each row, CT images of the corresponding subtype over multiple phases are shown. The tumor area is indicated by the red arrow in the delayed phase.
  • Figure 2: Overall framework of our LACPANet. Given multi-phase CT $\mathcal{I}$, a 3D CNN-based segmentation network predicts a lesion segmentation map $\hat{s}_{i}$ for each phase $I_i$. Then, lesion-level feature embedding (bottom left) is performed to produce lesion representations for each phase, i.e., query $Q$, key $K$, and value $V$. These embeddings are fed to the cross-phase attention module (bottom right) to capture inter-phase dependencies of lesion features with the help of phase embedding $P$. For brevity, only the single-scale process of lesion-level feature embedding is shown. Finally, the output feature of cross-phase attention $F_{out}$ is fed to a feed-forward network (FFN) to obtain the cancer subtype prediction.
  • Figure 3: Multi-scale attention scheme. We use both low-level and high-level features produced by 3D CNNs to compute inter-phase attention. The multi-scale lesion-level embeddings are obtained from the 3D features and the segmentation maps at the corresponding scales. Learnable phase embeddings are then added to these feature embeddings at each scale.
  • Figure 4: Illustration of the 3D multi-phase baseline network.
  • Figure 5: Visualization of low-level and high-level inter-phase attention matrices ($A^{low}, A^{high}$) extracted by our LACPANet for cancer subtype classification on test data. The first row shows an oncocytoma case, which the baseline (3D multi-phase baseline) misclassifies as ccRCC, while our proposed LACPANet correctly classifies it as oncocytoma. The second row shows an AML case, which the baseline misclassifies as chRCC, while our model makes the correct classification. The last row shows a ccRCC case, which the baseline wrongly predicts as chRCC, while our method classifies it correctly from the input CT scan. In each row, the input four-phase CT images (non-contrast, arterial, portal, and delayed phases) are shown on the left, and the low- and high-level attention matrices obtained from our LACPANet are shown on the right. Representative slices of the CT scans are shown for visualization, and the tumor regions are zoomed in (yellow) on each phase. The attention value for each query-key pair is displayed on the matrix.
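
The lesion-level embedding and cross-phase attention described in the Figure 2 and Figure 3 captions can be illustrated with a minimal NumPy sketch for a single scale. This is not the authors' implementation. The use of masked average pooling with the segmentation map, linear Q/K/V projections, scaled dot-product softmax attention, and adding the phase embedding before all three projections are assumptions made for illustration, as are all names (`masked_average_pool`, `cross_phase_attention`, `w_q`, `w_k`, `w_v`) and tensor sizes. The CNN features and segmentation maps are random stand-ins.

```python
import numpy as np


def masked_average_pool(features, mask, eps=1e-6):
    """Pool a (C, D, H, W) feature map into a (C,) lesion vector.

    `mask` is a (D, H, W) soft lesion map in [0, 1]; voxels with a
    larger mask value contribute more to the pooled feature.
    (Assumed pooling scheme, not taken from the paper.)
    """
    weighted = features * mask[None]  # broadcast the mask over channels
    return weighted.sum(axis=(1, 2, 3)) / (mask.sum() + eps)


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_phase_attention(lesion_feats, phase_emb, w_q, w_k, w_v):
    """Attend across CT phases.

    lesion_feats: (N, C) one lesion vector per phase
    phase_emb:    (N, C) phase embedding added to each vector
    Returns the attended features (N, d) and the (N, N) attention matrix.
    """
    x = lesion_feats + phase_emb           # inject phase identity
    q, k, v = x @ w_q, x @ w_k, x @ w_v    # assumed linear projections
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)        # each query row sums to 1
    return attn @ v, attn


# Random stand-ins for four phases (non-contrast, arterial, portal, delayed).
rng = np.random.default_rng(0)
num_phases, channels, dim = 4, 8, 8
feature_maps = rng.normal(size=(num_phases, channels, 4, 6, 6))
seg_maps = rng.random(size=(num_phases, 4, 6, 6))

lesion_feats = np.stack(
    [masked_average_pool(f, m) for f, m in zip(feature_maps, seg_maps)]
)
phase_emb = rng.normal(size=(num_phases, channels))
w_q, w_k, w_v = (rng.normal(size=(channels, dim)) for _ in range(3))

f_out, attn = cross_phase_attention(lesion_feats, phase_emb, w_q, w_k, w_v)
print(f_out.shape, attn.shape)
```

Under these assumptions, `attn` has one row per query phase and one column per key phase, the same layout as the matrices visualized in Figure 5. A multi-scale variant would run the same routine on low-level and high-level features separately and combine the outputs before the classifier.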