LQ-Adapter: ViT-Adapter with Learnable Queries for Gallbladder Cancer Detection from Ultrasound Image

Chetan Madan; Mayuna Gupta; Soumen Basu; Pankaj Gupta; Chetan Arora

TL;DR

This work tackles gallbladder cancer detection in ultrasound images, where noise and small pathology complicate localization. It introduces LQ-Adapter, a learnable-query extension of ViT-Adapter, to strengthen localization through content-aware queries integrated with a CNN-based spatial prior. LQ-Adapter achieves state-of-the-art mean IoU on the GBCU dataset and demonstrates competitive results with substantially fewer trainable parameters, while also generalizing to polyp detection in Kvasir-Seg, indicating cross-domain applicability. The results suggest that leveraging learnable queries on top of frozen foundation-model backbones can yield robust, data-efficient performance for medical imaging tasks without heavy architectural redesigns.

Abstract

We focus on the problem of Gallbladder Cancer (GBC) detection from Ultrasound (US) images. The problem presents unique challenges to modern Deep Neural Network (DNN) techniques due to low image quality arising from noise, textures, and viewpoint variations. Tackling such challenges requires precise localization by the DNN to identify the discerning features for the downstream malignancy prediction. While several techniques have been proposed in recent years for the problem, all of these methods employ complex custom architectures. Inspired by the success of foundational models for natural image tasks, along with the use of adapters to fine-tune such models for custom tasks, we investigate the merit of one such design, ViT-Adapter, for the GBC detection problem. We observe that ViT-Adapter relies predominantly on a primitive CNN-based spatial prior module to inject localization information via cross-attention, which is inefficient for our problem due to the small pathology sizes and the variability in their appearance caused by the irregular structure of the malignancy. In response, we propose LQ-Adapter, a modified Adapter design for ViT, which improves localization information by leveraging learnable content queries over the basic spatial prior module. Our method surpasses existing approaches, improving the mean IoU (mIoU) scores by 5.4%, 5.8%, and 2.7% over ViT-Adapter, DINO, and FocalNet-DINO, respectively, on the US image-based GBC detection dataset, establishing a new state-of-the-art (SOTA). Additionally, we validate the applicability and effectiveness of LQ-Adapter on the Kvasir-Seg dataset for polyp detection from colonoscopy images. The superior performance of our design on this problem as well showcases its capability to handle diverse medical imaging tasks across different datasets. Code is released at https://github.com/ChetanMadan/LQ-Adapter
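To make the mIoU metric quoted above concrete, the following is a minimal sketch of how a mean IoU over bounding boxes could be computed. It is not taken from the released code: the `(x1, y1, x2, y2)` box format, the pairing of exactly one predicted box with one ground-truth box per image, and the names `box_iou` and `mean_iou` are all assumptions made for illustration. The paper's actual evaluation protocol may differ.

```python
from typing import Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter

    # Two degenerate (zero-area) boxes have no meaningful overlap.
    return inter / union if union > 0 else 0.0


def mean_iou(preds: Sequence[Box], targets: Sequence[Box]) -> float:
    """Average IoU over paired predictions and ground-truth boxes.

    Assumes preds[i] is the single prediction for the image whose
    ground-truth box is targets[i] (a hypothetical pairing scheme).
    """
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not preds:
        return 0.0
    return sum(box_iou(p, t) for p, t in zip(preds, targets)) / len(preds)
```

Under these assumptions, a perfect prediction on one image and a complete miss on another average to an mIoU of 0.5, which illustrates why small localization errors on small pathologies can move the metric noticeably.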
