Normal and Abnormal Pathology Knowledge-Augmented Vision-Language Model for Anomaly Detection in Pathology Images
Jinsol Song, Jiamu Wang, Anh Tien Nguyen, Keunho Byeon, Sangjeong Ahn, Sung Hak Lee, Jin Tae Kwak
TL;DR
This work tackles anomaly detection in pathology images under data scarcity and variability by introducing Ano-NAViLa, a lightweight vision-language framework that combines two term pools (normal and abnormal) with a frozen VLM and a trainable MLP. It generates text-augmented image embeddings and uses a contrastive objective to cluster normal- and abnormal-associated representations, producing patch- and WSI-level anomaly scores from centroid-based deviations, $A_{score} = D^{N}(\mathbf{h}^{N}) + D^{A}(\mathbf{h}^{A})$. Evaluations on GastricLN and Camelyon16 show state-of-the-art anomaly detection and localization, strong generalization across organs and institutions, and interpretable image-text associations validated by pathologists. The approach achieves high accuracy with few trainable parameters and offers clinically meaningful textual explanations, supporting potential translation to real-world workflows. Future work includes automated term-pool construction, broader external validation, and VLM optimization for further efficiency.
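The score $A_{score} = D^{N}(\mathbf{h}^{N}) + D^{A}(\mathbf{h}^{A})$ can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each deviation $D$ is a Euclidean distance to a centroid and that each centroid is the mean of reference embeddings from normal training patches; the summary above does not specify either choice. The function names (`centroid`, `deviation`, `anomaly_score`) and the toy two-dimensional vectors are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def centroid(embeddings: Sequence[Vector]) -> List[float]:
    """Mean of a set of reference embeddings (assumed centroid definition)."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def deviation(h: Vector, c: Vector) -> float:
    """Euclidean distance from embedding h to centroid c (assumed form of D)."""
    return math.dist(h, c)


def anomaly_score(h_n: Vector, h_a: Vector, c_n: Vector, c_a: Vector) -> float:
    """A_score = D^N(h^N) + D^A(h^A), with both terms as centroid deviations."""
    return deviation(h_n, c_n) + deviation(h_a, c_a)


# Toy reference embeddings standing in for normal training patches.
# h^N: normal-term-augmented embeddings, h^A: abnormal-term-augmented embeddings.
train_h_n = [[0.0, 0.1], [0.2, -0.1], [-0.2, 0.0]]
train_h_a = [[1.0, 1.0], [1.0, 3.0]]
c_n = centroid(train_h_n)
c_a = centroid(train_h_a)

# A patch that sits on both centroids versus one that deviates in h^N.
score_typical = anomaly_score([0.0, 0.0], [1.0, 2.0], c_n, c_a)
score_deviating = anomaly_score([3.0, 4.0], [1.0, 2.0], c_n, c_a)
```

Under these assumptions, a patch whose embeddings lie near both centroids receives a low score, and a deviation in either embedding raises the score. How patch-level scores are aggregated into a WSI-level score is not described here and is left out of the sketch.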
Abstract
Anomaly detection in computational pathology aims to identify rare anomalies in settings where disease-related data are often limited or missing. Existing anomaly detection methods, primarily designed for industrial settings, face limitations in pathology due to computational constraints, diverse tissue structures, and lack of interpretability. To address these challenges, we propose Ano-NAViLa, a Normal and Abnormal pathology knowledge-augmented Vision-Language model for Anomaly detection in pathology images. Ano-NAViLa is built on a pre-trained vision-language model with a lightweight trainable MLP. By incorporating both normal and abnormal pathology knowledge, Ano-NAViLa improves accuracy and robustness to variability in pathology images and provides interpretability through image-text associations. Evaluated on two lymph node datasets from different organs, Ano-NAViLa achieves state-of-the-art performance in anomaly detection and localization, outperforming competing models.
