ZeroSlide: Is Zero-Shot Classification Adequate for Lifelong Learning in Whole-Slide Image Analysis in the Era of Pathology Vision-Language Foundation Models?

Doanh C. Bui; Hoai Luan Pham; Vu Trung Duong Le; Tuan Hai Vu; Van Duy Tran; Yasuhiko Nakashima

TL;DR

This work tackles lifelong learning for whole-slide image (WSI) analysis by comparing training-based continual-learning methods against ZeroSlide, a zero-shot lifelong-learning approach that leverages pathology vision-language foundation models and class prompts. The pipeline tiles WSIs, extracts patch features with TITAN, and aggregates them with a learnable or pretrained slide encoder. The study frames lifelong learning in both class-incremental (CLASS-IL) and task-incremental (TASK-IL) settings and constructs prototype-based class templates for zero-shot classification. Across six TCGA datasets, ZeroSlide is highly competitive with rehearsal-based methods and often matches or surpasses regularization approaches, while offering training-free inference and requiring no online storage. ConSlide remains strongest in CLASS-IL, but ZeroSlide shows robust, stable performance on the backward transfer (BWT) and Forgetting metrics. The findings indicate that zero-shot lifelong learning is a viable and practical alternative in pathology, with room to improve prediction confidence and to combine class-template ideas with continual-learning strategies to maximize clinical utility.
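The prototype-based zero-shot step summarized above can be illustrated with a minimal sketch. It assumes that slide and prompt embeddings have already been produced by a vision-language model and live in a shared space. The function names (`class_prototypes`, `zero_shot_predict`), the averaging of prompt embeddings into one prototype per class, and the masking used for TASK-IL are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def class_prototypes(prompt_embeddings):
    """Build one prototype per class from its prompt embeddings.

    prompt_embeddings: dict mapping class name -> array of shape (n_prompts, d).
    Assumption: a prototype is the normalized mean of normalized prompt embeddings.
    Returns (names, prototypes) with prototypes of shape (n_classes, d).
    """
    names = list(prompt_embeddings)
    protos = np.stack(
        [l2_normalize(l2_normalize(prompt_embeddings[n]).mean(axis=0)) for n in names]
    )
    return names, protos


def zero_shot_predict(slide_embedding, names, protos, task_classes=None):
    """Return the class whose prototype is most similar to the slide embedding.

    task_classes=None  -> CLASS-IL: choose among all classes seen so far.
    task_classes={...} -> TASK-IL: the task is known, so only its classes compete.
    """
    sims = protos @ l2_normalize(slide_embedding)
    if task_classes is not None:
        mask = np.array([n in task_classes for n in names])
        sims = np.where(mask, sims, -np.inf)  # exclude classes from other tasks
    return names[int(np.argmax(sims))]
```

In this sketch, adding a new task only appends prototypes for its classes; nothing is retrained, which is what makes the zero-shot route training-free.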

Abstract

Lifelong learning for whole slide images (WSIs) poses the challenge of training a unified model to perform multiple WSI-related tasks, such as cancer subtyping and tumor classification, in a distributed, continual fashion. This is a practical problem in clinics and hospitals, because WSIs are large and costly to store, process, and transfer, and training a new model whenever a new task is defined is time-consuming. Recent work has applied regularization- and rehearsal-based methods to this setting. However, the rise of vision-language foundation models that align diagnostic text with pathology images raises a question: are these models alone sufficient for lifelong WSI learning through zero-shot classification, or is further investigation into continual-learning strategies needed to improve performance? To our knowledge, this is the first study to compare conventional continual-learning approaches with vision-language zero-shot classification for WSIs. Our source code and experimental results will be available soon.
