GECKO: Gigapixel Vision-Concept Contrastive Pretraining in Histopathology

Saarthak Kapse; Pushpak Pati; Srikar Yellapragada; Srijan Das; Rajarsi R. Gupta; Joel Saltz; Dimitris Samaras; Prateek Prasanna

TL;DR

GECKO introduces a zero- or few-label pretraining framework for gigapixel histopathology slides. It aligns WSI representations with a task-specific Concept Prior derived from interpretable pathology concepts. The model has two branches: a deep-encoding branch that aggregates patch features and a concept-encoding branch that aggregates the concept prior. The two branches are trained jointly with a symmetric CLIP-style contrastive loss. This design yields strong WSI-level embeddings while preserving interpretability through explicit concept activations, and it can incorporate auxiliary modalities such as transcriptomics when they are available. Across five tasks and multiple evaluation settings, GECKO achieves state-of-the-art or competitive performance, generalizes well, and provides pathologist-friendly explanations of its predictions.
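A symmetric CLIP-style objective treats the deep and concept embeddings of the same slide as a positive pair and the other slides in the batch as negatives, averaging the loss over both matching directions. The NumPy sketch below shows one such loss. The function names and the temperature of 0.07 (CLIP's initial value) are illustrative assumptions, not details taken from GECKO.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale each row to unit L2 norm so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_contrastive_loss(deep_emb, concept_emb, temperature=0.07):
    """CLIP-style symmetric InfoNCE over a batch of B slides.

    deep_emb, concept_emb: (B, D) arrays. Row i of each comes from the same WSI,
    so positive pairs lie on the diagonal of the similarity matrix.
    temperature=0.07 is an assumed value (CLIP's initialization), not GECKO's.
    """
    z_d = l2_normalize(deep_emb)
    z_c = l2_normalize(concept_emb)
    logits = z_d @ z_c.T / temperature                         # (B, B) scaled cosine similarities
    idx = np.arange(logits.shape[0])
    loss_d2c = -log_softmax(logits, axis=1)[idx, idx].mean()   # match each deep emb to its concept emb
    loss_c2d = -log_softmax(logits, axis=0)[idx, idx].mean()   # match each concept emb to its deep emb
    return 0.5 * (loss_d2c + loss_c2d)

# Usage: aligned pairs should score a lower loss than unrelated pairs.
rng = np.random.default_rng(0)
deep = rng.normal(size=(8, 512))
print(symmetric_contrastive_loss(deep, deep + 0.01 * rng.normal(size=deep.shape)))
print(symmetric_contrastive_loss(deep, rng.normal(size=(8, 512))))
```

Averaging both directions makes the loss symmetric in its two arguments, so neither branch is treated as the fixed "anchor".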

Abstract

Pretraining a Multiple Instance Learning (MIL) aggregator makes it possible to derive Whole Slide Image (WSI)-level embeddings from patch-level representations without supervision. Recent multimodal MIL pretraining approaches that leverage auxiliary modalities have shown performance gains over unimodal WSI pretraining, but acquiring these additional modalities requires extensive clinical profiling. This requirement increases costs and limits scalability to existing WSI datasets that lack such paired modalities. To address this, we propose Gigapixel Vision-Concept Knowledge Contrastive pretraining (GECKO), which aligns WSIs with a Concept Prior derived from the available WSIs alone. First, we derive an inherently interpretable concept prior by computing the similarity between each WSI patch and textual descriptions of predefined pathology concepts. GECKO then employs a dual-branch MIL network: one branch aggregates patch embeddings into a WSI-level deep embedding, while the other aggregates the concept prior into a corresponding WSI-level concept embedding. The two aggregated embeddings are aligned with a contrastive objective, which pretrains the entire dual-branch MIL model. Moreover, when auxiliary modalities such as transcriptomics data are available, GECKO integrates them seamlessly. Across five diverse tasks, GECKO consistently outperforms prior unimodal and multimodal pretraining approaches while also delivering clinically meaningful interpretability that bridges the gap between computational models and pathology expertise. Code is available at https://github.com/bmi-imaginelab/GECKO
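The concept prior and the dual-branch aggregation described in the abstract can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the released GECKO code: random arrays stand in for the image and text encoder outputs of a pathology vision-language model (not specified here) and for the patch features fed to the deep branch; `AttentionMIL` is a hypothetical attention-pooling aggregator in the style of attention-based MIL, chosen only for illustration; all dimensions are arbitrary.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale rows to unit L2 norm so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def concept_prior(patch_img_emb, concept_txt_emb):
    """(N, K) cosine similarity between each of N patches and each of K concept descriptions."""
    return l2_normalize(patch_img_emb) @ l2_normalize(concept_txt_emb).T

def softmax(x, axis=0):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class AttentionMIL:
    """Hypothetical attention-pooling MIL aggregator (forward pass only, random weights)."""

    def __init__(self, in_dim, hidden_dim, out_dim, rng):
        self.V = rng.normal(scale=in_dim ** -0.5, size=(in_dim, hidden_dim))
        self.w = rng.normal(scale=hidden_dim ** -0.5, size=hidden_dim)
        self.proj = rng.normal(scale=in_dim ** -0.5, size=(in_dim, out_dim))

    def __call__(self, instances):
        scores = np.tanh(instances @ self.V) @ self.w  # (N,) one attention score per patch
        attn = softmax(scores, axis=0)                  # (N,) weights summing to 1
        pooled = attn @ instances                       # (in_dim,) attention-weighted mean
        return pooled @ self.proj, attn                 # (out_dim,) slide embedding, weights

rng = np.random.default_rng(0)
N, K, D_VLM, D_DEEP, D = 1000, 16, 512, 1024, 256
# Random stand-ins for real encoder outputs (hypothetical shapes):
patch_img = rng.normal(size=(N, D_VLM))    # VLM image-encoder embeddings of N patches
concept_txt = rng.normal(size=(K, D_VLM))  # VLM text-encoder embeddings of K concept descriptions
patch_deep = rng.normal(size=(N, D_DEEP))  # patch features for the deep-encoding branch

prior = concept_prior(patch_img, concept_txt)         # (N, K) concept prior
deep_branch = AttentionMIL(D_DEEP, 128, D, rng)
concept_branch = AttentionMIL(K, 32, D, rng)

deep_emb, _ = deep_branch(patch_deep)                 # WSI-level deep embedding, (D,)
concept_emb, concept_attn = concept_branch(prior)     # WSI-level concept embedding, (D,)
slide_concepts = concept_attn @ prior                 # (K,) attention-weighted concept activations
```

Because the concept branch in this sketch operates on the K concept scores rather than on raw features, its attention-weighted average (`slide_concepts`) gives one slide-level activation per named concept, which is one plausible way to surface the explicit concept activations the method relies on for interpretability. During pretraining, the `deep_emb` and `concept_emb` of each slide in a batch would be the pair aligned by the contrastive objective.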