HOMIE: Histopathology Omni-modal Embedding for Pathology Composed Retrieval
Qifeng Zhou, Wenliang Zhong, Thao M. Dang, Hehuan Ma, Saiyang Na, Yuzhi Guo, Junzhou Huang
TL;DR
This work defines Pathology Composed Retrieval (PCR): retrieving evidence from omni-modal clinical data using interleaved queries. It addresses two gaps: existing pathology models rely on low-resolution, dual-encoder architectures that cannot handle such queries, and no suitable benchmark exists. It introduces HOMIE, a two-stage adaptation framework trained entirely on public data. The first stage adapts a multimodal LLM for retrieval through text-only pre-training with LoRA. The second performs pathology-specific tuning with native-resolution inputs, stain augmentation, and a progressive knowledge curriculum to bridge the domain gap. A dedicated PCR Benchmark evaluates composed retrieval across multi-image, image-text, and video-text queries. HOMIE matches state-of-the-art performance on traditional retrieval tasks and significantly outperforms baselines on PCR tasks. The results demonstrate that a unified omni-modal embedding enables a transparent, evidence-grounded computational consult in pathology, with potential extensions to genomics and other omics data for more comprehensive clinical decision support.
Abstract
The integration of Artificial Intelligence (AI) into pathology faces a fundamental challenge: black-box predictive models lack transparency, while generative approaches risk clinical hallucination. A case-based retrieval paradigm offers a more interpretable alternative for clinical adoption. However, current state-of-the-art (SOTA) pathology models are constrained by dual-encoder architectures that cannot process the composed, multimodal queries that arise in real-world clinical practice. We formally define this task as Pathology Composed Retrieval (PCR). Progress on PCR is blocked by two critical challenges: (1) Multimodal Large Language Models (MLLMs) offer the necessary deep-fusion architecture but suffer from a critical task mismatch and domain mismatch; (2) no benchmark exists to evaluate such compositional queries. To address these challenges, we propose HOMIE, a systematic framework that transforms a general MLLM into a specialized retrieval expert. HOMIE resolves the dual mismatch via a two-stage process: a retrieval-adaptation stage that resolves the task mismatch, and a pathology-specific tuning stage that resolves the domain mismatch through a progressive knowledge curriculum, pathology-specific stain augmentation, and native-resolution image processing. We also introduce the PCR Benchmark, designed to evaluate composed retrieval in pathology. Experiments show that HOMIE, trained only on public data, matches SOTA performance on traditional retrieval tasks and outperforms all baselines on the newly defined PCR task.
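The abstract does not describe the retrieval step at code level. The following is a minimal sketch, assuming that an omni-modal embedder maps every query and candidate, whatever its mix of images, text, or video, to a single vector, and that candidates are ranked by cosine similarity. The 4-dimensional embeddings, the case identifiers, and the Recall@K helper are hypothetical illustrations, not the authors' implementation or the benchmark's official metric.

```python
import math
from typing import Dict, List, Sequence, Set


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def rank_candidates(query: Sequence[float], gallery: Dict[str, Sequence[float]]) -> List[str]:
    """Return gallery ids sorted by descending cosine similarity to the query embedding."""
    q = l2_normalize(query)
    scores = {
        cid: sum(a * b for a, b in zip(q, l2_normalize(emb)))
        for cid, emb in gallery.items()
    }
    return sorted(scores, key=lambda cid: scores[cid], reverse=True)


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Hypothetical metric helper: 1.0 if any relevant id appears in the top-k, else 0.0."""
    return 1.0 if any(cid in relevant for cid in ranked[:k]) else 0.0


# Hypothetical embeddings standing in for the output of one shared encoder.
# In a composed query, this single vector would encode interleaved image + text.
query_emb = [0.9, 0.1, 0.0, 0.2]
gallery_embs = {
    "case_a": [0.8, 0.2, 0.1, 0.1],
    "case_b": [0.0, 1.0, 0.0, 0.0],
    "case_c": [0.1, 0.0, 0.9, 0.3],
}

ranking = rank_candidates(query_emb, gallery_embs)
print(ranking)                                   # ['case_a', 'case_c', 'case_b']
print(recall_at_k(ranking, {"case_a"}, k=1))     # 1.0
```

The sketch shows why a unified embedding matters for PCR. A dual encoder produces separate image and text vectors, so it has no single vector for a query that mixes them. An embedder that fuses the whole interleaved query into one vector can instead use the same ranking routine for every modality combination.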
