PathMem: Toward Cognition-Aligned Memory Transformation for Pathology MLLMs

Jinyue Li, Yuci Liang, Qiankun Li, Xinheng Lyu, Jiayu Qian, Huabao Chen, Kun Wang, Zhigang Zeng, Anil Anthony Bharath, Yang Liu

TL;DR

Inspired by the hierarchical memory process of human pathologists, PathMem is a memory-centric multimodal framework for pathology MLLMs. It organizes structured pathology knowledge as a long-term memory (LTM) and introduces a Memory Transformer that models the dynamic transition from LTM to working memory (WM) through multimodal memory activation and context-aware knowledge grounding, enabling context-aware memory refinement for downstream reasoning.

Abstract

Computational pathology demands both visual pattern recognition and dynamic integration of structured domain knowledge, including taxonomy, grading criteria, and clinical evidence. In practice, diagnostic reasoning requires linking morphological evidence with formal diagnostic and grading criteria. Although multimodal large language models (MLLMs) demonstrate strong vision-language reasoning capabilities, they lack explicit mechanisms for structured knowledge integration and interpretable memory control. As a result, existing models struggle to consistently incorporate pathology-specific diagnostic standards during reasoning. Inspired by the hierarchical memory process of human pathologists, we propose PathMem, a memory-centric multimodal framework for pathology MLLMs. PathMem organizes structured pathology knowledge as a long-term memory (LTM) and introduces a Memory Transformer that models the dynamic transition from LTM to working memory (WM) through multimodal memory activation and context-aware knowledge grounding, enabling context-aware memory refinement for downstream reasoning. PathMem achieves state-of-the-art (SOTA) performance across benchmarks, improving WSI-Bench report generation by 12.8% in WSI-Precision and 10.1% in WSI-Relevance, and open-ended diagnosis by 9.7% and 8.9%, over prior WSI-based models.
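
The abstract describes the LTM-to-WM transition only at a high level. As a rough intuition for what "memory activation" could look like, the following is a minimal sketch, not the authors' Memory Transformer: it assumes LTM entries and the query are already embedded as small vectors, activates the top-k entries by cosine similarity, and pools them with softmax weights into a single working-memory vector. The function names (`activate_memory`, `build_working_memory`), the toy embeddings, the entry labels, and the `temperature` parameter are all hypothetical choices made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def activate_memory(query, ltm, k=2):
    """Return the k LTM entries most similar to the query as (score, name, vector)."""
    scored = [(cosine(query, vec), name, vec) for name, vec in ltm.items()]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def build_working_memory(activated, temperature=0.1):
    """Softmax-weight the activated entries and pool them into one vector."""
    logits = [score / temperature for score, _, _ in activated]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    weights = [value / total for value in exps]
    dim = len(activated[0][2])
    pooled = [
        sum(w * vec[i] for w, (_, _, vec) in zip(weights, activated))
        for i in range(dim)
    ]
    names = [name for _, name, _ in activated]
    return names, weights, pooled


# Toy long-term memory: hypothetical 3-d embeddings for three knowledge types
# named in the abstract (the vectors themselves are invented for illustration).
ltm = {
    "grading_criteria": [0.9, 0.1, 0.0],
    "taxonomy": [0.1, 0.9, 0.1],
    "clinical_evidence": [0.0, 0.2, 0.9],
}

# Hypothetical query embedding standing in for a fused image + question context.
query = [0.8, 0.3, 0.0]

activated = activate_memory(query, ltm, k=2)
names, weights, wm_vector = build_working_memory(activated)
print(names, [round(w, 3) for w in weights])
```

In this toy setup the query lies closest to the `grading_criteria` entry, so that entry receives the largest weight in the pooled vector. A learned model would replace the fixed cosine scoring and pooling with trained attention layers.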

Paper Structure

This paper contains 26 sections, 7 equations, 5 figures, and 5 tables.

Figures (5)

  • Figure 1: Overview of PathMem. WM activates relevant LTM entries and transforms them into an updated WM for interpretable reasoning.
  • Figure 2: LTM construction pipeline for pathology knowledge graphs via iterative literature retrieval and LLM-based information extraction.
  • Figure 3: Framework of PathMem. A memory-augmented MLLM for computational pathology that aligns visual, textual, and knowledge graph representations, and adaptively activates LTM for knowledge-grounded pathology reasoning.
  • Figure 4: Qualitative comparison of reports generated by our method and three baseline approaches on the report generation task. (Red highlights denote incorrect content, green highlights denote correct content, and orange highlights denote content missing from T-answer.)
  • Figure 5: Comparison of WSI-based and NLU-based evaluations. Green indicates agreement with the reference, red denotes deviations, orange marks missing ground-truth content, underlined text reflects template-style language, and blue highlights knowledge graph–grounded concepts.