LAMMI-Pathology: A Tool-Centric Bottom-Up LVLM-Agent Framework for Molecularly Informed Medical Intelligence in Pathology

Haoyang Su, Shaoting Zhang, Xiaosong Wang

TL;DR

The authors develop a trajectory-aware fine-tuning strategy that aligns the planner's decision-making with multi-step reasoning trajectories built from Atomic Execution Nodes (AENs). This makes inference more robust in pathology understanding and improves the planner's adaptive use of the customized toolset.

Abstract

The emergence of tool-calling agent systems introduces a more evidence-driven paradigm for pathology image analysis than coarse-grained text-image diagnostic approaches. With the recent large-scale experimental adoption of spatial transcriptomics technologies, molecularly validated pathological diagnosis is becoming increasingly open and accessible. In this work, we propose LAMMI-Pathology (LVLM-Agent System for Molecularly Informed Medical Intelligence in Pathology), a scalable agent framework for domain-specific tool calling. LAMMI-Pathology adopts a tool-centric, bottom-up architecture in which customized, domain-adaptive tools form the foundation. These tools are clustered by domain style into component agents, which a top-level planner coordinates hierarchically; this avoids the excessively long contexts that can induce task drift. On this basis, we introduce a novel trajectory construction mechanism built on Atomic Execution Nodes (AENs), which serve as reliable, composable units for building semi-simulated reasoning trajectories that capture credible agent-tool interactions. Finally, we develop a trajectory-aware fine-tuning strategy that aligns the planner's decision-making with these multi-step reasoning trajectories, improving both the robustness of inference in pathology understanding and the planner's adaptive use of the customized toolset.
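
The abstract describes AENs only at a high level. As a minimal sketch, assuming each AEN records one actually executed tool call and that trajectories use the Thought / Action / Action Input segments named in the Figure 2 caption plus an Observation line (the common ReAct layout, which is an assumption here), composing AENs into a single trajectory could look like this. The `AEN` class, `compose_trajectory`, the tool names, and the placeholder strings are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AEN:
    """Hypothetical Atomic Execution Node: one executed tool call and its context."""
    thought: str       # reasoning that motivates the call
    action: str        # tool name
    action_input: str  # serialized tool arguments
    observation: str   # output recorded from actually running the tool


def compose_trajectory(question: str, nodes: List[AEN], final_answer: str) -> str:
    """Chain AENs, in order, into a single ReAct-style text trajectory."""
    if not nodes:
        raise ValueError("a trajectory needs at least one AEN")
    lines = [f"Question: {question}"]
    for node in nodes:
        lines += [
            f"Thought: {node.thought}",
            f"Action: {node.action}",
            f"Action Input: {node.action_input}",
            f"Observation: {node.observation}",
        ]
    lines.append(f"Final Answer: {final_answer}")
    return "\n".join(lines)


if __name__ == "__main__":
    nodes = [
        AEN("Characterize the tissue in the patch.", "image_tool",
            '{"image": "patch.png"}', "<recorded image_tool output>"),
        AEN("Check the related gene expression.", "gene_tool",
            '{"gene": "<gene symbol>"}', "<recorded gene_tool output>"),
    ]
    print(compose_trajectory("<pathology question>", nodes, "<answer>"))
```

Under this reading, observations are real tool outputs while the connecting thoughts can be written or simulated afterwards, which is one plausible interpretation of "semi-simulated"; the paper may define the term differently.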

Paper Structure (63 sections, 13 equations, 46 figures, 7 tables, 3 algorithms)

Figures (46)

  • Figure 1: Overview of the LAMMI-Pathology framework. The top row illustrates the hierarchical reasoning architecture, in which component agents orchestrate style-specific tools and aggregate contextual evidence for the LAMMI planner. The middle row depicts the data and tool construction pipeline: spatial transcriptomics literature paired with histopathology images is processed by an LVLM retriever to extract QA pairs, and tools are clustered bottom-up according to their sequential co-occurrence frequency during AEN construction, with an adaptively chosen number of clusters. The bottom row presents AEN-driven semi-simulated trajectory generation and the trajectory-aware fine-tuning method; the planner and the component agents share the same fine-tuned model parameters.
  • Figure 2: Trajectory-aware Adapter architecture. The adapter is injected after the FFN in each Transformer decoder layer. Segment masks are generated dynamically from the input sequence to identify Thought, Action, and Action Input segments. Three learnable per-channel scaling vectors produce a modulation term that multiplicatively scales the FFN output.
  • Figure 3: Memory consumption comparison between the LAMMI framework and direct multi-agent approaches. The plot shows the difference in GPU memory usage between LAMMI and a standard multi-agent system (MAS).
  • Figure 4: Trajectory visualization for an open-ended, research-oriented query, showing a semantically coherent solution path that contains both an occurrence of hallucination and evidence-based reasoning, produced through planner-guided exploratory tool invocations by ImageAgent and GeneAgent.
  • Figure 5: Planner Agent prompt template.
  • ...and 41 more figures
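
The Figure 1 caption says tools are clustered bottom-up by how often they occur in sequence during AEN construction, with an adaptive number of clusters, but it does not name the clustering algorithm. The sketch below substitutes a simple stand-in: count adjacent tool pairs across recorded call sequences, keep pairs seen at least `min_count` times, and take the connected components of the resulting graph as clusters, so the number of clusters follows from the data rather than being fixed in advance. `cooccurrence_counts`, `cluster_tools`, `min_count`, and the tool names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set


def cooccurrence_counts(sequences: Sequence[Sequence[str]]) -> Counter:
    """Count unordered pairs of distinct tools called back-to-back."""
    counts: Counter = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            if a != b:
                counts[frozenset((a, b))] += 1
    return counts


def cluster_tools(sequences: Sequence[Sequence[str]], min_count: int = 2) -> List[Set[str]]:
    """Group tools into connected components of the thresholded co-occurrence graph."""
    tools = sorted({tool for seq in sequences for tool in seq})
    parent: Dict[str, str] = {tool: tool for tool in tools}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for pair, n in cooccurrence_counts(sequences).items():
        if n >= min_count:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for tool in tools:
        groups.setdefault(find(tool), set()).add(tool)
    # Deterministic order: sort clusters by their sorted member lists.
    return sorted(groups.values(), key=sorted)


if __name__ == "__main__":
    # Hypothetical per-query tool-call sequences recorded while building AENs.
    sequences = [
        ["segment_cells", "classify_tissue"],
        ["segment_cells", "classify_tissue", "gene_lookup"],
        ["gene_lookup", "pathway_enrich"],
        ["gene_lookup", "pathway_enrich"],
    ]
    for cluster in cluster_tools(sequences, min_count=2):
        print(sorted(cluster))
```

With `min_count=2` the four hypothetical tools split into an image-side group and a gene-side group, loosely mirroring component agents such as ImageAgent and GeneAgent; lowering the threshold to 1 merges them into one cluster. A hard count threshold is chosen here only for brevity; the paper's adaptive criterion may be quite different.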
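
The Figure 2 caption lists the adapter's ingredients but not its exact formula. The minimal NumPy sketch below assumes that the modulation for each token is the sum of the scaling vectors of the segments active at that token, and that it enters as `ffn_out * (1 + modulation)` so that zero-initialised vectors leave the layer unchanged; both choices are assumptions. The marker strings, the one-token-per-marker tokenization, `segment_masks`, and `TrajectoryAwareAdapter` are hypothetical, and NumPy stands in for the deep-learning framework a real, trainable adapter would live in.

```python
import numpy as np

# Hypothetical segment markers; assumes each marker arrives as a single token.
# "Observation:" closes the current segment without opening a modulated one.
MARKERS = {"Thought:": 0, "Action:": 1, "Action Input:": 2, "Observation:": None}
NUM_SEGMENTS = 3  # Thought, Action, Action Input


def segment_masks(tokens):
    """Build a (3, T) 0/1 mask; row s marks tokens belonging to segment s."""
    masks = np.zeros((NUM_SEGMENTS, len(tokens)))
    current = None
    for t, tok in enumerate(tokens):
        if tok in MARKERS:
            current = MARKERS[tok]  # a marker token starts (or ends) a segment
        if current is not None:
            masks[current, t] = 1.0
    return masks


class TrajectoryAwareAdapter:
    """Per-segment, per-channel multiplicative modulation of an FFN output."""

    def __init__(self, d_model):
        # One scaling vector per segment; zero init makes the adapter an
        # identity map at the start of fine-tuning (assumption).
        self.gammas = np.zeros((NUM_SEGMENTS, d_model))

    def __call__(self, ffn_out, masks):
        # ffn_out: (T, d_model), masks: (NUM_SEGMENTS, T)
        modulation = masks.T @ self.gammas  # (T, d_model)
        return ffn_out * (1.0 + modulation)


if __name__ == "__main__":
    tokens = ["Thought:", "inspect", "Action:", "seg_tool",
              "Action Input:", "{}", "Observation:", "done"]
    masks = segment_masks(tokens)
    adapter = TrajectoryAwareAdapter(d_model=4)
    adapter.gammas[1] = 0.5  # pretend training boosted Action tokens
    out = adapter(np.ones((len(tokens), 4)), masks)
    print(out[:, 0])  # Action tokens scaled by 1.5, others unchanged
```

Because the masks are recomputed from each input sequence, the same learned vectors apply wherever a Thought, Action, or Action Input segment appears, which is one way to read "dynamically generated" segment masks.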