Table of Contents
Fetching ...

EAA: Automating materials characterization with vision language model agents

Ming Du, Yanqi Luo, Srutarshi Banerjee, Michael Wojcik, Jelena Popovic, Mathew J. Cherukara

TL;DR

EAA is presented, a vision-language-model-driven agentic system designed to automate complex experimental microscopy workflows and provides a modern tool ecosystem with two-way compatibility for Model Context Protocol (MCP), allowing instrument-control tools to be consumed or served across applications.

Abstract

We present Experiment Automation Agents (EAA), a vision-language-model-driven agentic system designed to automate complex experimental microscopy workflows. EAA integrates multimodal reasoning, tool-augmented action, and optional long-term memory to support both autonomous procedures and interactive user-guided measurements. Built on a flexible task-manager architecture, the system enables workflows ranging from fully agent-driven automation to logic-defined routines that embed localized LLM queries. EAA further provides a modern tool ecosystem with two-way compatibility for Model Context Protocol (MCP), allowing instrument-control tools to be consumed or served across applications. We demonstrate EAA at an imaging beamline at the Advanced Photon Source, including automated zone plate focusing, natural language-described feature search, and interactive data acquisition. These results illustrate how vision-capable agents can enhance beamline efficiency, reduce operational burden, and lower the expertise barrier for users.

EAA: Automating materials characterization with vision language model agents

TL;DR

EAA is presented, a vision-language-model-driven agentic system designed to automate complex experimental microscopy workflows and provides a modern tool ecosystem with two-way compatibility for Model Context Protocol (MCP), allowing instrument-control tools to be consumed or served across applications.

Abstract

We present Experiment Automation Agents (EAA), a vision-language-model-driven agentic system designed to automate complex experimental microscopy workflows. EAA integrates multimodal reasoning, tool-augmented action, and optional long-term memory to support both autonomous procedures and interactive user-guided measurements. Built on a flexible task-manager architecture, the system enables workflows ranging from fully agent-driven automation to logic-defined routines that embed localized LLM queries. EAA further provides a modern tool ecosystem with two-way compatibility for Model Context Protocol (MCP), allowing instrument-control tools to be consumed or served across applications. We demonstrate EAA at an imaging beamline at the Advanced Photon Source, including automated zone plate focusing, natural language-described feature search, and interactive data acquisition. These results illustrate how vision-capable agents can enhance beamline efficiency, reduce operational burden, and lower the expertise barrier for users.
Paper Structure (19 sections, 7 figures)

This paper contains 19 sections, 7 figures.

Figures (7)

  • Figure 1: The main components of EAA and their interactions. The task manager contains the chat loop or workflow, creates and holds the agent object, and maintains the context. New messages coming from the user, auto-generated by the workflow logic, or responded by the agent are added to the context. When the LLM agent drives the workflow, it sends a termination signal when the task is considered finished; The task manager then requests human input, based on which it either continues the workflow with updated instructions, enters the purely interactive chat mode, or exits. When a message is sent to the agent, it is sent with the entire context so that the VLM has conversational memory. Optionally, the agent can have long-term memory by saving the embeddings of notable messages into a vector store and retrieving them for future conversations and sessions. Tools from EAA's tool library or provided through external MCP servers are registered to the agent's tool manager through the task manager. The agent generates the schemas for the tools and attach them with messages sent to the VLM, and handles the tool calls made by the VLM.
  • Figure 2: Three levels of LLM involvement in experiment automation tools. Examples are enumerated for each level.
  • Figure 3: Example workflow diagrams of EAA task managers. (a) The chat loop, which is a generic workflow used for interactive conversations and agent-driven workflows. (b) The workflow of an automated focusing task as an example. This workflow falls in the hybrid workflow category defined in Section \ref{['sec:task_manager']}, and involves programmed query with the agent about image overlap, logic-based registration and auto-reply composition, and call to a sub-task for feature tracking. In both diagrams, nodes with dashed borders are those involving agent interactions.
  • Figure 4: Trajectory of the focusing task. (a) 2D images acquired at all visited zone plate $z$-positions, used for estimating positional drift. The $z$-positions in mm are indicated at the top right corner of each image. The blue line in the image at $z = -198$ indicate the vertical line that the agent was instructed to take line scans across. As the focus of the zone plate improves, images gradually become sharper. (b) The line profiles collected across the reference feature throughout the process. From red to green, the zone plate $z$-position moves from the initial -200 mm towards the final -193.7 mm. The positions of the line plots are relative, meaning the center of the profiles are aligned at $x = 0$ in the figure. (c) The trajectory of the process in the space of line profile FWHM and $z$-position. The initial point at $z = -200$ mm is not shown because its FWHM is too large.
  • Figure 5: Trajectory and result of the feature search task. (a) The trajectory of the center of 2D image acquisitions in the sample space. Underlaid in the background is the SEM image of the sample (a test pattern), which is transformed to align with the trajectory plot. (b) The image returned by the last 2D scan tool call (the title and axis ticks are a part of the returned image), right before the agent concluded the process. The requested feature (Siemens star) is roughly centered in the FOV.
  • ...and 2 more figures