Table of Contents
Fetching ...

PEAR: A Robust and Flexible Automation Framework for Ptychography Enabled by Multiple Large Language Model Agents

Xiangyu Yin, Chuqiao Shi, Yimo Han, Yi Jiang

TL;DR

This work develops the "Ptychographic Experiment and Analysis Robot"(PEAR), a framework that leverages large language models (LLMs) to automate data analysis in ptychography and demonstrates that PEAR's multi-agent design significantly improves the workflow success rate, even with smaller open-weight models such as LLaMA 3.1 8B.

Abstract

Ptychography is an advanced computational imaging technique in X-ray and electron microscopy. It has been widely adopted across scientific research fields, including physics, chemistry, biology, and materials science, as well as in industrial applications such as semiconductor characterization. In practice, obtaining high-quality ptychographic images requires simultaneous optimization of numerous experimental and algorithmic parameters. Traditionally, parameter selection often relies on trial and error, leading to low-throughput workflows and potential human bias. In this work, we develop the "Ptychographic Experiment and Analysis Robot" (PEAR), a framework that leverages large language models (LLMs) to automate data analysis in ptychography. To ensure high robustness and accuracy, PEAR employs multiple LLM agents for tasks including knowledge retrieval, code generation, parameter recommendation, and image reasoning. Our study demonstrates that PEAR's multi-agent design significantly improves the workflow success rate, even with smaller open-weight models such as LLaMA 3.1 8B. PEAR also supports various automation levels and is designed to work with customized local knowledge bases, ensuring flexibility and adaptability across different research environments.

PEAR: A Robust and Flexible Automation Framework for Ptychography Enabled by Multiple Large Language Model Agents

TL;DR

This work develops the "Ptychographic Experiment and Analysis Robot"(PEAR), a framework that leverages large language models (LLMs) to automate data analysis in ptychography and demonstrates that PEAR's multi-agent design significantly improves the workflow success rate, even with smaller open-weight models such as LLaMA 3.1 8B.

Abstract

Ptychography is an advanced computational imaging technique in X-ray and electron microscopy. It has been widely adopted across scientific research fields, including physics, chemistry, biology, and materials science, as well as in industrial applications such as semiconductor characterization. In practice, obtaining high-quality ptychographic images requires simultaneous optimization of numerous experimental and algorithmic parameters. Traditionally, parameter selection often relies on trial and error, leading to low-throughput workflows and potential human bias. In this work, we develop the "Ptychographic Experiment and Analysis Robot" (PEAR), a framework that leverages large language models (LLMs) to automate data analysis in ptychography. To ensure high robustness and accuracy, PEAR employs multiple LLM agents for tasks including knowledge retrieval, code generation, parameter recommendation, and image reasoning. Our study demonstrates that PEAR's multi-agent design significantly improves the workflow success rate, even with smaller open-weight models such as LLaMA 3.1 8B. PEAR also supports various automation levels and is designed to work with customized local knowledge bases, ensuring flexibility and adaptability across different research environments.

Paper Structure

This paper contains 11 sections, 5 figures.

Figures (5)

  • Figure 1: Supported knowledge files in PEAR. (Left) Markdown files are used for structured, text-based information such as parameter descriptions or step-by-step protocols. In PEAR, these files can be directly embedded into prompts and formatted using prompt engineering techniques. For example, parameter descriptions enable LLMs to understand the nuances of various ptychographic parameters, their acceptable ranges, and their impacts on the reconstruction process. The knowledge base also includes expert guidance on parameter selection and optimization strategies, allowing LLMs to make informed recommendations based on specific reconstruction results and conditions. JSON formatting ensures that model outputs are consistent and machine-readable, facilitating seamless integration with other components of the PEAR system. (Middle) PDF documents, including scientific papers, user manuals, and technical reports, provide richer, more detailed information. PEAR processes these documents using Retrieval Augmented Generation techniques, which involve parsing the content, chunking it into manageable sections, and encoding these chunks into a vector store using embeddings. When required, PEAR efficiently retrieves relevant information through hybrid keyword-similarity searches, ensuring that the most pertinent content informs its outputs. (Right) Images (PNG/JPG) are critical for representing ptychographic data and results. In PEAR, images are encoded in base64 format, making them compatible with most vision-language models. Few-shot learning techniques enable PEAR to understand and describe new images based on previously seen examples. This approach simulates prior interactions where the system describes various ptychographic images, providing a foundation for interpreting new, unseen data. To process and understand image content, PEAR incorporates Visual Language Models that analyze ptychographic images and generate descriptive outputs, which inform parameter recommendations and quality assessments.
  • Figure 2: Comparison of single-Agent vs. multi-Agent Design for Code Generation in PEAR. The single-agent workflow uses a single prompt to collect parameter values from users and generate a reconstruction script based on their inputs. The multi-agent workflow breaks the task into simpler sub-tasks, each handled by specialized agents. The Question Generation Agent formulates questions to gather necessary information from the user. The Parameter Collection Agent processes user responses and extracts relevant parameter values. The Parameter Validation Agent checks these values for consistency and compliance with domain-specific constraints. The Parameter Confirmation Agent interacts with the user to confirm the collected parameters before script generation. Finally, the Script Generation Agent uses the validated parameters to generate the complete reconstruction script, enhancing accuracy and reducing errors compared to the single-agent approach.
  • Figure 3: Levels of automation in PEAR. At level 0, human users maintain full control over the reconstruction process. They manually specify input parameters and customize the code according to their specific needs. PEAR then generates a reconstruction script based on these inputs. After reconstruction, users manually examine the results and decide on further steps or parameter adjustments. Even at this manual level, the system collects data on user inputs and decisions, which is valuable for refining future recommendations and enhancing automation capabilities. At level 1, users still control code generation, but the system provides intelligent suggestions based on the input data and experimental parameters. PEAR generates a reconstruction script that integrates both user inputs and AI-recommended parameters. After the reconstruction, PEAR offers recommendations for further parameter adjustments to improve quality based on user feedback. Users have the option to accept, modify, or reject these recommendations in subsequent iterations. At level 2, PEAR's diagnostic agents automatically assess the quality of the reconstruction, identifying potential issues such as artifacts or convergence problems. The AI provides a quality assessment that users can review, offering insights and feedback on the reconstruction quality and suggesting further actions or adjustments.
  • Figure 4: PEAR-assisted reconstructions of an electron ptychography dataset of SnSe. At automation Level 1, PEAR incorporates user feedback to recommend optimized parameters that enhance image quality. As a result, the reconstructed atomic structures become noticeably sharper and contain less noisy artifacts throughout the PEAR-guided workflow.
  • Figure 5: Prompts used in the computational experiment.