Connecting Large Language Model Agent to High Performance Computing Resource

Heng Ma, Alexander Brace, Carlo Siebenschuh, Greg Pauloski, Ian Foster, Arvind Ramanathan

TL;DR

The paper addresses the challenge of enabling Large Language Model (LLM) agent workflows to use high-performance computing (HPC) resources for computationally intensive scientific tasks. It introduces Parsl as a bridge between LangGraph/LangChain-based LLM tool calls and HPC execution, and evaluates two integration schemes: a Parsl tool node that parallelizes individual tool calls, and a Parsl ensemble function that runs an ensemble of simulations. Through molecular dynamics (MD) workflows using OpenMM and data retrieval from the Protein Data Bank (PDB), the study demonstrates concurrent execution on both a local GPU workstation and Polaris/ALCF, and reports on performance, scalability, and queue-time considerations on HPC systems. The findings show that while Parsl enables efficient parallelism and reduces developer burden, HPC constraints and LLM tool-call limits require careful architectural choices, such as running expensive tasks as ensemble functions. The work offers a practical pathway for AI-driven scientific workflows to use HPC resources, informing design decisions for task orchestration and tool function design, as well as future extensions such as AI-assisted sampling and broader HPC platform support.

Abstract

Large Language Model (LLM) agent workflows let the LLM invoke tool functions to improve its performance on domain-specific scientific questions. Tackling large-scale scientific research requires access to computing resources and a parallel computing setup. In this work, we integrated Parsl into the LangChain/LangGraph tool-call setup to bridge the gap between the LLM agent and the computing resources. Two tool-call implementations were set up and tested on both a local workstation and the HPC environment of Polaris/ALCF. The first implementation, a Parsl-enabled LangChain tool node, queues the tool functions concurrently to the Parsl workers for parallel execution. The second converts the tool functions into Parsl ensemble functions and is better suited to large tasks in a supercomputer environment. The LLM agent workflow was prompted to run molecular dynamics simulations with different protein structures and simulation conditions. The results showed that the LLM agent tools were managed and executed concurrently by Parsl on the available computing resources.
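
The two schemes described above can be illustrated with a minimal sketch. This is not the authors' implementation: it uses Python's standard `concurrent.futures.ThreadPoolExecutor` as a stand-in for a Parsl executor, relying only on the fact that Parsl apps also return futures with a `.result()` method. The names `run_simulation`, `parallel_tool_node`, `simulation_ensemble`, and the `{"name": ..., "args": ...}` tool-call shape are hypothetical and chosen for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_simulation(pdb_id: str, temperature_k: float) -> dict:
    """Hypothetical tool function; the sleep stands in for an MD run."""
    time.sleep(0.05)
    return {"pdb_id": pdb_id, "temperature_k": temperature_k, "status": "done"}


# Registry of tool functions the agent may call (assumed layout).
TOOLS = {"run_simulation": run_simulation}


def parallel_tool_node(tool_calls: list, executor: ThreadPoolExecutor) -> list:
    """Scheme A: submit every tool call from one LLM turn, then gather.

    All calls are queued before any result is awaited, so they can run
    concurrently on the available workers. Results keep the call order.
    """
    futures = [
        executor.submit(TOOLS[call["name"]], **call["args"])
        for call in tool_calls
    ]
    return [future.result() for future in futures]


def simulation_ensemble(
    pdb_id: str, temperatures_k: list, executor: ThreadPoolExecutor
) -> list:
    """Scheme B: a single tool call that fans out into many simulations.

    The LLM issues one call; the function itself launches the ensemble.
    """
    futures = [
        executor.submit(run_simulation, pdb_id, temperature)
        for temperature in temperatures_k
    ]
    return [future.result() for future in futures]
```

In scheme A the degree of parallelism is bounded by how many tool calls the LLM emits in one turn, whereas in scheme B it is set by the ensemble size passed to a single call, which is consistent with the paper's suggestion that expensive tasks run as ensemble functions.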

Paper Structure

This paper contains 12 sections, 4 figures, 1 table.

Figures (4)

  • Figure 1: The LangGraph workflows: (A) workflow 1 and (B) workflow 2. The START node takes in the prompt from the user, and the END node presents the final workflow result. Workflow 1 consists of a single LLM agent with a tool node. Workflow 2 is managed by a supervisor node, which decides which agent acts next: the research agent or the simulator.
  • Figure 2: Two Parsl tool-call schemes for LangGraph. The figure shows the LLM agent, the tool node, and the tool functions. The Parsl tool node (A) replaces the LangGraph tool node and submits the tool functions to the parallel Parsl queue. The Parsl ensemble function (B) can be called directly by the LLM and launches an ensemble of simulation runs with Parsl. The blue letters mark the Parsl additions to the LangGraph tool-call framework.
  • Figure 3: The Parsl process for runs 1 (A), 2 (B), 3 (C), and 4 (D). Each run launched 8 workers with GPUs. The Parsl logs were recorded every 5 seconds.
  • Figure 4: Timeline of the Parsl simulation ensemble on Polaris, running 100 simulations of the 2KKJ protein from the LangGraph agent workflow. The number of workers represents the Parsl workers with available GPUs, and the number of tasks represents the tasks in the Parsl queue.