AstraAI: LLMs, Retrieval, and AST-Guided Assistance for HPC Codebases

Mahesh Natarajan, Xiaoye Li, Weiqun Zhang

Abstract

We present AstraAI, a command-line interface (CLI) coding framework for high-performance computing (HPC) software development. AstraAI operates directly within a Linux terminal and integrates large language models (LLMs) with Retrieval-Augmented Generation (RAG) and Abstract Syntax Tree (AST)-based structural analysis to enable context-aware code generation for complex scientific codebases. The central idea is to construct a high-fidelity prompt that is passed to the LLM for inference. This prompt augments the user request with relevant code snippets retrieved from the underlying framework codebase via RAG and structural context extracted from AST analysis, providing the model with precise information about relevant functions, data structures, and overall code organization. The framework is designed to perform scoped modifications to source code while preserving structural consistency with the surrounding code. AstraAI supports both locally hosted models from Hugging Face and API-based frontier models accessible via the American Science Cloud, enabling flexible deployment across HPC environments. The system generates code that aligns with existing project structures and programming patterns. We demonstrate AstraAI on representative HPC code generation tasks within AMReX, a DOE-supported HPC software infrastructure for exascale applications.
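The prompt-construction step described above can be sketched in a few lines of Python. This is an illustrative assumption of how such a high-fidelity prompt might be assembled, not AstraAI's actual implementation; the function name `build_prompt`, the chunk fields, and the file paths are all hypothetical.

```python
import json

def build_prompt(user_request, retrieved_chunks, ast_context):
    """Assemble a high-fidelity prompt from the user request,
    RAG-retrieved code snippets, and AST-derived structural context.
    (Illustrative sketch; field names are assumptions.)"""
    sections = ["## User request", user_request, "## Retrieved code context"]
    for chunk in retrieved_chunks:
        # Each retrieved chunk carries its source file and the snippet itself.
        sections.append(f"// {chunk['file']}\n{chunk['code']}")
    sections.append("## Structural context (AST)")
    sections.append(json.dumps(ast_context, indent=2))
    return "\n\n".join(sections)

prompt = build_prompt(
    "Port this nested for loop to an AMReX parallel loop.",
    [{"file": "Src/Base/Example.H", "code": "// retrieved snippet ..."}],
    {"functions": ["main"], "data_structures": ["MultiFab"]},
)
```

The assembled string would then be sent to a locally hosted or API-based LLM for inference.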

Paper Structure

This paper contains 13 sections, 5 figures, and 2 tables.

Figures (5)

  • Figure 1: End-to-end AstraAI workflow: User prompts are interpreted to extract intent, and intent-specific pipelines combine retrieval-augmented context, AST-guided constraints, and controlled code generation or editing. Tools used at each stage are highlighted in red, and the workflow is orchestrated via Python scripts.
  • Figure 2: General structure of the chunks for retrieval (top), an example code snippet in AMReX with the metadata (middle), and the code chunk in .json format (bottom).
  • Figure 3: clang-query command examples for extracting structural information from the code.
  • Figure 4: General structure of the high-fidelity prompt.
  • Figure 5: An example user prompt that ports a nested C++ for loop into AMReX (top), the full prompt to the LLM (middle), and the result of the inference (bottom).
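The retrieval chunks of Figure 2, a code snippet paired with metadata and serialized to JSON, can be approximated as follows. The field names and the file path are assumptions based on the figure description, not the exact schema used by AstraAI.

```python
import json

# Hypothetical retrieval chunk: a code snippet plus metadata, stored as JSON.
chunk = {
    "metadata": {
        "file": "Src/Base/Example.cpp",  # assumed path, for illustration
        "language": "C++",
        "symbol": "MultiFab::Copy",      # assumed symbol name
    },
    "code": "void MultiFab::Copy (MultiFab& dst, const MultiFab& src) { /* ... */ }",
}

# Round-trip through JSON, as a RAG index would store and reload the chunk.
serialized = json.dumps(chunk, indent=2)
restored = json.loads(serialized)
assert restored == chunk
```

Keeping the metadata alongside the code lets the retriever filter by file, language, or symbol before the snippet is placed into the prompt.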