Agent-based code generation for the Gammapy framework

Dmitriy Kostunin; Vladimir Sotnikov; Sergo Golovachev; Abhay Mehta; Tim Lukas Holch; Elisa Jones

TL;DR

The paper addresses the challenge of turning Large Language Model guidance into reproducible Gammapy analysis scripts for gamma-ray astronomy. It introduces an agent-based system that writes, executes, and self-repairs Python scripts inside a sandbox, governed by strong prompting contracts, iterative validation, and optional retrieval-augmented context. A modular architecture (configuration, prompting, runner, execution, RAG) plus a benchmarking harness demonstrates high pass rates on common Gammapy tasks and provides a minimal web UI. This approach enables reliable, auditable, and deployable AI-assisted analysis for DL3 workflows and supports open-weight backends for privacy and reproducibility across ecosystems.

Abstract

Software code generation using Large Language Models (LLMs) is one of the most successful applications of modern artificial intelligence. Foundation models are very effective for popular frameworks that benefit from documentation, examples, and strong community support. In contrast, specialized scientific libraries often lack these resources and may expose unstable APIs under active development, which makes it difficult for models trained on limited or outdated data to produce working code. We address these issues for the Gammapy library by developing an agent capable of writing, executing, and validating code in a controlled environment. We present a minimal web demo and an accompanying benchmarking suite. This contribution summarizes the design, reports our current status, and outlines next steps.
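The write, execute, and validate cycle described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `run_script`, `generate_and_repair`, and `stub_generate` are hypothetical, the LLM call is replaced by a stub, and a subprocess in a temporary directory only stands in for a real sandbox.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional, Tuple


def run_script(code: str, timeout: float = 30.0) -> Tuple[bool, str]:
    """Run `code` in a fresh interpreter inside a temporary directory.

    Assumption: this is only a stand-in for a sandbox; it gives process
    and working-directory separation, not security isolation.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "analysis.py"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False, "timeout"
        return proc.returncode == 0, proc.stdout + proc.stderr


def generate_and_repair(
    task: str,
    generate: Callable[[str, Optional[str]], str],
    max_attempts: int = 3,
) -> Tuple[Optional[str], int]:
    """Ask `generate` for a script, run it, and feed errors back until it passes."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        code = generate(task, feedback)
        ok, output = run_script(code)
        if ok:
            return code, attempt
        feedback = output  # the error text becomes context for the next attempt
    return None, max_attempts


def stub_generate(task: str, feedback: Optional[str]) -> str:
    """Hypothetical stand-in for an LLM: the first draft has a typo, the second fixes it."""
    if feedback is None:
        return "message = 'ok'\nprint(mesage)\n"  # deliberate NameError
    return "message = 'ok'\nprint(message)\n"
```

With the stub, the first script fails with a `NameError`, its traceback is passed back as feedback, and the second attempt succeeds. In a real agent the validation step would also check the outputs of the analysis and not only the exit code.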
