Egent: An Autonomous Agent for Equivalent Width Measurement

Yuan-Sen Ting, Serat Mahmud Saad, Fan Liu, Yuting Shen

TL;DR

Egent is an autonomous agent that combines classical multi-Voigt line fitting with LLM-based visual inspection to measure equivalent widths (EWs) directly from raw flux spectra. The pipeline is self-contained, logs complete provenance for every fit, and relies on the LLM primarily for quality control and edge-case refinement rather than for the core calculations. Validation against a large expert catalog (C3PO) shows strong agreement once per-spectrum continuum offsets are accounted for, with robust performance across SNR ~50-250. The approach enables survey-scale EW measurements with transparent decision-making and multiple deployment options, including offline and web-based interfaces; the authors acknowledge cost and generalization limitations. Overall, Egent demonstrates a practical, reproducible path toward automated, high-precision line-by-line abundance analysis at scale.
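To make "accounting for per-spectrum offsets" concrete, here is a minimal sketch of one way such a comparison could be done. It assumes an additive offset per spectrum, estimated as the median of the agent-minus-expert differences; the summary does not state which estimator the authors use. The function name `remove_per_spectrum_offsets` and all numbers are illustrative, not taken from the paper or the Egent code.

```python
from collections import defaultdict
from statistics import median, pstdev

def remove_per_spectrum_offsets(rows):
    """rows: (spectrum_id, ew_agent_mA, ew_expert_mA) tuples.

    Returns the per-spectrum offsets and the offset-corrected differences.
    Assumption: the offset is additive and estimated by the median difference.
    """
    diffs = defaultdict(list)
    for spec_id, ew_agent, ew_expert in rows:
        diffs[spec_id].append(ew_agent - ew_expert)
    offsets = {spec_id: median(d) for spec_id, d in diffs.items()}
    corrected = [(ew_agent - ew_expert) - offsets[spec_id]
                 for spec_id, ew_agent, ew_expert in rows]
    return offsets, corrected

# Synthetic toy values (not from the paper): two spectra with opposite offsets.
rows = [
    ("A", 40.0, 37.0), ("A", 55.0, 51.0), ("A", 23.0, 18.0),
    ("B", 30.0, 36.0), ("B", 62.0, 67.0), ("B", 15.0, 19.0),
]
offsets, corrected = remove_per_spectrum_offsets(rows)
raw_scatter = pstdev([a - e for _, a, e in rows])
corrected_scatter = pstdev(corrected)
print(offsets, round(raw_scatter, 2), round(corrected_scatter, 2))
```

In this toy case the pooled scatter is dominated by the two spectra disagreeing in sign, and it shrinks once each spectrum's own offset is subtracted. This is the qualitative effect the validation describes.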

Abstract

We present Egent, an autonomous agent that combines classical multi-Voigt profile fitting with large language model (LLM) visual inspection and iterative refinement. The fitting engine is built from scratch with minimal dependencies, creating an ecosystem in which the LLM can reason about fits through function calls: adjusting wavelength windows, adding blend components, modifying continuum treatment, and flagging problematic cases. Egent operates directly on raw flux spectra without requiring pre-normalized continua. We validate against manual measurements from human experts using 18,615 lines from the C3PO program across 84 Magellan/MIKE spectra at SNR ~50-250. We find per-spectrum systematic offsets between Egent and expert measurements, likely arising from differences in global continuum placement prior to manual fitting; after accounting for these offsets, the agreement is 5-7 mÅ. The LLM's primary role is quality control: it confirms good fits (~60-65% of lines are LLM-refined and accepted), flags problematic cases (~10-20%), and occasionally rescues edge cases where tool use improves fits. Agreement between GPT-5 and GPT-5-mini confirms reproducibility, with GPT-5-mini enabling low-cost analysis at ~200 lines per US dollar. Every fit stores complete Voigt parameters, continuum coefficients, and LLM reasoning chains, enabling exact reconstruction without re-running. Egent compresses what traditionally requires months of expert effort into days of automated analysis, enabling survey-scale EW measurement. We provide open-source code at https://github.com/tingyuansen/Egent, including a web interface for drag-and-drop analysis and a local LLM backend for fully offline operation on consumer hardware.
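The claim that stored line parameters and continuum coefficients allow exact reconstruction can be illustrated with a small sketch. It rebuilds a raw-flux model from a hypothetical stored record and integrates the equivalent width, EW = ∫ (1 − F/F_c) dλ. To stay dependency-free it uses a Gaussian as a stand-in for the Voigt profile; the record layout, function names, and numbers are all assumptions for illustration and do not reflect Egent's actual storage format.

```python
import math

def gaussian_depth(wave, center, sigma, area):
    """Unit-area Gaussian scaled by `area` (in Å); stand-in for one Voigt component."""
    norm = area / (sigma * math.sqrt(2.0 * math.pi))
    return norm * math.exp(-0.5 * ((wave - center) / sigma) ** 2)

def model_flux(wave, components, cont_coeffs, ref_wave):
    """Raw-flux model: polynomial continuum times (1 - summed line depths)."""
    continuum = sum(c * (wave - ref_wave) ** k for k, c in enumerate(cont_coeffs))
    depth = sum(gaussian_depth(wave, *comp) for comp in components)
    return continuum * (1.0 - depth), continuum

def equivalent_width(waves, components, cont_coeffs, ref_wave):
    """Trapezoidal integral of 1 - F/F_c over the window, in Å."""
    depth = []
    for w in waves:
        flux, continuum = model_flux(w, components, cont_coeffs, ref_wave)
        depth.append(1.0 - flux / continuum)
    return sum(0.5 * (depth[i] + depth[i + 1]) * (waves[i + 1] - waves[i])
               for i in range(len(waves) - 1))

# Hypothetical stored record: (center Å, sigma Å, area Å) per component.
target = (6154.23, 0.08, 0.0368)   # illustrative target line
blend = (6154.83, 0.08, 0.0200)    # illustrative neighbouring blend
cont_coeffs = [1000.0, 50.0]       # sloping raw-flux continuum (arbitrary units)
waves = [target[0] - 1.5 + 0.01 * i for i in range(301)]

ew_target_mA = 1000.0 * equivalent_width(waves, [target], cont_coeffs, target[0])
ew_total_mA = 1000.0 * equivalent_width(waves, [target, blend], cont_coeffs, target[0])
print(round(ew_target_mA, 2), round(ew_total_mA, 2))
```

Because the continuum divides out of F/F_c, the sloping raw-flux continuum does not bias the integral. The target's EW is obtained by reconstructing the target component alone rather than integrating the whole blended window.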

Paper Structure

This paper contains 28 sections, 8 equations, 14 figures, 5 tables.

Figures (14)

  • Figure 1: Raw Magellan/MIKE échelle spectrum illustrating challenges for automated EW measurement. Top panel: Multiple échelle orders (different colors) showing the characteristic blaze function, the instrumental response that modulates the observed flux with a curved envelope peaking near each order's center. Absorption lines appear as narrow dips superimposed on this curved background. Bottom panel: Single order zoom (5680--5720 Å) showing raw flux without continuum normalization. The underlying continuum slopes upward following the blaze function, demonstrating why local continuum fitting is required. Visible challenges include blended lines (multiple absorptions within $<1$ Å), weak lines near the noise level, and the varying continuum slope that traditional methods must remove before fitting.
  • Figure 2: Schematic overview of the Egent pipeline. Input spectra (shifted to stellar rest frame with empirical wavelength calibration) and line catalogs enter the direct multi-Voigt fitting stage. Quality metrics determine whether the fit is acceptable or requires LLM inspection. The LLM visual inspector examines diagnostic plots and can adjust the extraction window (Fig. 4), add peaks for blends (Fig. 5), propose continuum regions based on visual reasoning (Fig. \ref{fig:continuum_regions}), or flag unreliable fits. The iteration loop continues until the LLM accepts the measurement. All outputs include complete provenance: Voigt parameters, continuum coefficients, and LLM reasoning chains.
  • Figure 3: Multi-Voigt fitting process demonstrated on Fe I 5242.49 Å (Gaia ID 55780840513067392, SNR $\sim$ 160). Top panel: The normalized spectrum (black) is fit simultaneously with 8 Voigt components (orange and green dashed lines indicate individual component centers; the target component is shown in green). The combined multi-Voigt model (red) reproduces the complex blend structure. Continuum normalization is performed iteratively alongside the Voigt fitting, allowing the model to adapt to local continuum curvature. Bottom panel: Normalized residuals (data minus model, divided by flux uncertainties) demonstrate fit quality. The shaded regions indicate $\pm 1\sigma$ (green) and $\pm 2\sigma$ (yellow); the RMS of 1.21$\sigma$ confirms a statistically acceptable fit. When the linear fit to residuals shows a slope $>$0.3 $\sigma$/Å, a red dashed trend line is overlaid to guide the LLM's attention to continuum issues. This diagnostic panel format is also used for LLM visual inspection, where the agent examines residual patterns to detect missed blends (W-shaped residuals), continuum issues (systematic slopes indicated by the trend line), or unreliable fits (excessive scatter near the target line).
  • Figure 4: Example of LLM-driven window and continuum adjustment for Na I 6154.23 Å. (a) Simple fit: The initial 6 Å window includes edge features (visible at 6156--6157 Å) that introduce systematic bias in the residuals. The yellow shaded region indicates where the LLM determined a narrower window would improve the fit. (b) Egent fit: After the agent narrowed the window to 3.9 Å and switched to a quadratic continuum, the fit focuses on the region containing the target lines. The residuals improve: systematic deviations outside the $\pm 1\sigma$ band (green shading) largely disappear. This case illustrates the agent's ability to reason about fitting strategy: by excluding edge features irrelevant to the target line and adjusting the continuum model, the measurement becomes more robust. The EW error decreased from 8.9 mÅ to 5.8 mÅ (35% improvement relative to the C3PO catalog value of 36.8 mÅ).
  • Figure 5: Example of LLM-driven peak identification for Ni I 5643.08 Å. (a) Direct fit: Nine Voigt components (blue shaded regions) were identified automatically from the line catalog. While these capture most absorption features, residuals at $\sim$5642.5 Å and $\sim$5645.8 Å show systematic deviations exceeding $2\sigma$, indicating missed features. (b) Egent fit: By visually inspecting the diagnostic plot, the LLM identified two small absorption features not in the catalog and explicitly specified their wavelengths (5642.45 and 5645.85 Å) for inclusion in the multi-Voigt fit. The resulting 11-component model (9 original in blue, 2 LLM-identified in red) achieves flatter residuals. This demonstrates the LLM's ability to function as a visual inspector: it examines the residual panel for patterns indicating unmodeled absorption and proposes specific wavelengths for additional Voigt components. The EW improved from 9.0 mÅ to 11.5 mÅ, reducing the error relative to the catalog value (13.8 mÅ) by 52%.
  • ...and 9 more figures
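The residual diagnostics described in the Figure 3 caption (normalized residuals, their RMS in units of σ, and a linear trend flagged above 0.3 σ/Å) can be sketched as follows. This is a minimal illustration, not Egent's implementation: the function name `residual_diagnostics` is hypothetical, the slope comes from an ordinary least-squares line, and comparing the absolute value of the slope against the threshold is an assumption.

```python
import math

def residual_diagnostics(waves, data, model, sigma, slope_threshold=0.3):
    """Normalized-residual RMS and linear trend (sigma per Å) for one fit window."""
    resid = [(d - m) / s for d, m, s in zip(data, model, sigma)]
    rms = math.sqrt(sum(r * r for r in resid) / len(resid))
    # Ordinary least-squares slope of residual versus wavelength.
    w_mean = sum(waves) / len(waves)
    r_mean = sum(resid) / len(resid)
    sxx = sum((w - w_mean) ** 2 for w in waves)
    sxy = sum((w - w_mean) * (r - r_mean) for w, r in zip(waves, resid))
    slope = sxy / sxx
    # Assumption: the threshold applies to the absolute value of the slope.
    return {"rms": rms, "slope": slope, "flag_trend": abs(slope) > slope_threshold}

# Synthetic window with a deliberate continuum tilt of 0.5 sigma per Å.
waves = [5240.0 + 0.05 * i for i in range(101)]
model = [1.0] * len(waves)
sigma = [0.01] * len(waves)
data = [m + s * 0.5 * (w - 5242.5) for w, m, s in zip(waves, model, sigma)]

diag = residual_diagnostics(waves, data, model, sigma)
print(round(diag["slope"], 3), diag["flag_trend"])
```

A flagged trend of this kind is what the caption says is drawn as a red dashed line to direct the LLM's attention to continuum problems. The RMS value plays the role of the 1.21σ figure quoted for the Fe I example.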