Mephisto: Self-Improving Large Language Model-Based Agents for Automated Interpretation of Multi-band Galaxy Observations

Zechang Sun, Yuan-Sen Ting, Yaobo Liang, Nan Duan, Song Huang, Zheng Cai

TL;DR

Mephisto introduces a large language model–driven multi‑agent framework that emulates human scientific reasoning to interpret multi‑band galaxy observations through SED modeling with CIGALE. By combining tree‑search exploration, temporal memory, and a distillable external knowledge base, it achieves near‑grid‑search accuracy using only ~1% of the models and demonstrates robust performance on both COSMOS2020 galaxies and JWST‑identified Little Red Dots, including frontier cases discovered after typical LLM knowledge cutoffs. Ablation studies show memory and knowledge components substantially improve fit quality and transfer across galaxies, while cross‑LLM evaluations indicate that cost‑effective backbones can be viable when guided by prior distilled knowledge. The work highlights a path toward transparent, AI‑augmented astronomical workflows capable of scaling to billions of sources, albeit with current limitations in autonomy, model scope, and computational cost that invite future improvements in both models and data infrastructure.

Abstract

Astronomical research has long relied on human expertise to interpret complex data and formulate scientific hypotheses. In this study, we introduce Mephisto -- a multi-agent collaboration framework powered by large language models (LLMs) that emulates human-like reasoning for analyzing multi-band galaxy observations. Mephisto interfaces with the CIGALE codebase (a library of spectral energy distribution, SED, models) to iteratively refine physical models against observational data. It conducts deliberate reasoning via tree search, accumulates knowledge through self-play, and dynamically updates its knowledge base. Validated across diverse galaxy populations -- including the James Webb Space Telescope's recently discovered "Little Red Dot" galaxies -- we show that Mephisto demonstrates proficiency in inferring the physical properties of galaxies from multi-band photometry, positioning it as a promising research copilot for astronomers. Unlike prior black-box machine learning approaches in astronomy, Mephisto offers a transparent, human-aligned reasoning process that integrates seamlessly with existing research practices. This work underscores the possibility of LLM-driven agent-based research for astronomy, establishes a foundation for fully automated, end-to-end artificial intelligence (AI)-powered scientific workflows, and unlocks new avenues for AI-augmented discoveries in astronomy.
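As a rough illustration of the tree-search idea described in the abstract, the Python sketch below runs a best-first search over candidate model states, always expanding the state with the lowest $\chi^2$ found so far. It is a toy under stated assumptions, not Mephisto's implementation: the LLM-driven proposal step is replaced by a hypothetical `propose` function that nudges two integer parameters, CIGALE is replaced by a hypothetical linear `toy_model`, and the names `best_first_search`, `score`, `BANDS`, `OBSERVED`, and `SIGMA` are invented for this example.

```python
import heapq
from itertools import count

# Hypothetical toy data: four "bands" and fluxes generated by the state (3, 2).
BANDS = [0.0, 1.0, 2.0, 3.0]
SIGMA = [1.0, 1.0, 1.0, 1.0]


def toy_model(state):
    """Stand-in for an SED code: a straight line offset + slope * band."""
    offset, slope = state
    return [offset + slope * x for x in BANDS]


OBSERVED = toy_model((3, 2))


def score(state):
    """Chi-square misfit between the toy observations and the toy model."""
    predicted = toy_model(state)
    return sum(((o - p) / s) ** 2 for o, p, s in zip(OBSERVED, predicted, SIGMA))


def propose(state):
    """Stand-in for the proposal step: change one parameter by +/-1."""
    offset, slope = state
    return [(offset + 1, slope), (offset - 1, slope),
            (offset, slope + 1), (offset, slope - 1)]


def best_first_search(initial, propose, score, budget=200):
    """Expand the lowest-score state first; return the best state evaluated."""
    tie = count()  # tie-breaker so equal scores never compare states directly
    best_score, best_state = score(initial), initial
    frontier = [(best_score, next(tie), initial)]
    seen = {initial}
    evaluations = 1
    while frontier and evaluations < budget:
        _, _, state = heapq.heappop(frontier)
        for child in propose(state):
            if child in seen:
                continue
            seen.add(child)
            child_score = score(child)
            evaluations += 1
            if child_score < best_score:
                best_score, best_state = child_score, child
            heapq.heappush(frontier, (child_score, next(tie), child))
            if evaluations >= budget:
                break
    return best_state, best_score, evaluations


best_state, best_score, n_eval = best_first_search((0, 0), propose, score)
print(best_state, best_score, n_eval)
```

The priority queue keeps every unexpanded state available, so the search can back out of an unpromising branch rather than committing to a single chain of refinements. The `budget` argument caps the number of model evaluations, loosely mirroring the goal of evaluating far fewer models than a full grid.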

Paper Structure

This paper contains 22 sections, 13 figures, 2 tables.

Figures (13)

  • Figure 1: Overview of Mephisto's reasoning and learning process for interpreting multi-band galaxy observations. The central diagram shows Mephisto's tree-based exploration of different SED models, starting from an initial state (Input State) that includes observational data, the current SED model, and fitness metrics. The Reasoning Process (right) follows a structured approach: (a) using prompt templates that incorporate CIGALE documentation to propose model refinements, (b) instructing CIGALE with the proposed models, and (c) evaluating and prioritizing different states. This process is enhanced by two learning components: Knowledge Learning (top left) continuously extracts and validates insights from fitting experiences through knowledge distillation and validation (shown with a concrete example for AGN parameter adjustment), while Temporal Memory (bottom right) tracks the impact of previous model modifications. The entire analysis culminates in a scientific report through the Summarization component (bottom left), synthesizing the complete reasoning chain and conclusions.
  • Figure 2: Mephisto's analysis of a special class of outliers known as Little Red Dots, recently discovered by the James Webb Space Telescope (observational data: JADES ID 90354 from PABLO2024). Since these sources were only discovered in 2024, likely beyond most LLM knowledge cutoff dates, and their physical nature remains debated, they serve as a valuable test case demonstrating how Mephisto can navigate and identify solutions for objects through reasoning about physical processes not well-represented in LLM training data. Given only multi-band photometry and a base SED model, Mephisto iteratively explored and refined the physical model, developing explanations that more closely align with the observed properties of the galaxy. Throughout this exploration, Mephisto not only enriched the space of potential hypotheses for the current observations, but also validated the robustness of scientific conclusions across different model selections. Rather than merely providing parameter estimates, Mephisto produces a scientific report distilled from its reasoning process, encoding information about possible interpretations of the observations and the consistency between different models.
  • Figure 3: Examples of spectral energy distributions (SEDs) from COSMOS2020 galaxies alongside the best model solutions generated through Mephisto's chain of reasoning. We showcase three diverse galaxies to demonstrate Mephisto's capability to find solutions across a wide range of typical galaxy types: (a) a dusty star-forming galaxy with high attenuation ($\mathrm{A}_\mathrm{V}=2.84$) and active star formation, (b) a dwarf galaxy with low stellar mass and minimal dust, and (c) a massive galaxy potentially hosting an active galactic nucleus (AGN) component. The inset images show composite photometric images from the Legacy Surveys viewer (https://www.legacysurvey.org/viewer). Different colored lines represent contributions from different physical components (stellar emission in orange dashed lines, dust emission in purple dash-dotted lines, nebular emission in gray dotted lines, and AGN contribution in blue dash-dotted lines where applicable), which combine to form the best-fit model (solid black line). Red data points with error bars show the observed fluxes across different wavelengths. Key derived physical parameters are displayed in the upper left of each panel, including stellar mass, dust attenuation, and star formation rate.
  • Figure 4: Comparison of relative $\chi^2$ values for the 256 uniformly selected galaxies from the COSMOS2020 catalog, showing solutions found by Mephisto versus those obtained from exhaustive grid search models. The y-axis shows the fractional difference between Mephisto and exhaustive search $\chi^2$ values, $(\chi^2_\text{Mephisto} - \chi^2_\text{Baseline})/\chi^2_\text{Baseline}$, while the x-axis shows the exhaustive search $\chi^2$ values normalized by those of the initial SED model at the start of Mephisto's search ($\chi^2_\text{Baseline}/\chi^2_\text{Init}$). The exhaustive search (baseline model) represents our approach with 360 million grid points, while the basic fit refers to a simplified initial fit using standard templates without refinement. The blue dashed lines indicate the $\pm20$% boundary, demonstrating that Mephisto consistently identifies solutions with $\chi^2$ values within 20% of the exhaustive search, despite utilizing a grid size approximately 100 times smaller. Notably, many points fall below the zero line (orange), indicating cases where Mephisto produces better fits than the exhaustive search at a fraction of its computational cost.
  • Figure 5: Comparison of key physical parameter estimates between Mephisto and baseline grid search for 256 COSMOS2020 galaxies. The y-axis shows the difference between Mephisto and baseline estimates for the parameter in question, while the x-axis shows the baseline values. Color indicates the ratio of Mephisto's $\chi^2$ to the baseline model's $\chi^2$, with darker points representing superior fits by Mephisto. Points with error bars highlight cases where Mephisto finds better solutions ($\chi^2_{\text{Mephisto}}/\chi^2_{\text{Baseline}} < 1$). Contours illustrate the density distribution of solutions of the 256 galaxies. Left panel: Stellar mass estimates show strong consistency across the full mass range ($10^8$-$10^{12} {\rm M}_\odot$), with typical differences contained within $\pm$0.3 dex. Mephisto achieves equal or better constraints for most galaxies despite evaluating only $\sim$1% of the parameter space, with greatest improvements (darker points with error bars) often seen at the extremes of the mass distribution. Right panel: Dust attenuation ($A_V$) estimates reveal Mephisto's ability to navigate parameter degeneracies. Despite $A_V$ being difficult to constrain due to its entanglement with stellar population, AGN contribution, and star formation history, Mephisto provides comparable or improved precision across the full range of attenuation values. The systematic improvement for high-attenuation systems ($A_V > 1$) demonstrates Mephisto's effectiveness in handling the complex physical scenarios that characterize dusty galaxies.
  • ...and 8 more figures
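
The two axes described in the Figure 4 caption, and the $\pm20$% band, can be written out in a few lines of Python. This is a minimal sketch for illustration only: the function names and the example $\chi^2$ values are hypothetical and are not taken from the paper.

```python
def fractional_chi2_difference(chi2_mephisto, chi2_baseline):
    """Figure 4 y-axis: (chi2_Mephisto - chi2_Baseline) / chi2_Baseline."""
    if chi2_baseline <= 0:
        raise ValueError("baseline chi-square must be positive")
    return (chi2_mephisto - chi2_baseline) / chi2_baseline


def normalized_baseline_chi2(chi2_baseline, chi2_init):
    """Figure 4 x-axis: chi2_Baseline / chi2_Init."""
    if chi2_init <= 0:
        raise ValueError("initial chi-square must be positive")
    return chi2_baseline / chi2_init


def within_band(chi2_mephisto, chi2_baseline, tol=0.20):
    """True if the fractional difference lies inside the +/- tol band."""
    return abs(fractional_chi2_difference(chi2_mephisto, chi2_baseline)) <= tol


# Hypothetical chi-square values for one galaxy (not from the paper).
chi2_init, chi2_baseline, chi2_mephisto = 40.0, 10.0, 8.5

y_value = fractional_chi2_difference(chi2_mephisto, chi2_baseline)
x_value = normalized_baseline_chi2(chi2_baseline, chi2_init)
print(x_value, y_value, within_band(chi2_mephisto, chi2_baseline))
```

A negative `y_value` corresponds to a point below the zero line, that is, a case where the Mephisto fit has a lower $\chi^2$ than the exhaustive search.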