Towards autonomous quantum physics research using LLM agents with access to intelligent tools

Sören Arlt, Xuemei Gu, Mario Krenn

TL;DR

Problem: Automating the generation of novel research directions and their experimental realization remains a major challenge in science. Approach: AI-Mandel uses two coupled LLM-agent systems with access to literature and PyTheus to autonomously propose quantum-physics ideas and translate them into executable experiment configurations. Findings: The system generated 187 ideas, with 184 implemented at least once and 804 total implementations (739 successful), including seven highlighted concepts and two independent publishable papers. Significance: This work demonstrates a concrete path toward AI-driven scientific discovery, while outlining practical challenges and future directions toward more general, interpretable, and autonomous artificial scientists.

Abstract

Artificial intelligence (AI) is used in numerous fields of science, yet the initial research questions and targets are still almost always provided by human researchers. AI-generated creative ideas in science are rare and often vague, so that it remains a human task to execute them. Automating idea generation and implementation in one coherent system would significantly shift the role of humans in the scientific process. Here we present AI-Mandel, an LLM agent that can generate and implement ideas in quantum physics. AI-Mandel formulates ideas from the literature and uses a domain-specific AI tool to turn them into concrete experiment designs that can readily be implemented in laboratories. The ideas generated by AI-Mandel are often scientifically interesting: for two of them we have already written independent scientific follow-up papers. The ideas include new variations of quantum teleportation, primitives of quantum networks in indefinite causal orders, and new concepts of geometric phases based on closed loops of quantum information transfer. AI-Mandel is a prototypical demonstration of an AI physicist that can generate and implement concrete, actionable ideas. Building such a system is not only useful for accelerating science, but also reveals concrete open challenges on the path to human-level artificial scientists.

Paper Structure

This paper contains 5 sections, 5 figures.

Figures (5)

  • Figure 1: Workflow of AI-Mandel. Agents propose ideas, query literature for overlap, and interface with PyTheus for implementation. Successful designs are evaluated by human experts and the top results are developed into a research project and published.
  • Figure 2: Detailed overview of agent interactions. The system is split into idea generation and idea implementation. Idea generation involves four agents (Researcher, Novelty agent, Judge, and Mediator). The Researcher is prompted to propose an interesting target to search for with a design tool for quantum optics experiments. Its prompt contains (i) the abstracts of three random papers from the arXiv dataset of quantum physics articles, (ii) documentation and implementations of existing PyTheus design queries, (iii) additional information about limitations on the types of experiments that can be searched for, and (iv) a pair of quantum physics concepts that are not currently combined but that Impact4Cast predicts to be potentially impactful, with the request to combine them in the final idea. The Researcher chooses between two actions. Action 1: formulate an arXiv query to search the literature; the first three abstracts matching the query are added to the context. Action 2: formulate three possible ideas and pick one of them, reasoning about novelty and feasibility. The chosen idea is passed to the Novelty agent, which is prompted to accept or reject suggestions based on novelty with respect to ideas already generated by the agents (the Idea Pool) and to existing examples that have been implemented in PyTheus and published before. If the Novelty agent rejects the suggestion, it gives feedback to the Researcher, which attempts to improve its suggestion until the Novelty agent accepts or a maximum number of iterations is reached. If the Novelty agent accepts, the suggestion is passed to the Judge, which accepts or rejects it based on feasibility. On rejection, the Judge also gives the Researcher feedback on flaws in the current suggestion. An idea accepted by both the Novelty agent and the Judge is considered successfully generated and is stored in the Idea Pool. The Mediator agent is called every third iteration and is tasked to highlight inefficiencies in the conversation between the other three agents. Its prompt contains the prompts of all other agents and their conversation up to this point. For idea implementation, the Expert is prompted to write a working config file for the PyTheus tool. Error messages are passed back to the Expert, which is tasked with debugging and fixing the implementation. Successful designs are stored and passed on to human domain experts.
  • Figure 3: Example Agent Interaction. The initial Researcher proposal is rejected by the Novelty Supervisor. After multiple iterations, the modified proposal is accepted by the Novelty Supervisor and the Judge. The initial attempt by the Expert does not succeed. Upon fixing the parameter settings, the Expert executes the tool successfully, and the finalized design is subsequently stored.
  • Figure 4: Analysis of agent interaction. (a) Visualization of all idea-generation conversations. 'Full Reject' are ideas that did not get past the Novelty Supervisor over multiple iterations. 'Novelty Accept' are ideas that were accepted by the Novelty Supervisor but did not pass the Judge over multiple iterations. 'Full Accept' are ideas that passed both filters. We label the 7 final ideas by numbers. (b) Successful (green) and unsuccessful (red) implementation attempts for each idea. Each rectangle segment is one run. To test the implementability of 'Full Reject' and 'Novelty Accept' ideas, the Researcher suggestions were passed to the Expert despite the rejections. (c)-(e) Histograms showing occurrences of the idea-generation agents (Researcher, Judge, Novelty Supervisor). x-axis: how often an agent appears; y-axis: number of runs in each bin. The color identifies 'Full Reject', 'Novelty Accept', or 'Full Accept'. (f) Histogram showing how often the Expert is called during an implementation run. Red: unsuccessful implementations. (g) Cumulative sum of new SciMuse concepts mentioned in the abstracts written by the Researcher. 'Main' is our main run of 'Full Accept' ideas with SciMuse and Idea Pool. (h) Expert success rate (reaching a successful PyTheus run) for different levels of success in idea generation. (i) 748 abstracts (from Main, No SciMuse, and No Idea Pool) are embedded using text-embedding-3-large (a text embedding model by OpenAI) and projected into a two-dimensional space using UMAP and PCA. We label the 7 final ideas by numbers. We manually identified some clusters (especially prevalent in No Idea Pool and No SciMuse), analyzed the abstracts they contain, and gave each cluster a summarizing title.
  • Figure 5: Human-filtered ideas. We show the seven implementations generated by the agents that we considered most viable to be developed into a full research project. We give detailed descriptions of these ideas in the results section.
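
The control flow described in the Figure 2 caption can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: every name here (`IdeaLoop`, `implement`, the `stub_*` functions) and both iteration limits are hypothetical, the agents are plain callables standing in for LLM calls, and the Researcher's literature-search action is omitted. The sketch also assumes that the loop simply continues with a new Researcher proposal after a Judge rejection.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Verdict = Tuple[bool, str]  # (accepted, feedback); hypothetical convention


@dataclass
class IdeaLoop:
    """Hypothetical idea-generation loop: Researcher -> Novelty -> Judge."""
    researcher: Callable[[List[str]], str]        # conversation -> idea
    novelty: Callable[[str, List[str]], Verdict]  # idea, idea pool -> verdict
    judge: Callable[[str], Verdict]               # idea -> feasibility verdict
    mediator: Callable[[List[str]], str]          # conversation -> remarks
    max_iterations: int = 10                      # assumed limit
    idea_pool: List[str] = field(default_factory=list)

    def run(self) -> Optional[str]:
        history: List[str] = []
        for iteration in range(1, self.max_iterations + 1):
            idea = self.researcher(history)
            history.append(f"Researcher: {idea}")
            accepted, feedback = self.novelty(idea, self.idea_pool)
            history.append(f"Novelty: {feedback}")
            if accepted:
                accepted, feedback = self.judge(idea)
                history.append(f"Judge: {feedback}")
                if accepted:
                    # Accepted by both filters: store in the Idea Pool.
                    self.idea_pool.append(idea)
                    return idea
            if iteration % 3 == 0:
                # The Mediator comments on the conversation every third iteration.
                history.append(f"Mediator: {self.mediator(history)}")
        return None


def implement(idea, expert, run_tool, max_attempts=5):
    """Hypothetical implementation loop: the Expert rewrites the config
    until the tool runs without error or the attempts are used up."""
    error = None
    for _ in range(max_attempts):
        config = expert(idea, error)
        succeeded, message = run_tool(config)
        if succeeded:
            return config
        error = message  # passed back to the Expert for debugging
    return None


# Deterministic stand-ins for the LLM agents and the design tool.
def stub_researcher(history):
    version = sum(entry.startswith("Researcher") for entry in history) + 1
    return f"idea v{version}"

def stub_novelty(idea, idea_pool):
    is_novel = idea != "idea v1" and idea not in idea_pool
    return is_novel, "novel" if is_novel else "too close to an existing example"

def stub_judge(idea):
    is_feasible = idea == "idea v3"
    return is_feasible, "feasible" if is_feasible else "not realizable as stated"

def stub_mediator(history):
    return f"{len(history)} messages exchanged so far"

def stub_expert(idea, error):
    return {"idea": idea, "fixed": error is not None}

def stub_tool(config):
    return config["fixed"], "ok" if config["fixed"] else "invalid parameter"


if __name__ == "__main__":
    loop = IdeaLoop(stub_researcher, stub_novelty, stub_judge, stub_mediator)
    accepted_idea = loop.run()
    print(accepted_idea, implement(accepted_idea, stub_expert, stub_tool))
```

With these stubs, the first proposal fails the novelty check, the second fails the feasibility check, and the third passes both filters; the Expert's first config then fails and its second succeeds. This mirrors the pattern of rejection, revision, and debugging that the Figure 3 caption describes.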