Agentic Large Language Models for Conceptual Systems Engineering and Design

Soheyl Massoudi, Mark Fuge

TL;DR

This work investigates whether a structured, memory-augmented multi-agent system (MAS) of LLMs can better manage the open-ended, iterative nature of early-stage engineering design than a simple two-agent loop. It introduces the Design-State Graph (DSG), a JSON-serializable graph that binds requirements, physical embodiments, and executable Python physics models, and compares a nine-role MAS against a two-agent system (2AS) on a solar-powered water filtration case study. Across 60 experiments with two LLMs and three temperature settings, the MAS yields more granular design graphs and the reasoning-distilled model improves workflow completion, but requirement coverage and the fidelity of the generated simulation code remain limited. The study highlights both the promise of agentic LLM orchestration for structured design and the need for unit-aware verification, explicit physics checks, and self-improving prompt/code loops to reach end-to-end automation in engineering design.
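The two-agent baseline alternates a Generator that drafts a design and a Reflector that critiques it. The following is a minimal sketch of such a loop, not the authors' implementation: `generate` and `reflect` are hypothetical stand-ins for LLM calls, and the stopping rule (stop when the Reflector returns no critique, or after `max_iters` rounds) is an assumption.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical agent interfaces; in practice each would wrap an LLM call.
GenerateFn = Callable[[str, Optional[str]], str]   # (task, critique) -> draft
ReflectFn = Callable[[str, str], Optional[str]]    # (task, draft) -> critique, or None if satisfied


def generator_reflector_loop(
    task: str,
    generate: GenerateFn,
    reflect: ReflectFn,
    max_iters: int = 5,
) -> Tuple[str, List[str]]:
    """Alternate Generator and Reflector until no critique remains or max_iters is hit.

    Returns the last draft and every critique the Reflector produced.
    """
    critique: Optional[str] = None
    critiques: List[str] = []
    draft = ""
    for _ in range(max_iters):
        draft = generate(task, critique)   # Generator drafts or revises using the last critique
        critique = reflect(task, draft)    # Reflector reviews the new draft
        if critique is None:               # Reflector is satisfied: stop early
            break
        critiques.append(critique)
    return draft, critiques


# Toy stand-ins for the two agents (hypothetical, for illustration only).
def toy_generate(task: str, critique: Optional[str]) -> str:
    if critique is None:
        return "pump + filter"
    return "pump + filter + solar panel"


def toy_reflect(task: str, draft: str) -> Optional[str]:
    return None if "solar panel" in draft else "missing power source"


if __name__ == "__main__":
    final, notes = generator_reflector_loop("solar water filtration", toy_generate, toy_reflect)
    print(final)   # pump + filter + solar panel
    print(notes)   # ['missing power source']
```

Because the loop has only one critic and no shared design memory, every revision depends on a single critique, which is one plausible reason such a loop can converge on near-identical outputs.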

Abstract

Early-stage engineering design involves complex, iterative reasoning, yet existing large language model (LLM) workflows struggle to maintain task continuity and generate executable models. We evaluate whether a structured multi-agent system (MAS) can more effectively manage requirements extraction, functional decomposition, and simulator code generation than a simpler two-agent system (2AS). The target application is a solar-powered water filtration system described in a requirements specification (cahier des charges). We introduce the Design-State Graph (DSG), a JSON-serializable representation that bundles requirements, physical embodiments, and Python-based physics models into graph nodes. A nine-role MAS iteratively builds and refines the DSG, while the 2AS collapses the process to a Generator-Reflector loop. Together, the two systems were evaluated in 60 experiments: 2 LLMs (Llama 3.3 70B and the reasoning-distilled DeepSeek R1 70B) × 2 agent configurations × 3 temperatures × 5 seeds. We report JSON validity, requirement coverage, embodiment presence, code compatibility, workflow completion, runtime, and graph size. Across all runs, both the MAS and the 2AS maintained perfect JSON integrity and embodiment tagging. Requirement coverage remained minimal (less than 20%). Code compatibility peaked at 100% under specific 2AS settings but averaged below 50% for the MAS. Only the reasoning-distilled model reliably flagged workflow completion. With DeepSeek R1 70B, the MAS generated more granular DSGs (5 to 6 nodes on average), whereas the 2AS mode-collapsed. Structured multi-agent orchestration enhanced design detail, and the reasoning-distilled LLM improved completion rates, yet low requirement coverage and gaps in code fidelity persisted.
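The 60 experiments form a full factorial grid over the four factors listed above. This short sketch enumerates that grid with `itertools.product`. The LLM names and agent configurations come from the abstract; the actual temperature values and seed numbers are not given here, so `temp_1`..`temp_3` and seeds `0`..`4` are hypothetical placeholders.

```python
from itertools import product
from typing import Dict, List

# Factors as stated in the abstract. Temperature labels and seed values are
# hypothetical placeholders; the real values are not given in this summary.
LLMS = ["Llama 3.3 70B", "DeepSeek R1 70B"]
AGENT_CONFIGS = ["MAS", "2AS"]
TEMPERATURES = ["temp_1", "temp_2", "temp_3"]
SEEDS = [0, 1, 2, 3, 4]


def build_grid() -> List[Dict[str, object]]:
    """Full factorial design: one run per (LLM, agent config, temperature, seed)."""
    return [
        {"llm": llm, "agents": agents, "temperature": temp, "seed": seed}
        for llm, agents, temp, seed in product(LLMS, AGENT_CONFIGS, TEMPERATURES, SEEDS)
    ]


if __name__ == "__main__":
    runs = build_grid()
    print(len(runs))                                # 2 * 2 * 3 * 5 = 60
    print(sum(r["agents"] == "MAS" for r in runs))  # 30 runs per agent configuration
```

Each (LLM, agent configuration) cell therefore holds 15 runs (3 temperatures × 5 seeds), which is the sample size behind any per-setting average such as code compatibility.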

Paper Structure

This paper contains 49 sections, 1 equation, 4 figures, and 4 tables.

Figures (4)

  • Figure 1: The four foundational pillars of an LLM agent (planning, memory, tools, and action). The LLM serves as the core reasoning engine, extended with capabilities for structured decision-making, external tool usage, memory retention, and iterative refinement through planning mechanisms such as reflection and self-criticism. (Concept adapted from weng2023prompt.)
  • Figure 2: Hierarchical DSG data model (dashed containment) and a runtime graph edge (solid line). Each DesignNode aggregates exactly one Embodiment and zero or more PhysicsModels and carries requirement traceability via linked_reqs. The DesignState stores nodes and a flat edges list as List[List[str]].
  • Figure 3: Comparison of the multi-agent system (MAS) and the two-agent system (2AS).
  • Figure 4: Comparison of the two best DSGs under MAS and 2AS settings.
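The Figure 2 caption specifies enough of the DSG schema to sketch it. The following minimal Python sketch uses `dataclasses` and the standard `json` module, and it is not the authors' code. Only `DesignNode`, `Embodiment`, `PhysicsModel`, `DesignState`, `linked_reqs`, the one-embodiment/zero-or-more-models rule, and the `List[List[str]]` edges come from the caption. The other field names (`node_id`, `description`, `name`, `code`), storing nodes in a dict keyed by ID, and the `requirement_coverage` helper are hypothetical. The coverage formula in particular is an assumed definition, not necessarily the one the paper uses.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class Embodiment:
    description: str  # hypothetical field: physical description of the concept


@dataclass
class PhysicsModel:
    name: str  # hypothetical field
    code: str  # hypothetical field: Python source of the model, stored as text


@dataclass
class DesignNode:
    node_id: str                                                      # hypothetical field
    embodiment: Embodiment                                            # exactly one per node
    physics_models: List[PhysicsModel] = field(default_factory=list)  # zero or more
    linked_reqs: List[str] = field(default_factory=list)              # requirement traceability


@dataclass
class DesignState:
    nodes: Dict[str, DesignNode] = field(default_factory=dict)  # assumed: keyed by node_id
    edges: List[List[str]] = field(default_factory=list)        # flat list of [source, target]

    def to_json(self) -> str:
        # asdict recurses into nested dataclasses, dicts, and lists.
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "DesignState":
        raw = json.loads(text)
        nodes = {
            key: DesignNode(
                node_id=n["node_id"],
                embodiment=Embodiment(**n["embodiment"]),
                physics_models=[PhysicsModel(**m) for m in n["physics_models"]],
                linked_reqs=list(n["linked_reqs"]),
            )
            for key, n in raw["nodes"].items()
        }
        return cls(nodes=nodes, edges=[list(e) for e in raw["edges"]])


def requirement_coverage(state: DesignState, requirement_ids: List[str]) -> float:
    """Hypothetical metric: share of requirements linked by at least one node."""
    if not requirement_ids:
        return 0.0
    linked = {r for node in state.nodes.values() for r in node.linked_reqs}
    return sum(r in linked for r in requirement_ids) / len(requirement_ids)


if __name__ == "__main__":
    state = DesignState()
    state.nodes["pump"] = DesignNode(
        "pump", Embodiment("DC pump"), [PhysicsModel("flow", "def flow(p): return p")], ["R1"]
    )
    state.nodes["panel"] = DesignNode("panel", Embodiment("PV panel"), linked_reqs=["R2"])
    state.edges.append(["panel", "pump"])
    assert DesignState.from_json(state.to_json()) == state  # lossless JSON round trip
    print(requirement_coverage(state, ["R1", "R2", "R3", "R4"]))  # 0.5
```

Keeping edges as plain string pairs, rather than object references, is what lets the whole state serialize to JSON without a custom encoder.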