AeroTherm-GPT: A Verification-Centered LLM Framework for Thermal Protection System Engineering Workflows

Chuhan Qiao, Jinglai Zheng, Jie Huang, Buyue Zhao, Fan Li, Haiming Huang

Abstract

Integrating Large Language Models (LLMs) into hypersonic thermal protection system (TPS) design is bottlenecked by cascading constraint violations when generating executable simulation artifacts. General-purpose LLMs, treating generation as single-pass text completion, fail to satisfy the sequential, multi-gate constraints inherent in safety-critical engineering workflows. To address this, we propose AeroTherm-GPT, the first TPS-specialized LLM Agent, instantiated through a Constraint-Closed-Loop Generation (CCLG) framework. CCLG organizes TPS artifact generation as an iterative workflow comprising generation, validation, CDG-guided repair, execution, and audit. The Constraint Dependency Graph (CDG) encodes empirical co-resolution structure among constraint categories, directing repair toward upstream fault candidates based on lifecycle ordering priors and empirical co-resolution probabilities. This upstream-priority mechanism resolves multiple downstream violations per action, achieving a Root-Cause Fix Efficiency of 4.16 versus 1.76 for flat-checklist repair. Evaluated on HyTPS-Bench and validated against external benchmarks, AeroTherm-GPT achieves 88.7% End-to-End Success Rate (95% CI: 87.5-89.9), a gain of +12.5 pp over the matched non-CDG ablation baseline, without catastrophic forgetting on scientific reasoning and code generation tasks.
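The upstream-priority repair idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the violation names, the `co_resolution` weights, the `resolves` map (which stands in for actually re-running validation after a fix), and the reading of Root-Cause Fix Efficiency as violations resolved per repair action. Only the five tier names come from the paper's Figure 3 caption.

```python
from dataclasses import dataclass

# Lifecycle tiers in upstream-to-downstream order (from the Figure 3 caption).
TIERS = ["Unit", "Physics", "Numerical", "Execution", "Audit"]


@dataclass(frozen=True)
class Violation:
    name: str  # hypothetical constraint identifier
    tier: str  # one of TIERS


def pick_repair_target(open_violations, co_resolution):
    """Pick the most upstream open violation.

    Ties within a tier go to the violation with the largest summed
    co-resolution weight toward other currently open violations.
    """
    open_names = {v.name for v in open_violations}

    def key(v):
        downstream = sum(
            w for (src, dst), w in co_resolution.items()
            if src == v.name and dst in open_names
        )
        return (TIERS.index(v.tier), -downstream)

    return min(open_violations, key=key)


def repair_loop(violations, co_resolution, resolves, budget=10):
    """Repair until no violations remain or the action budget runs out.

    `resolves` is a hypothetical oracle: it maps a repaired violation to the
    set of violation names that disappear once it is fixed.
    Returns (actions_taken, violations_resolved, still_open).
    """
    open_v = list(violations)
    actions = 0
    resolved = 0
    while open_v and actions < budget:
        target = pick_repair_target(open_v, co_resolution)
        cleared = set(resolves.get(target.name, ())) | {target.name}
        before = len(open_v)
        open_v = [v for v in open_v if v.name not in cleared]
        resolved += before - len(open_v)
        actions += 1
    return actions, resolved, open_v


# Toy episode: one unit error causes two downstream failures.
violations = [
    Violation("solver_crash", "Execution"),
    Violation("cfl_violation", "Numerical"),
    Violation("heat_flux_range", "Physics"),
    Violation("unit_mismatch", "Unit"),
]
co_resolution = {
    ("unit_mismatch", "heat_flux_range"): 0.8,
    ("unit_mismatch", "solver_crash"): 0.6,
}
resolves = {"unit_mismatch": {"heat_flux_range", "solver_crash"}}

actions, resolved, remaining = repair_loop(violations, co_resolution, resolves)
print(actions, resolved, resolved / actions)
```

In this toy episode, fixing the unit error first clears three violations in one action, so four violations are resolved in two actions. A flat checklist that fixed violations in listed order would spend actions on symptoms before reaching the cause.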

Paper Structure

This paper contains 75 sections, 5 equations, 12 figures, 12 tables.

Figures (12)

  • Figure 1: Overview of the proposed approach. An engineering requirement enters the CCLG workflow (generation $\to$ validation $\to$ CDG-guided repair $\to$ execution $\to$ audit), producing an executable and traceable simulation artifact. AeroTherm-GPT is a TPS-oriented instantiation of this framework; HyTPS-Bench provides the workflow-aligned evaluation setting.
  • Figure 2: End-to-end architecture of AeroTherm-GPT. Solid-line modules (constraint-aware RAG, TPS-specific SFT, CDG-aware VER loop, constraint-aware DPO, reference executor) are fully implemented and evaluated in all experiments. Dashed-line extensions (probabilistic constraint bounds, multi-fidelity CDG, edge-deployed variants) represent directions for future work and are not evaluated here.
  • Figure 3: Constraint Dependency Graph (CDG) structure and regime-conditioned calibration. Top-left (Zone A): five-tier CDG with 23 nodes organized by lifecycle ordering (Unit $\rightarrow$ Physics $\rightarrow$ Numerical $\rightarrow$ Execution $\rightarrow$ Audit); directed edges represent empirical co-resolution probabilities calibrated from 4,206 repair episodes. Top-right: regime-conditioned adjacency matrix $W(c)$ across four thermal contexts (nominal, moderate, high-flux, extreme), with edge weights stratified by flight regime; mean absolute calibration error $= 0.08$ on 20% held-out split ($n = 841$). Bottom: per-regime CDG visualizations ($c_1$--$c_4$) illustrating how edge weight distributions shift across operating conditions.
  • Figure 4: Core computational mechanisms of AeroTherm-GPT. Zone B (top): PRM-guided constraint tree search (CDG priority, $K=4$ candidates scored by $V_\theta(s)$), terminating at $\text{Ready}(s)=1$ or budget exhaustion; right panel compares repair strategy trajectories. Zone C (middle): Transformer backbone with RAG cross-attention, frozen encoder, PRM head ($V_\theta(s)\in[0,1]$), and LoRA adaptation ($r=32, \alpha=64$). Zone D (bottom): Three-tier DPO preference construction (binary, self-evolution, severity-graded) with implicit reward distribution.
  • Figure 5: Ablation study of the incremental End-to-End Success Rate (EESR) contribution per module, with violation rate trajectories. The waterfall chart decomposes the EESR gain from each framework component, while the overlaid lines track the corresponding reduction in unit, physics, and numerical violation rates.
  • ...and 7 more figures
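
The calibration figures quoted in the Figure 3 caption can be made concrete with a short sketch. The held-out size follows directly from the caption (20% of 4,206 episodes). The error metric below is an assumption: it treats mean absolute calibration error as the average absolute gap between a predicted edge weight and the co-resolution frequency observed on held-out episodes, keyed by regime and edge. The caption does not state the exact definition, and the edge values are invented for illustration.

```python
def held_out_size(total_episodes, fraction):
    """Number of episodes in the held-out split, rounded to the nearest integer."""
    return round(total_episodes * fraction)


def mean_absolute_calibration_error(predicted, observed):
    """Average |predicted - observed| over edges present in both mappings.

    Both arguments map (regime, source_tier, target_tier) to a probability.
    This is an assumed definition, not one taken from the paper.
    """
    keys = predicted.keys() & observed.keys()
    if not keys:
        raise ValueError("no shared edges to compare")
    return sum(abs(predicted[k] - observed[k]) for k in keys) / len(keys)


# 20% of the 4,206 repair episodes reported in the caption.
n_held_out = held_out_size(4206, 0.20)

# Hypothetical edge weights for two of the four regimes.
predicted = {
    ("nominal", "Unit", "Physics"): 0.70,
    ("extreme", "Unit", "Physics"): 0.90,
}
observed = {
    ("nominal", "Unit", "Physics"): 0.60,
    ("extreme", "Unit", "Physics"): 1.00,
}
error = mean_absolute_calibration_error(predicted, observed)
print(n_held_out, round(error, 2))
```

The split size works out to 841, matching the $n = 841$ stated in the caption.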