SysCaps: Language Interfaces for Simulation Surrogates of Complex Systems

Patrick Emami; Zhaonan Li; Saumya Sinha; Truc Nguyen

TL;DR

SysCaps ("system captions") are text descriptions of system attributes that serve as an interface to surrogate models of complex energy-system simulations. The authors build a lightweight multimodal model that fuses text embeddings from fine-tuned language models with time-series inputs, and they train it on natural-language captions that LLMs generate from simulator metadata, so that the surrogate can be prompted with text at test time. Experiments on building-energy and wind-farm simulators show that SysCaps-augmented surrogates outperform traditional one-hot baselines and are robust to caption length and attribute synonyms, while enabling language-driven design-space exploration; prompt augmentation can further regularize learning in small-data regimes. The work suggests that language interfaces can make SciML surrogates more accessible and improve their generalization, and it points to future work on cross-simulator generalization and user-centric evaluation.

Abstract

Surrogate models are used to predict the behavior of complex energy systems that are too expensive to simulate with traditional numerical methods. Our work introduces the use of language descriptions, which we call "system captions" or SysCaps, to interface with such surrogates. We argue that interacting with surrogates through text, particularly natural language, makes these models more accessible for both experts and non-experts. We introduce a lightweight multimodal text and timeseries regression model and a training pipeline that uses large language models (LLMs) to synthesize high-quality captions from simulation metadata. Our experiments on two real-world simulators of buildings and wind farms show that our SysCaps-augmented surrogates have better accuracy on held-out systems than traditional methods while enjoying new generalization abilities, such as handling semantically related descriptions of the same test system. Additional experiments also highlight the potential of SysCaps to unlock language-driven design space exploration and to regularize training through prompt augmentation.

Paper Structure (19 sections, 3 equations, 13 figures, 7 tables)

Introduction
Related Work
Problem Statement
Synthesizing System Captions (SysCaps) with LLMs
Text and Timeseries Surrogate Model
Experiments
Evaluating caption quality
Accuracy On Held-Out Systems
Caption Generalization
Design Space Exploration Application Using Language
Prompt Augmentation: Wind Farm Wake
Conclusion
Additional Experiment Details
Metrics
Hyperparameters
...and 4 more sections

Figures (13)

Figure 1: Our pipeline for augmenting multimodal simulation surrogates with language interfaces using "system captions", or SysCaps. SysCaps are text descriptions of knowledge about the system being simulated. In our work, the SysCaps describe the system's characteristics, as found in simulation metadata files. During training (a), we create paired datasets of temporal simulator inputs with key-value template SysCaps or LLM-generated natural language SysCaps. At test time (b), we prompt the surrogate model with one or more key-value template captions or natural language captions. LLMs are only used to generate synthetic training data; we use a lightweight BERT-style text encoder and an efficient long-sequence encoder to keep the computational cost of our surrogate low.
Figure 2: Building blocks of our surrogate model, $f = h_\theta \circ g_\psi$, that includes a multimodal encoder, $g_\psi$, and a top model, $h_\theta$. The multimodal encoder, $g_\psi = g_{\psi}^{\mathsf{seq}} \circ g_{\psi}^{\mathsf{text}}$, is a composition of a text encoder, $g_{\psi}^{\mathsf{text}}$, and a bidirectional sequence encoder, $g_{\psi}^{\mathsf{seq}}$, for timeseries inputs. The text embedding vector $\hat{z}$ is broadcast (dashed lines) to create a sequence that is concatenated with the timeseries input. This multimodal sequence is the input to the sequence encoder.
Figure 3: System captions unlock text-prompt-style surrogate modeling for complex systems. We show building stock daily load profiles aggregated for the Warehouse building type, created with caption templates. From left to right, we use captions with one, three, and six attributes.
Figure 4: Design space exploration using language. a) We show that the model has learned a physically plausible relationship between building square footage (sqft) and number (#) of stories. b) Failure case: When tested on unseen values of sqft (blue crosses), the model's predictions appear to be physically implausible: the model underestimates the energy consumption at these sqft values.
Figure 5: A visualization of the sequential surrogate model baseline with one-hot encoded system attributes for the buildings experiment.
...and 8 more figures
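
The broadcast-and-concatenate step described in the Figure 2 caption can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `broadcast_and_concat`, the toy dimensions, the feature-wise concatenation, and the ordering (timeseries features first, then the text embedding) are all assumptions made for illustration. The sketch only shows how a single text embedding $\hat{z}$ could be repeated at every timestep and joined with the timeseries features to form the multimodal sequence passed to the sequence encoder.

```python
from typing import List


def broadcast_and_concat(
    timeseries: List[List[float]], text_embedding: List[float]
) -> List[List[float]]:
    """Repeat one text embedding at every timestep and append it to that step's features.

    Hypothetical helper: feature-wise concatenation and the ordering
    (timeseries features first, then the embedding) are assumptions.
    """
    return [list(step) + list(text_embedding) for step in timeseries]


# Toy inputs: 3 timesteps with 2 features each, and a 4-dimensional caption embedding.
timeseries = [[0.1, 20.0], [0.2, 21.5], [0.3, 19.0]]
z_hat = [0.5, -0.5, 1.0, 0.0]

multimodal_sequence = broadcast_and_concat(timeseries, z_hat)

# Sequence length is unchanged; the feature width grows from 2 to 2 + 4.
print(len(multimodal_sequence), len(multimodal_sequence[0]))  # 3 6
```

In a tensor library the same step would typically be an expand or tile of the embedding along the time axis followed by a concatenation along the feature axis; the list version above only makes the resulting shapes explicit.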
