ChronoLLM: A Framework for Customizing Large Language Model for Digital Twins generalization based on PyChrono

Jingquan Wang; Harry Zhang; Khailanii Slaton; Shu Wang; Radu Serban; Jinlong Wu; Dan Negrut

TL;DR

This work addresses the gap between general-purpose LLMs and domain-specific needs in PyChrono-based simulation development. By combining continual pretraining on Chrono-related data with supervised and parameter-efficient fine-tuning, ChronoLlama tailors dense LLMs for reliable PyChrono code generation and natural-language API interactions. Empirical results show that fine-tuned ChronoLlama models substantially outperform baselines in both code quality and execution reliability, enabling faster and more accurate digital-twin generation. The approach advances AI-assisted engineering by delivering scalable, domain-aware tooling for multi-physics simulations and sets a foundation for future multimodal and hierarchical agent-based enhancements.

Abstract

The integration of advanced simulation technologies with artificial intelligence (AI) is reshaping science and engineering research. ChronoLlama is a framework that customizes open-source LLMs for code generation and pairs them with PyChrono for multi-physics simulations. The integration aims to automate and improve the creation of simulation scripts, and thereby to enhance model accuracy and efficiency. It combines the speed of AI-driven code generation with the reliability of physics-based simulation, giving researchers and engineers a practical tool. Empirical results indicate substantial improvements in simulation setup speed, accuracy of the generated code, and overall computational efficiency. ChronoLlama expedites the development and testing of multibody systems and offers a scalable, AI-enhanced approach to managing intricate mechanical simulations. This integration of AI with traditional simulation platforms is a significant step toward automating and optimizing design processes in engineering applications.

Paper Structure (31 sections, 4 equations, 10 figures, 5 tables)

Introduction
Project Chrono and Impact of Chrono
LLMs and Domain-Specific LLMs
PyChrono Challenges and Current LLMs
The limitations of current LLMs for Chrono-related problems
Methodology
Problem Statement
Selection of Base Models
Continual Pretraining
Continual Pretraining Dataset
Fine-Tuning Methodology
Supervised Fine-Tuning (SFT)
Parameter-Efficient Fine-Tuning (PEFT)
LoRA and Its Variants
Advantages of PEFT Methods
...and 16 more sections

Figures (10)

Figure 1: The word cloud of Project Chrono Forum
Figure 2: The whole pipeline of ChronoLlama to customize open-source LLMs for PyChrono tasks.
Figure 3: Overview of test environments categorized into various domains used in simulations. Each row represents a specific category, highlighting different scenarios for testing and evaluation of simulation models.
Figure 4: Comparison of LLM performance across different models based on the average document reference score. The fine-tuned model (gpt-4o-mini-f9-t0.1, marked in red) outperforms all other models, with an average score close to 70. The baseline model (gpt-4o-mini, indicated by the red dashed line) has an average score of 40, so fine-tuning raises the score by roughly 30 points. Other models are marked in blue for reference.
Figure 5: Ranked performance of various LLMs based on success rate. The fine-tuned model (gpt40mini_finetuned.json, marked with a green cross) achieves the highest success rate of approximately 85%, far surpassing all other models, including the baseline (gpt40_base.json). The steady decline in success rate for other models, as indicated by the blue dashed line, highlights the fine-tuned model's superior robustness and accuracy.
...and 5 more figures
