
Assessment of ChatGPT for Engineering Statics Analysis

Benjamin Hope, Jayden Bracey, Sahar Choukir, Derek Warner

TL;DR

This work investigates the reliability of ChatGPT-4o, ChatGPT-o1-preview, and a Custom GPT for engineering statics, on tasks ranging from basic $F=ma$ calculations to beam and truss analyses, and compares the results with those of first-year students. Through iterative prompt engineering (zero-shot, few-shot, chain-of-thought (CoT)) and controlled prompts, the Custom GPT achieves $82\%$ on Exam 1 and $86\%$ on Exam 2 Style1, outperforming the student averages in some cases, while image-based prompts remain unreliable. Key findings show that explicit guidance and CoT reasoning improve accuracy, yet the models still misclassify member forces (tension vs. compression) and struggle with multimodal inputs, especially diagrams. The results underscore the potential of AI as a supplementary teaching and automation tool in engineering mechanics, while also highlighting limitations that motivate future work on improved reasoning, modular instruction design, and enhanced multimodal capabilities. Overall, LLMs can augment education and analysis workflows but are not yet ready to replace traditional methods or human expertise for rigorous statics problems.

Abstract

Large language models (LLMs) such as OpenAI's ChatGPT hold potential for automating engineering analysis, yet their reliability in solving multi-step statics problems remains uncertain. This study evaluates the performance of ChatGPT-4o and ChatGPT-o1-preview on foundational statics tasks, from simple calculations with Newton's second law of motion to beam and truss analyses, and compares their results to those of first-year engineering students on a typical statics exam. To enhance accuracy, we developed a Custom GPT, embedding refined prompts directly into its instructions. This optimized model achieved an 82% score, surpassing the 75% student average and demonstrating the impact of tailored guidance. Despite these improvements, LLMs continued to exhibit errors in nuanced or open-ended problems, such as misidentifying tension and compression in truss members. These findings highlight both the promise and current limitations of AI in structural analysis, emphasizing the need for improved reasoning, multimodal capabilities, and targeted training data for future AI-driven automation in civil and mechanical engineering.

Paper Structure

This paper contains 18 sections, 1 equation, 7 figures, 9 tables.

Figures (7)

  • Figure 1: Accuracy of ChatGPT in calculating $F = m \cdot a$ across 100 iterations. The mass ($m$) and acceleration ($a$) values were randomly generated by ChatGPT for each calculation. The y-axis represents the force ($F$) values calculated by ChatGPT, and the x-axis represents the correct force values calculated using Python. The blue line represents perfect accuracy.
  • Figure 2: Beam problem presented to ChatGPT (hibbeler2018statics).
  • Figure 3: Truss problem presented to ChatGPT (Fleischmann2019).
  • Figure 4: Incorrect diagram output produced by ChatGPT-4o from image-based prompt.
  • Figure 5: 2024 First-Year Engineering Statics Exam, referred to as Exam 2. Problems adapted from (beer2019vector, beer1999mechanics).
  • ...and 2 more figures
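
The Figure 1 caption describes checking ChatGPT's $F = m \cdot a$ answers against values computed in Python. A minimal sketch of such a check is given below. The function name `score_force_answers`, the relative tolerance, and the sample values are illustrative assumptions, not the authors' script or data.

```python
import math


def score_force_answers(cases, rel_tol=1e-6):
    """Return the fraction of cases whose reported force matches m * a.

    `cases` is a list of (mass, acceleration, reported_force) tuples.
    The tolerance is an assumed choice, not a value from the paper.
    """
    if not cases:
        raise ValueError("cases must not be empty")
    correct = 0
    for mass, accel, reported in cases:
        expected = mass * accel  # reference value, Newton's second law
        if math.isclose(reported, expected, rel_tol=rel_tol):
            correct += 1
    return correct / len(cases)


# Hypothetical inputs for illustration only (not results from the study):
# the second reported force is deliberately wrong.
sample_cases = [
    (2.0, 3.0, 6.0),
    (5.0, 4.0, 21.0),
    (10.0, 0.5, 5.0),
    (1.5, 2.0, 3.0),
]
accuracy = score_force_answers(sample_cases)
print(f"Fraction correct: {accuracy:.2f}")
```

Plotting reported force against the reference force, as in Figure 1, would place every correct answer on the line of perfect accuracy.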