Training LLMs for Generating IEC 61131-3 Structured Text with Online Feedback

Aaron Haag, Bertram Fuchs, Altay Kacan, Oliver Lohse

TL;DR

Limited public data and the complexity of IEC 61131-3 Structured Text hinder LLM-based code generation. The authors propose an online preference-based fine-tuning loop that combines compiler-based syntactic feedback with a semantic LLM expert, built on supervised fine-tuning with APPS-derived synthetic ST data and iterative Direct Preference Optimization (DPO). The approach achieves substantial gains in compilation rate ($P_c$), semantic correctness rate ($P_s$), and their joint performance ($P_j$) over baselines, demonstrating data-efficient alignment for domain-specific PLC programming. This framework offers a practical path to industrial PLC copilots, reducing the need for large labeled datasets and enabling scalable ST code generation in automation environments.

Abstract

IEC 61131-3 Structured Text (ST) is a widely used programming language for programmable logic controllers (PLCs) in automation systems. However, generating ST code with LLMs poses unique challenges due to limited data in public training datasets and the complexity of ST language syntax. This paper proposes an approach to fine-tune LLMs for the generation of ST code that leverages a preference-based learning method through an online process involving compiler feedback and evaluation from an LLM-based ST expert. In this framework, the model is iteratively refined and generates new training samples, which are subsequently evaluated by a compiler for syntactical correctness and by a specialized LLM that excels at assessing semantic accuracy, though it is not optimized for code generation itself. This approach results in marked improvements for the trained LLM, leading to higher compilation success rates and better semantic precision. As a result, the framework proves highly suitable for industrial automation applications and outperforms state-of-the-art models.
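The feedback loop described in the abstract can be illustrated with a minimal sketch of how compiler and expert verdicts might be turned into preference pairs for DPO. The `Candidate` type, the `rank_key` ordering (compilation first, expert score second), and the stubbed verdicts are assumptions made for illustration; the abstract does not state how the two signals are combined.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Tuple


@dataclass
class Candidate:
    """One generated ST program plus its feedback (hypothetical structure)."""
    code: str
    compiles: bool        # compiler verdict, stubbed here
    expert_score: float   # semantic rating from the LLM expert, stubbed here


def rank_key(c: Candidate) -> Tuple[bool, float]:
    # Assumption: a compiling sample always beats a non-compiling one;
    # the expert score only breaks ties within the same compile status.
    return (c.compiles, c.expert_score)


def build_preference_pairs(prompt: str, candidates: List[Candidate]) -> List[Dict[str, str]]:
    """Turn ranked candidates for one prompt into (chosen, rejected) pairs."""
    pairs = []
    for a, b in combinations(candidates, 2):
        if rank_key(a) == rank_key(b):
            continue  # no preference signal when both rank the same
        chosen, rejected = (a, b) if rank_key(a) > rank_key(b) else (b, a)
        pairs.append({"prompt": prompt, "chosen": chosen.code, "rejected": rejected.code})
    return pairs


# Toy candidates for a single intent; the verdicts are made up.
candidates = [
    Candidate("A", compiles=True, expert_score=0.9),
    Candidate("B", compiles=False, expert_score=0.9),
    Candidate("C", compiles=True, expert_score=0.4),
]
pairs = build_preference_pairs("Implement a counter in ST", candidates)
```

In an online setting, pairs like these would be regenerated from the current model's own samples at every iteration before the next DPO update.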

Paper Structure

This paper contains 16 sections, 2 equations, 3 figures, 1 table, 1 algorithm.

Figures (3)

  • Figure 1: Overview of the DPO fine-tuning architecture, using a combination of feedback from a compiler and LLM experts.
  • Figure 2: The expert prompt contains the specific intent $X_{i,n}$, along with generated code samples from $Y_i$.
  • Figure 3: Improvement of different metrics over iterations: A. Compilation rate improvement across training iterations. B. Semantic correctness rate improvement across training iterations. C. Joint improvement of both compilation and semantic correctness rates over training iterations.
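
As a rough illustration of the three metrics tracked in Figure 3, the sketch below computes a compilation rate, a semantic correctness rate, and a joint rate from per-sample verdicts. It assumes each rate is a simple fraction of evaluated samples and that the joint rate counts samples passing both checks; the function name and the input format are hypothetical, and the paper's exact definitions may differ.

```python
from typing import List, Tuple


def evaluation_rates(results: List[Tuple[bool, bool]]) -> Tuple[float, float, float]:
    """Return (p_c, p_s, p_j) from (compiles, semantically_correct) verdicts.

    Assumption: p_j is the fraction of samples that both compile and are
    judged semantically correct.
    """
    n = len(results)
    if n == 0:
        raise ValueError("at least one evaluated sample is required")
    p_c = sum(1 for compiles, _ in results if compiles) / n
    p_s = sum(1 for _, correct in results if correct) / n
    p_j = sum(1 for compiles, correct in results if compiles and correct) / n
    return p_c, p_s, p_j


# Four made-up verdicts covering every combination of outcomes.
p_c, p_s, p_j = evaluation_rates(
    [(True, True), (True, False), (False, True), (False, False)]
)
```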