Retrieval-Augmented Fine-Tuning With Preference Optimization For Visual Program Generation

Deokhyung Kang, Jeonghun Cho, Yejin Jeon, Sunbin Jang, Minsub Lee, Jawoon Cho, Gary Geunbae Lee

TL;DR

This paper tackles the challenge of generating Ladder Diagram (LD) code from natural language in industrial automation, where domain-specific configurations hinder prompt-based approaches. It introduces a two-stage training framework: RAFT-V, which uses retrieval-augmented fine-tuning to leverage recurring LD subroutines, and Direct Preference Optimization (DPO) with graph-edit-based negative sampling to refine outputs. The study demonstrates that training-based fine-tuning outperforms prompting-based methods across multiple text formats and backbones, achieving over 10% absolute gains in program-level exact-match accuracy compared with supervised fine-tuning, and further gains from the second stage. The findings highlight the practical viability of LLM-based LD generation for industrial PLC programming and provide a foundation for extending these methods to other visual programming languages and domains.

Abstract

Visual programming languages (VPLs) let users create programs through graphical interfaces, which makes programming more accessible and has led to widespread use across domains. To further improve accessibility, recent research has focused on generating VPL code from user instructions with large language models (LLMs), and prompting-based methods have shown promising results. Nevertheless, such approaches can be less effective for industrial VPLs such as Ladder Diagram (LD). LD is a pivotal language in industrial automation and involves extensive domain-specific configurations, which are difficult to capture in a single prompt. In this work, we demonstrate that training-based methods outperform prompting-based methods in LD generation accuracy, even with smaller backbone models. Building on these findings, we propose a two-stage training strategy to further enhance VPL generation. First, we employ retrieval-augmented fine-tuning to exploit the repeated use of subroutines common in industrial VPLs. Second, we apply direct preference optimization (DPO) to further guide the model toward accurate outputs, using preference pairs generated systematically through graph editing operations. Extensive experiments on real-world LD data demonstrate that our approach improves program-level accuracy by over 10% compared to supervised fine-tuning, which highlights its potential to advance industrial automation.
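To make the second stage more concrete, the following is a minimal sketch of how a preference pair might be built by editing a rung graph, together with the standard per-pair DPO loss. It is not the authors' implementation: the graph schema, the three edit operations, the JSON serialization, and the names `rung`, `perturb`, `make_pair`, and `dpo_loss` are all assumptions made for illustration, and the paper's actual edit operations are not specified in this excerpt.

```python
import copy
import json
import math
import random

# Hypothetical rung graph (assumed schema): node ids map to attribute dicts,
# edges are directed (source, target) pairs.
rung = {
    "nodes": {
        "n1": {"type": "contact", "operand": "X0"},
        "n2": {"type": "contact", "operand": "X1"},
        "n3": {"type": "coil", "operand": "Y0"},
    },
    "edges": [("n1", "n2"), ("n2", "n3")],
}


def perturb(graph, rng):
    """Return a copy of `graph` with one random edit applied.

    Assumes the graph has at least three nodes and one edge.
    The three edit types below are illustrative choices only.
    """
    g = copy.deepcopy(graph)
    op = rng.choice(["substitute_node", "delete_edge", "rewire_edge"])
    if op == "substitute_node":
        # Change the operand of one node so the program is subtly wrong.
        node = g["nodes"][rng.choice(sorted(g["nodes"]))]
        node["operand"] = node["operand"] + "_ALT"
    elif op == "delete_edge":
        g["edges"].pop(rng.randrange(len(g["edges"])))
    else:
        # Point an existing edge at a different target node.
        i = rng.randrange(len(g["edges"]))
        src, dst = g["edges"][i]
        candidates = [n for n in sorted(g["nodes"]) if n not in (src, dst)]
        g["edges"][i] = (src, rng.choice(candidates))
    return g


def make_pair(prompt, graph, rng):
    """Build one preference record: the reference graph is 'chosen',
    its graph-edited variant is 'rejected'."""
    return {
        "prompt": prompt,
        "chosen": json.dumps(graph, sort_keys=True),
        "rejected": json.dumps(perturb(graph, rng), sort_keys=True),
    }


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one pair, given sequence log-probabilities
    under the policy and the frozen reference model."""
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)) written in a numerically simpler form.
    return math.log1p(math.exp(-margin))


pair = make_pair("Turn on Y0 when X0 and X1 are on.", rung, random.Random(0))
```

In this sketch the rejected sample stays structurally close to the reference program, which is the intuition behind using graph edits rather than unrelated programs as negatives. In the described method, retrieved prompt-code pairs would also be prepended to the prompt, which is omitted here.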

Paper Structure

This paper contains 44 sections, 11 equations, 11 figures, 10 tables, 1 algorithm.

Figures (11)

  • Figure 1: The topmost subfigure shows a single rung of LD and its corresponding visualized graph. The bottom displays XML tags exportable from a Ladder Diagram IDE, along with JSON and Metaprogram representations that capture structural relationships in the graph.
  • Figure 2: Performance comparison between SFT and RAG, where RAG uses a larger LLM. $N$ represents the number of retrieved examples in RAG, and SFT’s performance is represented by a red dotted line. We use XML as the text format.
  • Figure 3: An overview of the two-stage training method. (1) RAFT-V: An off-the-shelf retriever is utilized for relevant prompt augmentation, and training is conducted with cross-entropy loss. (2) Preference Optimization: Preference learning leverages graph-edited preference pairs, with retrieved prompt-code pairs as additional input.
  • Figure 4: Program EM score across different complexities. We use the metaprogram format with the Llama3.1-8B-Instruct model.
  • Figure 5: Example illustrating the evaluation metrics. Node and edge attributes have been modified from the original data due to security concerns.
  • ...and 6 more figures