COBOLAssist: Analyzing and Fixing Compilation Errors for LLM-Powered COBOL Code Generation

Anh T. V. Dau; Shin Hwei Tan; Jinqiu Yang; Nghi D. Q. Bui; Anh Tuan Nguyen

COBOLAssist: Analyzing and Fixing Compilation Errors for LLM-Powered COBOL Code Generation

Anh T. V. Dau, Shin Hwei Tan, Jinqiu Yang, Nghi D. Q. Bui, Anh Tuan Nguyen

Abstract

Legacy programming languages such as COBOL (Common Business-Oriented Language) remain critical in business computing. However, maintaining legacy COBOL systems is increasingly challenging due to a declining pool of skilled developers and the persistence of COBOL errors that require deep domain expertise to resolve. This paper investigates the challenges of COBOL compilation errors and introduces a framework leveraging large language models (LLMs) to address these issues. We first categorize the common compilation errors in LLM-generated COBOL code into three groups: incomplete code errors, syntax errors, and type-related errors. We further propose COBOLAssist, a technique to enhance code correctness through iterative repairs guided by compilation feedback. Our evaluation using five LLMs including GPT variants and mAInframer, shows a high prevalence of incorrect program structures and function usage in COBOL programs and demonstrates the effectiveness of COBOLAssist, with the compilation success rates increasing from 29.5\% to 64.38\% for GPT-4o-mini and from 41.8\% to 95.89\% for GPT-4o. It also improves pass@1 significantly, for example from 9.1 to 22.6 for GPT-4. Notably, while mAInframer-34B achieves the highest compilation success rate, its functional correctness remains limited. This research not only highlights the limitations in current LLMs for COBOL but also demonstrates a practical path forward for automated debugging in legacy systems.

COBOLAssist: Analyzing and Fixing Compilation Errors for LLM-Powered COBOL Code Generation

Abstract

COBOLAssist: Analyzing and Fixing Compilation Errors for LLM-Powered COBOL Code Generation

Abstract

Paper Structure

Table of Contents

Figures (7)