ScriptSmith: A Unified LLM Framework for Enhancing IT Operations via Automated Bash Script Generation, Assessment, and Refinement

Oishik Chatterjee; Pooja Aggarwal; Suranjana Samanta; Ting Dai; Prateeti Mohapatra; Debanjana Kar; Ruchi Mahindru; Steve Barbieri; Eugen Postea; Brad Blancett; Arthur De Magalhaes

TL;DR

The paper addresses IT operations automation by presenting ScriptSmith, an execution-free framework that jointly generates, assesses, and refines Bash scripts for incident remediation using large language models. It combines retrieval from a knowledge catalog with a non-execution evaluation pipeline inspired by CodeSift and a refinement loop guided by model feedback and human review. Empirical results on the CodeSift dataset (100 tasks) and InterCode dataset (153 tasks) show a 7–10% overall improvement in script generation, with notable gains from peer-review prompting and refinement (up to 17%). Deployment within IBM Instana’s Intelligent Remediation pipeline and a six-month user study with domain experts validate practical utility, while the authors discuss scalability, evaluation limitations, and future work to extend to other scripting languages such as PowerShell and to automate test-case generation.

Abstract

In the rapidly evolving landscape of site reliability engineering (SRE), the demand for efficient and effective solutions to manage and resolve issues in site and cloud applications is paramount. This paper presents an approach to action automation that uses large language models (LLMs) for script generation, assessment, and refinement. By leveraging the capabilities of LLMs, we aim to significantly reduce the human effort involved in writing and debugging scripts, thereby enhancing the productivity of SRE teams. Our experiments focus on Bash scripts, a commonly used tool in SRE, and use the CodeSift dataset of 100 tasks and the InterCode dataset of 153 tasks. The results show that LLMs can automatically assess and refine scripts efficiently, reducing the need to validate scripts in an execution environment. Overall, the framework improves script generation by 7-10%.
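
The generate, assess, refine loop summarized above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the exact-match catalog lookup, the `max_rounds` limit, and the toy stand-ins for the LLM calls are all assumptions made for the example. The one property it is meant to show is that the candidate script is judged from its text and is never executed.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical interfaces; in a real system both would wrap LLM calls.
Generator = Callable[[str, Optional[str]], str]    # (task, feedback) -> script
Assessor = Callable[[str, str], Tuple[bool, str]]  # (task, script) -> (ok, feedback)


def generate_assess_refine(
    task: str,
    catalog: Dict[str, str],
    generate: Generator,
    assess: Assessor,
    max_rounds: int = 3,
) -> Tuple[str, bool]:
    """Return (script, accepted) without ever running the script."""
    # Assumption: the catalog is a plain dict and retrieval is an exact match.
    if task in catalog:
        return catalog[task], True

    feedback: Optional[str] = None
    script = ""
    for _ in range(max_rounds):
        # Generate a candidate, passing along feedback from the last round.
        script = generate(task, feedback)
        # Assess the script text only; nothing is executed here.
        ok, feedback = assess(task, script)
        if ok:
            return script, True
    # Not accepted within the budget: hand the last draft to a human reviewer.
    return script, False


# Toy stand-ins for the model calls, used only to exercise the loop.
def toy_generate(task: str, feedback: Optional[str]) -> str:
    return "ls /var/log" if feedback is None else "du -sh /var/log"


def toy_assess(task: str, script: str) -> Tuple[bool, str]:
    if script.startswith("du "):
        return True, ""
    return False, "script lists files but does not report disk usage"


if __name__ == "__main__":
    print(generate_assess_refine(
        "report disk usage of /var/log", {}, toy_generate, toy_assess
    ))
```

In this sketch a script that is still rejected after `max_rounds` is returned with `accepted=False`, which is one simple way to route it to human review.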
