Enabling BLV Developers with LLM-driven Code Debugging

Clark Saben, Prashant Chandrasekar

TL;DR

The paper addresses the accessibility gap in debugging for blind and low-vision (BLV) developers by introducing BLVRUN, a CLI tool that captures verbose Python tracebacks and summarizes them with a CodeLlama model fine-tuned on a dataset derived from PyTraceBugs. QLoRA-based fine-tuning and Q2K quantization keep inference fast and local, so the tool runs on standard CPUs without IDE plugins. Evaluated against gold-standard summaries, the fine-tuned model achieves higher cosine similarity and ROUGE-1 scores than the base model, and it delivers concise, action-oriented summaries that fit naturally into existing text-buffer and printf-debugging workflows. The work aims to reduce debugging time and cognitive load for BLV programmers while preserving their preferred command-line workflow.

Abstract

BLVRUN is a command-line shell script designed to offer developers within the BLV community a succinct and insightful overview of traceback errors. Its primary function is to parse errors and use a fine-tuned large language model to generate informative error summaries. Our model's performance rivals that of well-known models like ChatGPT and of AI-chatbot plug-ins tailored for specific Integrated Development Environments (IDEs). Importantly, BLV users can integrate this tool into their existing development workflows without any modifications or adaptations to facilitate debugging tasks.
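
The capture-and-summarize step described in the abstract can be illustrated with a minimal Python sketch. This is not BLVRUN's actual implementation: the helper names, the prompt wording, and the model name `blvrun-codellama` are hypothetical, and the endpoint is assumed to be Ollama's default local address. Saving summaries to a database is omitted.

```python
import json
import subprocess
import sys
import urllib.request

# Assumption: Ollama's default local endpoint. The model name is hypothetical.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_NAME = "blvrun-codellama"  # hypothetical, not taken from the paper


def run_and_capture(script_path):
    """Run a Python script and return (exit_code, stderr_text)."""
    result = subprocess.run(
        [sys.executable, script_path],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr


def build_prompt(traceback_text):
    """Wrap the raw traceback in an instruction (wording is illustrative)."""
    return (
        "Summarize the following Python traceback in one or two sentences, "
        "naming the error type, the file and line, and a likely fix.\n\n"
        + traceback_text.strip()
    )


def summarize(traceback_text):
    """Send the prompt to a local Ollama server and return the model's text."""
    payload = json.dumps(
        {
            "model": MODEL_NAME,
            "prompt": build_prompt(traceback_text),
            "stream": False,  # ask for one JSON object instead of a stream
        }
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Under these assumptions, a wrapper would call `run_and_capture("sample.py")` and, when the exit code is non-zero, print `summarize(stderr_text)` in place of the raw traceback. `summarize` requires a running Ollama server with a matching model already loaded.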

Paper Structure (11 sections, 5 figures)

Figures (5)

  • Figure 1: Architecture and Development Components of BLVRUN. Starting from the left, a BLV programmer working with a CLI and text buffers executes their Python code. When an error occurs, BLVRUN's script captures the verbose, unstructured text and presents the user with only a concise and accurate description of the error. This is possible because BLVRUN's model is fine-tuned on a dataset we created from PyTraceBugs. Finally, BLVRUN is optimized to run on any machine, so BLV programmers need not depend on IDEs or switch contexts to ChatGPT-like solutions.
  • Figure 2: Information Flow within BLVRUN. When blvrun sample.py is executed in the shell, the prompt is sent to our model, which is hosted on an Ollama server. The model produces a traceback summary that is sent back to the terminal and saved in a database. BLV programmers can view previously generated summaries using the blvrun prev -n command.
  • Figure 3: Example of the usefulness of BLVRUN. On the left is the verbose, unstructured output printed to BLV programmers without the assistance of BLVRUN. On the right is the summary produced by BLVRUN. Within the summary, we have highlighted the key takeaway of the error, which BLVRUN presents in a concise and therefore consumable manner.
  • Figure 4: Cosine similarity scores of summaries generated by (1) the base model (with no fine-tuning or lowered precision), (2) the base model (with lowered precision), and (3) BLVRUN's fine-tuned and optimized model, each compared against the "gold standard".
  • Figure 5: ROUGE-1 F-scores of summaries generated by (1) the base model (with no fine-tuning or lowered precision), (2) the base model (with lowered precision), and (3) BLVRUN's fine-tuned and optimized model, each compared against the "gold standard".
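
To make the two metrics in Figures 4 and 5 concrete, here is a minimal sketch of how a generated summary could be scored against a gold-standard summary. It rests on assumptions: the tokenizer is a simple lowercase word split, and cosine similarity is computed over word-count vectors. The paper may well use sentence embeddings and a standard ROUGE package instead, so this illustrates the definitions rather than the authors' evaluation code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (illustrative tokenizer)."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def rouge1_f(candidate, reference):
    """ROUGE-1 F-score: harmonic mean of unigram precision and recall."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def cosine_similarity(candidate, reference):
    """Cosine similarity between word-count vectors (an assumed representation)."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    dot = sum(cand[token] * ref[token] for token in cand)
    norm = math.sqrt(sum(v * v for v in cand.values())) * math.sqrt(
        sum(v * v for v in ref.values())
    )
    return dot / norm if norm else 0.0
```

For example, scoring the candidate "division by zero on line 3" against the reference "division by zero error on line 3 of sample.py" gives a unigram precision of 1.0 and a recall of 0.6, so the ROUGE-1 F-score is roughly 0.75.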