Detecting and Fixing API Misuses of Data Science Libraries Using Large Language Models
Akalanka Galappaththi, Francisco Ribeiro, Sarah Nadi
TL;DR
This work tackles API misuses in data science libraries by introducing DSChecker, an LLM-based approach that uses both API directives and dynamic data information to detect and fix misuses. It shows that structured, directive-aware prompts and data context improve performance across multiple LLMs, with the best-performing model reaching a detection F1-score of 61.18 percent and fixing 51.28 percent of the misuses. An agentic variant, DSChecker_agent, investigates real-world applicability by retrieving information on demand; it remains feasible but trades off some performance, reaching a detection F1-score of 48.65 percent and fixing 39.47 percent of the misuses. The study also extends to other data-centric libraries and compares DSChecker with existing LLM-based misuse detectors, reporting higher detection and fix rates in many settings and outlining practical challenges and future directions for LLM-driven tooling for software libraries.
Abstract
Data science libraries, such as scikit-learn and pandas, specialize in processing and manipulating data. The data-centric nature of these libraries makes detecting API misuses in them more challenging. This paper introduces DSChecker, an LLM-based approach for detecting and fixing API misuses of data science libraries. We identify two key pieces of information, API directives and data information, that may be beneficial for API misuse detection and fixing. Using three LLMs and misuses from five data science libraries, we experiment with various prompts. We find that incorporating API directives and data-specific details enhances DSChecker's ability to detect and fix API misuses, with the best-performing model achieving a detection F1-score of 61.18 percent and fixing 51.28 percent of the misuses. Building on these results, we implement DSChecker_agent, which includes an adaptive function-calling mechanism to access information on demand, simulating a real-world setting where information about the misuse is unknown in advance. We find that DSChecker_agent achieves a detection F1-score of 48.65 percent and fixes 39.47 percent of the misuses, demonstrating the promise of LLM-based API misuse detection and fixing in real-world scenarios.
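
To make the two kinds of context concrete, the following minimal sketch shows how a prompt might combine a code snippet with an API directive and data information. It is an illustration only and not the prompt template used by DSChecker: the function name `build_prompt`, the section layout, the paraphrased directive wording, and the data summary are all assumptions introduced here.

```python
from textwrap import dedent


def build_prompt(code: str, directives: list[str], data_info: str) -> str:
    """Assemble a misuse-detection prompt (hypothetical layout, not the paper's)."""
    # One bullet per API directive, i.e. a usage constraint from the library docs.
    directive_lines = "\n".join(f"- {d}" for d in directives)
    sections = [
        "Task: decide whether the code below misuses a library API. "
        "If it does, propose a fix.",
        "Code:\n" + code.strip(),
        "API directives:\n" + directive_lines,
        "Data information:\n" + data_info.strip(),
    ]
    # Blank lines separate the sections so each stays easy to locate.
    return "\n\n".join(sections)


# Illustrative snippet: the result of fillna is discarded, so df is unchanged.
snippet = dedent("""
    df.fillna(0)
    total = df["age"].sum()
""")

prompt = build_prompt(
    snippet,
    # Paraphrased directive, written for this example.
    ["DataFrame.fillna returns a new object unless inplace=True."],
    # Made-up data summary standing in for dynamic data information.
    "Column 'age' has dtype float64 and contains missing values.",
)
print(prompt)
```

In this sketch, the directive points at the discarded return value, while the data summary indicates that missing values are actually present, which is the kind of combined evidence the approach relies on.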
