Neural Variable Name Repair: Learning to Rename Identifiers for Readability
Muhammad Yousuf, Akshat Bagade, Chhittebbayi Penugonda, Maanas Baraya
TL;DR
The paper tackles variable name repair: predicting descriptive replacements for masked local identifiers in real C++ functions using only function-local context. It combines task-specific LoRA fine-tuning of Llama 3.1-8B, stabilized by warmup and dropout schedules, with a dual-encoder reranker that improves candidate selection. The training data is a large dataset mined from The Stack via Tree-sitter. Empirical results show substantial gains over prompting baselines: the generator alone achieves 43.1% exact match and a partial-credit score of 82.03, and reranking raises these to 46.0% exact match and a partial-credit score of 84.5. The work demonstrates that lightweight, targeted adapters plus a reranker can form a practical identifier-repair tool that improves code readability and supports downstream reasoning by developers and AI systems.
Abstract
Developers routinely work with source files whose variable names are generic or misleading, and because teams move quickly, many functions are left undocumented. This slows comprehension, increases the risk of subtle bugs, and makes it harder for both humans and large language models (LLMs) to reason about code. We study variable name repair: given a real C++ function in which all occurrences of one local variable or parameter name have been replaced by a placeholder (e.g., ID_1), the goal is to generate a natural, descriptive replacement name. We automatically construct this task from the C++ portion of BigCode's The Stack by parsing functions with Tree-sitter, masking a single identifier, and treating the original name as supervision. On top of Llama 3.1-8B, we build a pipeline with (i) warmup and dropout schedules for more stable fine-tuning, (ii) LoRA adapters for efficient specialization on identifier repair, and (iii) a dual-encoder reranker over the top-k generator candidates. We evaluate using exact match, Top-5 Hit, and an embedding-based partial similarity score (0-100) that gives credit for near-synonyms and format variants (e.g., jsonValue vs. json). On a held-out set of 200 C++ functions, a zero-shot Llama 3.1 baseline reaches 6.1 percent exact match. Our best LoRA-tuned model (with warmup and dropout) achieves 43.1 percent exact match, 50.2 percent Top-5 Hit, and a partial-match score of 82.03. A dual-encoder reranker further improves selection quality without modifying the underlying generator, suggesting that task-specific fine-tuning plus reranking is a promising approach for practical identifier repair tools.
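To make the task setup and the two rank-based metrics concrete, the following is a minimal Python sketch, not the authors' pipeline. It swaps Tree-sitter parsing for a simple whole-word regular expression, so unlike a real parser it would also rewrite matches inside comments or string literals. The function names, the placeholder string `ID_1`, and the toy candidate lists are all hypothetical. The embedding-based partial similarity score is omitted because it depends on an embedding model the abstract does not specify.

```python
import re

PLACEHOLDER = "ID_1"  # assumed placeholder string, for illustration only


def mask_identifier(function_source, name, placeholder=PLACEHOLDER):
    """Replace every whole-word occurrence of `name` with `placeholder`.

    Simplification: a regex stands in for Tree-sitter, so this cannot tell
    identifiers apart from the same word in comments or string literals.
    """
    pattern = r"\b" + re.escape(name) + r"\b"
    return re.sub(pattern, placeholder, function_source)


def exact_match(candidates, target):
    """True when the top-ranked candidate equals the original name."""
    return bool(candidates) and candidates[0] == target


def top_k_hit(candidates, target, k=5):
    """True when the original name appears among the first k candidates."""
    return target in candidates[:k]


def score(examples, k=5):
    """Aggregate both metrics, as percentages, over (candidates, target) pairs."""
    n = len(examples)
    em = sum(exact_match(c, t) for c, t in examples)
    hit = sum(top_k_hit(c, t, k) for c, t in examples)
    return {"exact_match": 100.0 * em / n, "top_k_hit": 100.0 * hit / n}


# Toy C++ function; the original name "total" becomes the supervision target.
source = (
    "int sum(const std::vector<int>& values) {\n"
    "  int total = 0;\n"
    "  for (int v : values) total += v;\n"
    "  return total;\n"
    "}"
)
masked = mask_identifier(source, "total")
print(masked)

# Hypothetical ranked candidate lists paired with their ground-truth names.
examples = [
    (["total", "sum", "acc"], "total"),
    (["result", "sum"], "sum"),
    (["x"], "count"),
    (["a", "b", "c", "d", "e", "count"], "count"),
]
print(score(examples))
```

Because both metrics read only the ranked candidate list, a reranker can change them simply by reordering the generator's top-k output, which is consistent with the claim that reranking improves selection without modifying the generator.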
