Bridging the Knowledge Void: Inference-time Acquisition of Unfamiliar Programming Languages for Coding Tasks
Chen Shen, Wei Cheng, Jingyue Yang, Huan Zhang, Yuhan Wu, Wei Hu
TL;DR
The paper tackles the challenge of coding in unfamiliar programming languages by proposing Inference-time Language Acquisition (ILA) and a general ILA-agent framework. The framework combines exploration primitives that query the official documentation with verification primitives that test and validate code in an execution environment. The interaction is modeled as a POMDP with resources $R=\{D,E\}$ (documentation $D$ and execution environment $E$) and state $s_t=(Q,(a_0,o_0),\ldots,(a_{t-1},o_{t-1}))$, i.e. the task query together with the history of action-observation pairs. A language-specific extension, exemplified by TypeLookup, augments the primitives for statically-typed languages. The authors construct Cangjie-bench to evaluate ILA in a low-resource setting and show that ILA-agent substantially outperforms finetuning and RAG baselines across code generation, translation, and program repair, while also characterizing emergent behavioral patterns and remaining performance gaps. Overall, the work offers a pragmatic pathway for deploying LLMs in emerging language ecosystems where large-scale corpora are scarce, with potential impact on automated software development and developer productivity.
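The POMDP formulation can be illustrated with a minimal Python sketch, assuming a simple tool-calling loop. All names here (`Action`, `State`, `make_tools`, `search_docs`, `run_code`, `scripted_policy`) are hypothetical and do not reflect the paper's actual tool interfaces. A scripted policy stands in for the LLM, and toy stubs stand in for the documentation $D$ and the execution environment $E$. The point is only to show how the state $s_t$ grows as a history of $(a_t, o_t)$ pairs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Action:
    tool: str  # name of a (hypothetical) primitive, or "finish"
    arg: str   # search query, candidate code, or final answer


@dataclass(frozen=True)
class State:
    """s_t = (Q, (a_0, o_0), ..., (a_{t-1}, o_{t-1}))."""
    query: str
    history: Tuple[Tuple[Action, str], ...] = ()

    def extend(self, action: Action, obs: str) -> "State":
        # s_{t+1} appends the newest (action, observation) pair to s_t.
        return State(self.query, self.history + ((action, obs),))


Policy = Callable[[State], Action]
Tools = Dict[str, Callable[[str], str]]


def make_tools(docs: Dict[str, str], execute: Callable[[str], str],
               extra: Optional[Tools] = None) -> Tools:
    """Bind hypothetical primitives to the resources R = {D, E}."""
    tools: Tools = {
        # Exploration primitive over the documentation D (toy keyword lookup).
        "search_docs": lambda q: docs.get(q, "no matching documentation entry"),
        # Verification primitive over the execution environment E.
        "run_code": execute,
    }
    # Language-specific primitives (e.g. a TypeLookup-style tool) extend the base set.
    tools.update(extra or {})
    return tools


def run_episode(query: str, policy: Policy, tools: Tools,
                max_steps: int = 8) -> Tuple[Optional[str], State]:
    state = State(query)
    for _ in range(max_steps):
        action = policy(state)
        if action.tool == "finish":
            return action.arg, state
        # The agent only sees the observation o_t, never E's internal state.
        obs = tools[action.tool](action.arg)
        state = state.extend(action, obs)
    return None, state  # step budget exhausted without an answer


def scripted_policy(state: State) -> Action:
    # Stand-in for an LLM: consult the docs, test a draft, then submit it.
    step = len(state.history)
    if step == 0:
        return Action("search_docs", "print")
    if step == 1:
        return Action("run_code", "print(42)")
    return Action("finish", "print(42)")


def toy_execute(code: str) -> str:
    # Toy stand-in for a compiler/runtime: accepts only calls to print(...).
    return "ok" if code.startswith("print(") else "error"


docs = {"print": "print(x): writes x to stdout"}
answer, final_state = run_episode("Print 42", scripted_policy,
                                  make_tools(docs, toy_execute))
```

Here `answer` is `"print(42)"` and `final_state.history` holds two `(action, observation)` pairs: one documentation lookup and one execution check. In a real agent the policy would be an LLM prompted with the serialized history, and the stubs would be replaced by a documentation index and a compiler or test harness for the target language.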
Abstract
The proficiency of Large Language Models (LLMs) in coding tasks largely reflects their extensive pre-training corpora, and this proficiency typically collapses when they are confronted with previously unfamiliar programming languages. Departing from data-intensive finetuning, we investigate the paradigm of Inference-time Language Acquisition (ILA), in which an LLM masters an unfamiliar language through dynamic interaction with limited external resources. In this paper, we propose ILA-agent, a general ILA framework that equips LLMs with a set of behavioral primitives. By modeling essential human-like behaviors as a suite of tools, ILA-agent enables LLMs to incrementally explore, apply, and verify language knowledge through structured interactions with the official documentation and the execution environment. To provide a rigorous evaluation in a low-resource setting, we construct Cangjie-bench, a multi-task benchmark based on the novel statically-typed language Cangjie. We instantiate ILA-agent for Cangjie and evaluate its performance on code generation, translation, and program repair tasks. Results across diverse LLMs show that ILA-agent significantly outperforms retrieval-augmented baselines. Further analysis of agent trajectories characterizes the emergent behavior patterns while highlighting persisting performance gaps.
