Bridging the Knowledge Void: Inference-time Acquisition of Unfamiliar Programming Languages for Coding Tasks

Chen Shen, Wei Cheng, Jingyue Yang, Huan Zhang, Yuhan Wu, Wei Hu

TL;DR

The paper tackles the challenge of coding in unfamiliar programming languages by proposing Inference-time Language Acquisition (ILA) and a general ILA-agent framework. The framework combines exploration primitives that query official documentation with verification primitives that test and validate code in an execution environment. The interaction is modeled as a POMDP with resources $R=\{D,E\}$ and state $s_t=(Q,(a_0,o_0),\ldots,(a_{t-1},o_{t-1}))$. A language-specific extension, exemplified by TypeLookup, augments the primitives for statically-typed languages. The authors construct Cangjie-bench to evaluate ILA in a low-resource setting and report that ILA-agent substantially outperforms finetuning and RAG baselines across code generation, translation, and program repair, while revealing emergent behavioral patterns and areas for improvement. Overall, the work offers a pragmatic pathway for deploying LLMs in emerging language ecosystems where large-scale corpora are scarce, with potential impact on automated software development and developer productivity.

Abstract

The proficiency of Large Language Models (LLMs) in coding tasks often reflects their extensive pre-training corpora and typically collapses when they are confronted with unfamiliar programming languages. Departing from data-intensive finetuning, we investigate the paradigm of Inference-time Language Acquisition (ILA), where an LLM masters an unfamiliar language through dynamic interaction with limited external resources. In this paper, we propose ILA-agent, a general ILA framework that equips LLMs with a set of behavioral primitives. By modeling essential human-like behaviors as a suite of tools, ILA-agent enables LLMs to incrementally explore, apply, and verify language knowledge through structured interactions with the official documentation and execution environment. To provide a rigorous evaluation in a low-resource setting, we construct Cangjie-bench, a multi-task benchmark based on the novel statically-typed language Cangjie. We instantiate ILA-agent for Cangjie and evaluate its performance across code generation, translation, and program repair tasks. Results using diverse LLMs demonstrate that ILA-agent significantly outperforms retrieval-augmented baselines. Further analysis of agent trajectories characterizes the emergent behavior patterns while highlighting persistent performance gaps.
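
To make the interaction loop concrete, the following is a minimal sketch of how the state $s_t=(Q,(a_0,o_0),\ldots,(a_{t-1},o_{t-1}))$ from the TL;DR could grow as an agent alternates between an exploration primitive and a verification primitive. It is an illustration under stated assumptions, not the authors' implementation: the names `State`, `search_docs`, `run_code`, `ila_loop`, `toy_policy`, and `fake_execute` are hypothetical, the documentation is a toy dictionary, the policy is a hard-coded stand-in for the LLM, and reading $D$ as the documentation and $E$ as the execution environment is an interpretation of the summary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Action = Tuple[str, str]  # (primitive name, argument); hypothetical encoding


@dataclass(frozen=True)
class State:
    """s_t = (Q, (a_0, o_0), ..., (a_{t-1}, o_{t-1}))."""
    query: str
    history: Tuple[Tuple[Action, str], ...] = ()

    def extend(self, action: Action, observation: str) -> "State":
        # Append one (action, observation) pair, yielding s_{t+1}.
        return State(self.query, self.history + ((action, observation),))


def search_docs(docs: Dict[str, str], keyword: str) -> str:
    """Exploration primitive (hypothetical): keyword lookup over resource D."""
    key = keyword.lower()
    hits = [text for title, text in docs.items()
            if key in title.lower() or key in text.lower()]
    return "\n".join(hits) if hits else "no match"


def run_code(execute: Callable[[str], Tuple[bool, str]], code: str) -> str:
    """Verification primitive (hypothetical): run code against resource E."""
    ok, message = execute(code)
    return ("PASS: " if ok else "FAIL: ") + message


def ila_loop(query: str,
             policy: Callable[[State], Action],
             docs: Dict[str, str],
             execute: Callable[[str], Tuple[bool, str]],
             max_steps: int = 8) -> Tuple[Optional[str], State]:
    """Alternate actions and observations until the policy submits an answer."""
    state = State(query)
    for _ in range(max_steps):
        kind, arg = policy(state)
        if kind == "submit":
            return arg, state
        if kind == "search":
            obs = search_docs(docs, arg)
        elif kind == "run":
            obs = run_code(execute, arg)
        else:
            obs = f"unknown action: {kind}"
        state = state.extend((kind, arg), obs)
    return None, state  # step budget exhausted without a submission


def toy_policy(state: State) -> Action:
    # Stand-in for the LLM: look something up, test a snippet, then submit.
    snippet = 'println("hi")'  # placeholder string, not validated Cangjie code
    if len(state.history) == 0:
        return ("search", "print")
    if len(state.history) == 1:
        return ("run", snippet)
    return ("submit", snippet)


def fake_execute(code: str) -> Tuple[bool, str]:
    # Stand-in for a compiler or test harness.
    ok = "println" in code
    return ok, ("compiled" if ok else "unknown identifier")
```

Calling `ila_loop("Print hi", toy_policy, {"print": "Use println(x) to write a line."}, fake_execute)` returns the submitted snippet together with a final state whose history holds one search step and one run step. The state is kept immutable so that each $s_t$ along a trajectory can be inspected separately, which suits the kind of trajectory analysis the abstract mentions.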

Paper Structure

This paper contains 37 sections, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Overview of ILA-agent and Cangjie-bench.
  • Figure 2: Usage of behavioral primitives across different stages, using Qwen3-Max as the foundation model.
  • Figure 3: Action transition probabilities across different LLMs, micro-averaged over three coding tasks.
  • Figure 4: Usage of behavioral primitives across different stages, DeepSeek-V3.2 as the foundation model.
  • Figure 5: Usage of behavioral primitives across different stages, Claude-Sonnet-4.5 as the foundation model.
  • ...and 4 more figures