Bridging the Knowledge Void: Inference-time Acquisition of Unfamiliar Programming Languages for Coding Tasks

Chen Shen; Wei Cheng; Jingyue Yang; Huan Zhang; Yuhan Wu; Wei Hu

TL;DR

The paper tackles the challenge of coding in unfamiliar programming languages by proposing Inference-time Language Acquisition (ILA) and a general ILA-agent framework. The framework combines exploration primitives, which query the official documentation, with verification primitives, which test and validate code in an execution environment. The interaction is modeled as a POMDP with resources $R=\{D,E\}$ (the documentation $D$ and the execution environment $E$) and state $s_t=(Q,(a_0,o_0),\ldots,(a_{t-1},o_{t-1}))$, i.e., the task query $Q$ together with the history of action-observation pairs. A language-specific extension, exemplified by TypeLookup, augments these primitives for statically-typed languages. The authors construct Cangjie-bench to evaluate ILA in a low-resource setting and show that ILA-agent substantially outperforms finetuning and RAG baselines across code generation, translation, and program repair, while also revealing emergent behavioral patterns and remaining areas for improvement. Overall, the work establishes a pragmatic pathway for deploying LLMs in emerging language ecosystems where large-scale corpora are scarce, with potential impact on automated software development and developer productivity.
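To make the state definition concrete, the following is a minimal sketch, assuming a simple routing scheme, of how $s_t$ could be represented as the task query plus the growing history of action-observation pairs, with each action sent to either the documentation $D$ or the execution environment $E$. All names here (`AgentState`, `step`, `TOY_DOCS`, `lookup_docs`, `run_code`) are hypothetical illustrations, not the authors' implementation, and `run_code` is a stub rather than a real Cangjie toolchain call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# An action is (resource, payload); resource "D" = documentation, "E" = execution environment.
Action = Tuple[str, str]
Observation = str


@dataclass(frozen=True)
class AgentState:
    """s_t = (Q, (a_0, o_0), ..., (a_{t-1}, o_{t-1})): task query plus full interaction history."""
    query: str
    history: Tuple[Tuple[Action, Observation], ...] = ()

    @property
    def t(self) -> int:
        # Number of completed action-observation steps.
        return len(self.history)


def step(state: AgentState, action: Action,
         resources: Dict[str, Callable[[str], Observation]]) -> AgentState:
    """Send one action to a resource in R = {D, E} and append (a_t, o_t) to the history."""
    resource, payload = action
    if resource not in resources:
        raise ValueError(f"unknown resource: {resource!r}")
    observation = resources[resource](payload)
    # The agent only sees this observation, never the full contents of D or E.
    return AgentState(state.query, state.history + ((action, observation),))


# Toy resources (hypothetical): a keyword-indexed doc store and a stub executor.
TOY_DOCS = {"println": "println: prints its argument followed by a newline (toy entry)"}


def lookup_docs(keyword: str) -> Observation:
    return TOY_DOCS.get(keyword, "no match")


def run_code(source: str) -> Observation:
    # Stand-in for compiling and running Cangjie code; it only checks for a main entry point.
    return "ok" if "main" in source else "error: missing main"


resources = {"D": lookup_docs, "E": run_code}
s0 = AgentState(query="Print hello in Cangjie")
s1 = step(s0, ("D", "println"), resources)
s2 = step(s1, ("E", 'main() { println("hello") }'), resources)
print(s2.t, s2.history[-1][1])  # 2 ok
```

Keeping the state immutable means every intermediate $s_t$ stays available, which is convenient for later trajectory analysis.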

Abstract

The proficiency of Large Language Models (LLMs) in coding tasks largely reflects their extensive pre-training corpora, and it typically collapses when they confront previously unfamiliar programming languages. Departing from data-intensive finetuning, we investigate the paradigm of Inference-time Language Acquisition (ILA), in which an LLM masters an unfamiliar language through dynamic interaction with limited external resources. In this paper, we propose ILA-agent, a general ILA framework that equips LLMs with a set of behavioral primitives. By modeling essential human-like behaviors as a suite of tools, ILA-agent enables LLMs to incrementally explore, apply, and verify language knowledge through structured interactions with the official documentation and the execution environment. To provide a rigorous evaluation in a low-resource setting, we construct Cangjie-bench, a multi-task benchmark based on the novel statically-typed language Cangjie. We instantiate ILA-agent for Cangjie and evaluate its performance on code generation, translation, and program repair tasks. Results across diverse LLMs show that ILA-agent significantly outperforms retrieval-augmented baselines. Further analysis of agent trajectories characterizes emergent behavior patterns and highlights persistent performance gaps.
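As a rough illustration of "behavioral primitives as a suite of tools", the sketch below exposes a documentation-search exploration primitive and a TypeLookup-style helper, in the spirit of the language-specific extension described above, as named tools an LLM could call. The ranking by simple term overlap, the tool names `search_docs` and `type_lookup`, and the contents of `DOCS` and `TYPES` are all assumptions for illustration; they are not taken from the paper and are not checked against the real Cangjie documentation or standard library.

```python
import re
from typing import Dict, List, Set

# Toy documentation corpus: section title -> text (hypothetical, not the real Cangjie docs).
DOCS: Dict[str, str] = {
    "Basic types": "Int64 Float64 Bool String are built-in value types",
    "Arrays": "Array<T> stores elements; use size to get the length",
    "Functions": "declare functions with func name(param: Type): ReturnType",
}

# Toy type table for a TypeLookup-style extension (entries are illustrative only).
TYPES: Dict[str, List[str]] = {
    "Array": ["prop size: Int64", "func isEmpty(): Bool"],
}


def _tokens(text: str) -> Set[str]:
    # Lowercased alphanumeric tokens.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_docs(query: str, k: int = 2) -> List[str]:
    """Exploration primitive: return up to k section titles ranked by term overlap with the query."""
    q = _tokens(query)
    scored = [(len(q & _tokens(f"{title} {body}")), title) for title, body in DOCS.items()]
    hits = sorted((pair for pair in scored if pair[0] > 0), key=lambda p: (-p[0], p[1]))
    return [title for _, title in hits[:k]]


def type_lookup(type_name: str) -> List[str]:
    """Language-specific extension: return the member signatures recorded for a type."""
    return TYPES.get(type_name, [])


# The primitives are exposed to the LLM as named tools.
TOOLS = {"search_docs": search_docs, "type_lookup": type_lookup}

print(TOOLS["search_docs"]("array length size"))  # ['Arrays']
print(TOOLS["type_lookup"]("Array"))
```

A real exploration primitive would more likely use a proper retriever over the official documentation; term overlap is used here only to keep the example self-contained.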

Paper Structure

This paper contains 37 sections, 9 figures, and 4 tables.

Introduction
Problem Formulation
Paradigms and Their Challenges
Inference-time Language Acquisition
Methodology
Exploration Primitives
Verification Primitives
Language-specific Extensions
Benchmark Construction
Code Generation
Code Translation
Program Repair
Experiments and Results
Baselines
Implementation Details
...and 22 more sections

Figures (9)

Figure 1: Overview of ILA-agent and Cangjie-bench.
Figure 2: Usage of behavioral primitives across different stages, using Qwen3-Max as the foundation model.
Figure 3: Action transition probabilities across different LLMs, micro-averaged over three coding tasks.
Figure 4: Usage of behavioral primitives across different stages, using DeepSeek-V3.2 as the foundation model.
Figure 5: Usage of behavioral primitives across different stages, using Claude-Sonnet-4.5 as the foundation model.
...and 4 more figures
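Figure 3 reports action transition probabilities micro-averaged over the three coding tasks. A minimal sketch of one way such a statistic could be computed, assuming that micro-averaging means pooling transition counts from all tasks before normalizing each row, is shown below. The action labels `search`, `run`, and `submit` and the trajectories are hypothetical placeholders, not the paper's primitive names or data.

```python
from collections import Counter, defaultdict
from typing import Dict, List

Trajectory = List[str]  # primitive names in the order the agent invoked them


def transition_probs(trajectories_by_task: Dict[str, List[Trajectory]]) -> Dict[str, Dict[str, float]]:
    """Micro-averaged P(next | prev): pool transition counts over all tasks, then normalize each row."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for trajectories in trajectories_by_task.values():
        for traj in trajectories:
            for prev, nxt in zip(traj, traj[1:]):
                counts[prev][nxt] += 1
    return {
        prev: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for prev, row in counts.items()
    }


# Hypothetical trajectories for the three coding tasks.
data = {
    "generation": [["search", "run", "run", "submit"]],
    "translation": [["search", "search", "run", "submit"]],
    "repair": [["run", "search", "run", "submit"]],
}
probs = transition_probs(data)
print(probs["search"])  # {'run': 0.75, 'search': 0.25}
```

Because counts are pooled before normalizing, tasks that produce longer trajectories contribute more transitions; a macro-average would instead normalize within each task and then average the per-task rows.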
