Towards an Understanding of Context Utilization in Code Intelligence

Yanlin Wang, Kefeng Duan, Dewu Zheng, Ensheng Shi, Fengji Zhang, Yanli Wang, Jiachi Chen, Xilin Liu, Yuchi Ma, Hongyu Zhang, Qianxiang Wang, Zibin Zheng

TL;DR

This paper presents the first systematic review of context utilization in code intelligence (CI), analyzing 146 studies from 2007 to 2024 to characterize how context is defined, categorized, and applied across seven CI tasks. It introduces a taxonomy separating direct and indirect context, surveys preprocessing and modeling approaches (rule-based, feature-based, DL-based, and LLM-based), and assesses current evaluation practices, datasets, and open-source resource sharing. The findings reveal a predominance of direct context and rising adoption of LLMs, while highlighting gaps in context-specific benchmarks and reproducibility. The paper also proposes a roadmap for integrating multiple contexts, developing robust context-utilization mechanisms, and improving evaluation frameworks. Together, these insights guide the development of more robust, generalizable, and context-aware CI systems capable of leveraging diverse signals from code repositories and their surrounding artifacts.

Abstract

Code intelligence is an emerging domain in software engineering, aiming to improve the effectiveness and efficiency of various code-related tasks. Recent research suggests that incorporating contextual information beyond the basic original task inputs (i.e., source code) can substantially enhance model performance. Such contextual signals, obtained directly or indirectly from sources such as API documentation or intermediate representations like abstract syntax trees, can significantly improve the effectiveness of code intelligence. Despite growing academic interest, there is a lack of systematic analysis of context in code intelligence. To address this gap, we conduct an extensive literature review of 146 relevant studies published between September 2007 and August 2024. Our investigation yields four main contributions: (1) a quantitative analysis of the research landscape, including publication trends, venues, and the explored domains; (2) a novel taxonomy of context types used in code intelligence; (3) a task-oriented analysis investigating context integration strategies across diverse code intelligence tasks; and (4) a critical assessment of evaluation methodologies for context-aware methods. Based on these findings, we identify fundamental challenges in context utilization in current code intelligence systems and propose a research roadmap that outlines key opportunities for future research.
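
As a rough illustration of what "contextual information beyond the source code" can look like in practice, the sketch below gathers two extra signals for a small Python function: its attached documentation and a summary of its abstract syntax tree. This is only a minimal sketch built on Python's standard `ast` module. The function name `collect_context`, the dictionary keys, and the choice of a docstring as a stand-in for API documentation are assumptions made for this example. They are not the paper's definitions or any surveyed method.

```python
import ast

# Hypothetical task input: the raw source code of one function.
SOURCE = '''
def area(radius):
    """Return the area of a circle."""
    return 3.14159 * radius ** 2
'''


def collect_context(source: str) -> dict:
    """Bundle the source with two illustrative context signals."""
    tree = ast.parse(source)
    # Take the first function definition found in the parsed module.
    func = next(n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef))
    return {
        # The basic task input.
        "source": source.strip(),
        # Documentation attached to the code (a docstring stands in for
        # API documentation here; this is an assumption of the sketch).
        "documentation": ast.get_docstring(func),
        # An intermediate representation derived from the code: the set
        # of AST node types that occur in the snippet.
        "ast_node_types": sorted({type(n).__name__ for n in ast.walk(tree)}),
    }


context = collect_context(SOURCE)
print(context["documentation"])
print(context["ast_node_types"])
```

A context-aware model would receive some encoding of all three fields rather than the source alone. How such signals are selected, preprocessed, and fused is the subject of the survey itself.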

Paper Structure

This paper contains 34 sections, 7 figures, and 5 tables.

Figures (7)

  • Figure 1: The Pipeline in Collecting Relevant Papers
  • Figure 2: Number of Papers Published Per Year
  • Figure 3: (Left) Number of Publications Per Type. (Right) Number of Publications Per Venue.
  • Figure 4: (Left) Number of Papers Per Type of Main Contribution. (Right) Number of Papers Per Tasks.
  • Figure 5: Types of Context Used in the Seven Studied Code Intelligence Tasks
  • ...and 2 more figures