CodeFlow: Program Behavior Prediction with Dynamic Dependencies Learning

Cuong Chi Le; Hoang Nhat Phan; Huy Nhat Phan; Tien N. Nguyen; Nghi D. Q. Bui

TL;DR

CodeFlow predicts program behavior without execution by learning both static control-flow dependencies and dynamic dependencies along execution paths on a control-flow graph (CFG). It builds a CFG from the code, converts loops to a uniform two-node structure, learns per-node embeddings with a GRU-based encoder, and applies a specialized dynamic-dependency message-passing step to capture execution-specific relations. Coverage is predicted by a linear head on the node states, while runtime error detection and localization rely on path continuity and the furthest node reached. Empirically, CodeFlow outperforms baselines on exact and branch-coverage metrics, achieves high runtime-error detection accuracy, and performs well on error localization and fuzz testing, including on incomplete code. This CFG-centric approach offers an efficient, scalable alternative to LLMs for dynamic program behavior prediction, with practical implications for static analysis, fuzzing, and safety checks on code snippets from online sources.
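The localization idea mentioned above (path continuity plus the furthest node reached) can be illustrated with a small sketch. This is not the authors' implementation: the function name `localize_runtime_error`, the adjacency-dict CFG encoding, and the use of breadth-first distance as the notion of "furthest" are all assumptions made for illustration. The sketch takes a set of nodes predicted as covered, checks whether they connect the entry to the exit, and otherwise reports the deepest covered node reachable from the entry.

```python
from collections import deque

def localize_runtime_error(cfg, entry, exit_node, covered):
    """Return None if the covered nodes form a path from entry to exit,
    otherwise the furthest covered node reachable from the entry.

    cfg maps each node id to a list of successor node ids (hypothetical encoding).
    """
    # Breadth-first search that only walks through nodes predicted as covered.
    dist = {entry: 0}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        for succ in cfg.get(node, []):
            if succ in covered and succ not in dist:
                dist[succ] = dist[node] + 1
                queue.append(succ)
    if exit_node in dist:
        return None  # the covered path is continuous up to the exit: no crash assumed
    # Execution stopped early: report the deepest node reached (ties broken by id).
    return max(dist, key=lambda n: (dist[n], n))

# Toy CFG: node 2 is a branch, nodes 3 and 4 are its two arms, node 6 is the exit.
cfg = {1: [2], 2: [3, 4], 3: [5], 4: [5], 5: [6], 6: []}

normal_run = localize_runtime_error(cfg, 1, 6, {1, 2, 3, 5, 6})
crashing_run = localize_runtime_error(cfg, 1, 6, {1, 2, 4})
print(normal_run, crashing_run)
```

In the toy CFG, the second call models a run whose predicted coverage stops inside one branch arm, so that arm's node is returned as the suspected error location.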

Abstract

Predicting program behavior without execution is a critical task in software engineering. Existing models often fall short in capturing the dynamic dependencies among program elements. To address this, we present CodeFlow, a novel machine-learning-based approach that predicts code coverage and detects runtime errors by learning both static and dynamic dependencies within the code. By using control flow graphs (CFGs), CodeFlow represents all possible execution paths and the static relations between different statements, providing a more comprehensive understanding of program behaviors. CodeFlow constructs CFGs to represent possible execution paths and learns vector representations (embeddings) for CFG nodes, capturing static control-flow dependencies. Additionally, it learns dynamic dependencies by leveraging execution traces, which reflect the impacts among statements during execution. This combination enables CodeFlow to accurately predict code coverage and identify runtime errors. Our empirical evaluation demonstrates that CodeFlow significantly improves code coverage prediction accuracy and effectively localizes runtime errors, outperforming state-of-the-art models.
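To make the distinction between static and dynamic dependencies concrete, the following sketch derives per-node coverage labels and traversed-edge counts from an execution trace over a toy CFG. It is only an illustration under stated assumptions: the helper names `coverage_labels` and `dynamic_edges`, the node numbering, and the edge-set representation are hypothetical and are not taken from the paper.

```python
def coverage_labels(nodes, trace):
    """Map each CFG node to 1 if it appears in the execution trace, else 0."""
    executed = set(trace)
    return {n: int(n in executed) for n in nodes}

def dynamic_edges(trace):
    """Count how often each (source, target) transition occurs in the trace."""
    counts = {}
    for src, dst in zip(trace, trace[1:]):
        counts[(src, dst)] = counts.get((src, dst), 0) + 1
    return counts

# Static CFG edges for a toy program (hypothetical numbering):
# node 1 is a branch, node 2 a loop header, node 3 the loop body,
# node 4 the code after the loop, node 5 the else arm.
nodes = [1, 2, 3, 4, 5]
static_edges = {(1, 2), (1, 5), (2, 3), (3, 2), (2, 4), (5, 4)}

# One execution: the branch is taken and the loop body runs twice.
trace = [1, 2, 3, 2, 3, 2, 4]

labels = coverage_labels(nodes, trace)
edges = dynamic_edges(trace)
print(labels)
print(edges)
```

The static edge set lists every transition the program could take, while the trace-derived counts show which transitions one concrete run actually took and how often; every dynamic edge is also a static edge, but not the other way round.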

Paper Structure (41 sections, 9 equations, 7 figures, 6 tables)

Introduction
Motivation
Example and Observations
Observation 1. Conditional Statements
Observation 2. Complex Loop Branching
Observation 3. Information Loss in Repeated Loops
Observation 4. Runtime Error Detection
Key Ideas
Key Idea 1. [Learning Code Execution on Control Flow Graph]
Key Idea 2. [Dynamic Dependencies Learning via Execution Paths on CFG]
Key Idea 3. [Detecting Runtime Error via CFG]
Approach Overview
Control Flow Graph Building
Source Code Representation Learning
Dynamic Dependencies Learning
...and 26 more sections

Figures (7)

Figure 1: Code Coverage Prediction Comparison
Figure 2: Control Flow Graph for the example code snippet
Figure 3: CodeFlow: Predictive Code Coverage and Runtime Error Detection with Dynamic Dependencies Learning on CFG
Figure 4: Source Code Representation Learning
Figure 5: Dynamic Dependencies Learning
...and 2 more figures

Theorems & Definitions (1)

Definition 1: Control Flow Graph - CFG
