CodeFlow: Program Behavior Prediction with Dynamic Dependencies Learning

Cuong Chi Le, Hoang Nhat Phan, Huy Nhat Phan, Tien N. Nguyen, Nghi D. Q. Bui

TL;DR

CodeFlow tackles the problem of predicting program behavior without execution by learning both static control-flow dependencies and dynamic dependencies along execution paths on a control-flow graph (CFG). It builds a CFG from code, converts loops to a uniform two-node structure, learns per-node embeddings with a GRU-based encoder, and applies a specialized dynamic-dependency message-passing step to capture execution-specific relations. Coverage is predicted by a linear head on the node states, while runtime error detection and localization leverage path continuity and the furthest node reached. Empirically, CodeFlow outperforms baselines on exact and branch-coverage metrics, achieves high runtime-error detection accuracy and precise localization, proves useful for fuzz testing, and remains robust on incomplete code. This CFG-centric approach offers an efficient, scalable alternative to large language models (LLMs) for predicting dynamic program behavior, with practical implications for static analysis, fuzzing, and safety checks of code snippets from online sources.
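Error localization is said to rely on path continuity and the furthest node reached. A minimal sketch of that idea, under our own assumptions (not CodeFlow's actual procedure): predicted coverage is a set of CFG node ids, the entry node always executes, a prediction that never reaches an exit node is read as a runtime error, and "furthest" means the largest shortest-path distance from the entry through covered nodes. `localize_runtime_error` is a hypothetical helper.

```python
from collections import deque


def localize_runtime_error(succ, covered, entry, exits):
    """Hypothetical sketch: infer a runtime error from predicted coverage.

    succ: CFG successor lists (node id -> list of node ids)
    covered: set of node ids predicted to execute
    entry: entry node id (assumed always executed)
    exits: set of node ids where normal execution can end
    Returns (error_predicted, furthest_node, continuous).
    """
    # Walk the CFG from the entry, moving only through covered nodes and
    # recording each node's shortest distance from the entry.
    dist = {entry: 0}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        for nxt in succ.get(node, []):
            if nxt in covered and nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)

    # Path continuity: every covered node should be reachable via covered nodes.
    continuous = covered <= set(dist)
    # No exit reached means execution is predicted to stop early (runtime error);
    # the furthest reached node (ties broken by id) is reported as its location.
    reached_exit = any(e in dist for e in exits)
    furthest = max(dist, key=lambda n: (dist[n], n))
    return (not reached_exit, furthest, continuous)


# 0: x = int(input())   1: if x > 0   2: y = 10 / (x - 1)   3: y = 0   4: print(y)
succ = {0: [1], 1: [2, 3], 2: [4], 3: [4], 4: []}
print(localize_runtime_error(succ, {0, 1, 2}, 0, {4}))     # (True, 2, True)
print(localize_runtime_error(succ, {0, 1, 3, 4}, 0, {4}))  # (False, 4, True)
```

With input `x = 1`, node 2 raises a division by zero, so a coverage prediction of {0, 1, 2} stops there and the sketch reports node 2 as the error location.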

Abstract

Predicting program behavior without execution is a critical task in software engineering. Existing models often fall short in capturing the dynamic dependencies among program elements. To address this, we present CodeFlow, a novel machine learning-based approach that predicts code coverage and detects runtime errors by learning both static and dynamic dependencies within the code. By using control flow graphs (CFGs), CodeFlow effectively represents all possible execution paths and the static relations between different statements, providing a more comprehensive understanding of program behaviors. CodeFlow constructs CFGs to represent possible execution paths and learns vector representations (embeddings) for CFG nodes, capturing static control-flow dependencies. Additionally, it learns dynamic dependencies by leveraging execution traces, which reflect the impacts among statements during execution. This combination enables CodeFlow to accurately predict code coverage and identify runtime errors. Our empirical evaluation demonstrates that CodeFlow significantly improves code coverage prediction accuracy and effectively localizes runtime errors, outperforming state-of-the-art models.
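To make "learning embeddings for CFG nodes and predicting coverage from them" concrete, here is a rough illustration using a generic, swapped-in technique: a gated graph neural network (GGNN)-style update in which each node sums its CFG predecessors' states and a GRU cell updates its own state, followed by a linear + sigmoid coverage head per node. This is not CodeFlow's specialized dynamic-dependency message passing. `GatedMessagePassing`, `coverage_head`, the state size, and the random untrained weights are all hypothetical.

```python
import math
import random


def matvec(m, v):
    # Dense matrix-vector product on nested lists
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GatedMessagePassing:
    """Hypothetical GGNN-style layer with untrained random weights."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # W[g] acts on the incoming message, U[g] on the node's current state
        self.W = {g: rand_matrix(dim, dim, rng) for g in ("z", "r", "h")}
        self.U = {g: rand_matrix(dim, dim, rng) for g in ("z", "r", "h")}

    def gru(self, m, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], m), matvec(self.U["z"], h))]  # update gate
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], m), matvec(self.U["r"], h))]  # reset gate
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.W["h"], m), matvec(self.U["h"], rh))]
        # Interpolate between the old state and the candidate state
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def step(self, states, succ):
        # Message to each node = sum of its CFG predecessors' states (zeros if none)
        msgs = {n: [0.0] * self.dim for n in states}
        for src, dsts in succ.items():
            for dst in dsts:
                msgs[dst] = [a + b for a, b in zip(msgs[dst], states[src])]
        return {n: self.gru(msgs[n], states[n]) for n in states}


def coverage_head(states, w, b, threshold=0.5):
    # Linear layer + sigmoid per node; True means "predicted to execute"
    return {n: sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b) >= threshold
            for n, h in states.items()}


# If/else diamond CFG; random initial states stand in for statement embeddings
succ = {0: [1], 1: [2, 3], 2: [4], 3: [4], 4: []}
rng = random.Random(1)
states = {n: [rng.uniform(-1, 1) for _ in range(4)] for n in succ}
layer = GatedMessagePassing(dim=4)
for _ in range(3):  # three rounds of propagation along control-flow edges
    states = layer.step(states, succ)
covered = coverage_head(states, w=[0.5] * 4, b=0.0)
print(covered)
```

Because the weights are untrained, the printed coverage is arbitrary; in a trained model these weights would be fit to coverage labels derived from execution traces.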

Paper Structure (41 sections, 9 equations, 7 figures, 6 tables)

Figures (7)

  • Figure 1: Code Coverage Prediction Comparison
  • Figure 2: Control Flow Graph for the example code snippet
  • Figure 3: CodeFlow: Predictive Code Coverage and Runtime Error Detection with Dynamic Dependencies Learning on CFG
  • Figure 4: Source Code Representation Learning
  • Figure 5: Dynamic Dependencies Learning
  • ...and 2 more figures

Theorems & Definitions (1)

  • Definition 1: Control Flow Graph - CFG
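As a hedged illustration of the kind of CFG that Definition 1 concerns, the following is a minimal builder for a small Python subset, chosen here only for illustration and using the standard `ast` module (`ast.unparse` needs Python 3.9+). It creates one node per simple statement, a condition node for `if`, and a header node with a back edge for `while`/`for`. It does not handle `break`, `continue`, `return`, `try`, or loop `else` clauses, and it does not reproduce CodeFlow's two-node loop normalization. `CFG`, `build_block`, and `build_cfg` are hypothetical names.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class CFG:
    labels: list[str] = field(default_factory=list)           # node id -> statement text
    succ: dict[int, list[int]] = field(default_factory=dict)  # node id -> successor ids

    def add_node(self, label: str) -> int:
        self.labels.append(label)
        node = len(self.labels) - 1
        self.succ[node] = []
        return node

    def add_edge(self, src: int, dst: int) -> None:
        if dst not in self.succ[src]:
            self.succ[src].append(dst)


def build_block(cfg: CFG, stmts: list[ast.stmt], preds: list[int]) -> list[int]:
    """Append `stmts` to `cfg`; `preds` flow into the first statement.
    Returns the nodes from which control leaves the block."""
    for stmt in stmts:
        if isinstance(stmt, ast.If):
            cond = cfg.add_node("if " + ast.unparse(stmt.test))
            for p in preds:
                cfg.add_edge(p, cond)
            then_out = build_block(cfg, stmt.body, [cond])
            # Without an else branch, the false edge leaves from the condition itself
            else_out = build_block(cfg, stmt.orelse, [cond]) if stmt.orelse else [cond]
            preds = then_out + else_out
        elif isinstance(stmt, (ast.While, ast.For)):
            if isinstance(stmt, ast.While):
                label = "while " + ast.unparse(stmt.test)
            else:
                label = f"for {ast.unparse(stmt.target)} in {ast.unparse(stmt.iter)}"
            head = cfg.add_node(label)
            for p in preds:
                cfg.add_edge(p, head)
            for b in build_block(cfg, stmt.body, [head]):
                cfg.add_edge(b, head)  # back edge: the body returns to the loop header
            preds = [head]  # the loop exits from its header
        else:
            node = cfg.add_node(ast.unparse(stmt))
            for p in preds:
                cfg.add_edge(p, node)
            preds = [node]
    return preds


def build_cfg(source: str) -> CFG:
    cfg = CFG()
    build_block(cfg, ast.parse(source).body, [])
    return cfg


src = """\
x = int(input())
if x > 0:
    y = 10 / x
else:
    y = 0
while y > 1:
    y = y - 1
print(y)
"""
cfg = build_cfg(src)
for node, label in enumerate(cfg.labels):
    print(node, label, "->", cfg.succ[node])
```

Each condition or loop-header node ends up with two successors (taken and not taken), which is the branching structure that coverage prediction and the furthest-node reasoning operate over.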