VisualCoder: Guiding Large Language Models in Code Execution with Fine-grained Multimodal Chain-of-Thought Reasoning

Cuong Chi Le, Hoang-Chau Truong-Vinh, Huy Nhat Phan, Dung Duy Le, Tien N. Nguyen, Nghi D. Q. Bui

TL;DR

VisualCoder addresses the gap in code execution reasoning by fusing code with visual Control Flow Graphs (CFGs) through a Reference CoT mechanism that explicitly links each line of code to its CFG node. This grounding mitigates cascading errors seen in naive multimodal CoT prompts and improves dynamic program understanding, error detection, and fault localization across multiple tasks and models. Empirical results show that CFG images outperform text-based CFGs, and the combination of CFGs with Reference CoT yields robust gains, particularly when paired with Multimodal-CoT. The work demonstrates the practical potential of multimodal reasoning for software debugging and analysis, highlighting improved alignment between textual and visual execution cues.

Abstract

Predicting program behavior and reasoning about code execution remain significant challenges in software engineering, particularly for large language models (LLMs) designed for code analysis. While these models excel at understanding static syntax, they often struggle with dynamic reasoning tasks. We introduce VisualCoder, a simple yet effective approach that enhances code reasoning by integrating multimodal Chain-of-Thought (CoT) reasoning with a visual Control Flow Graph (CFG). By aligning code snippets with their corresponding CFGs, VisualCoder provides deeper insights into execution flows. We address challenges in multimodal CoT integration through a reference mechanism, ensuring consistency between code and its execution path, thereby improving performance in program behavior prediction, error detection, and output generation.
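The reference mechanism described above can be illustrated with a minimal sketch. It assumes a hand-built CFG and a hypothetical prompt wording; the names `CFGNode`, `line_to_node`, and `build_reference_prompt` are illustrative and do not come from the paper, and the actual VisualCoder prompt and CFG rendering may differ. The idea shown is only the linking step: each source line is tagged with the CFG node that contains it, so the model is asked to consult that node before reasoning about the line.

```python
from dataclasses import dataclass, field


@dataclass
class CFGNode:
    """One basic block of a control flow graph (hypothetical structure)."""
    node_id: int
    lines: list[int]  # 1-based source line numbers covered by this block
    successors: list[int] = field(default_factory=list)


def line_to_node(nodes: list[CFGNode]) -> dict[int, int]:
    """Map every covered source line to the id of its CFG node."""
    mapping: dict[int, int] = {}
    for node in nodes:
        for line in node.lines:
            mapping[line] = node.node_id
    return mapping


def build_reference_prompt(source: str, nodes: list[CFGNode]) -> str:
    """Tag each code line with its CFG node and ask for line-by-line tracing.

    The instruction text is an assumed wording, not the authors' prompt.
    """
    mapping = line_to_node(nodes)
    annotated = []
    for lineno, text in enumerate(source.splitlines(), start=1):
        node = mapping.get(lineno)
        tag = f"[node {node}]" if node is not None else "[no node]"
        annotated.append(f"{lineno:>2} {tag} {text}")
    instructions = (
        "Trace the program line by line. For each line, first name the CFG "
        "node it belongs to in the attached image, then state the effect of "
        "executing it. Stop at the first line that raises an error."
    )
    return instructions + "\n\n" + "\n".join(annotated)


# Toy program: the branch on line 3 decides which print statement runs.
source = """x = [1, 2, 3]
i = 5
if i < len(x):
    print(x[i])
else:
    print(x[0])
"""

# Hand-written CFG for the toy program; line 5 (`else:`) has no node of its own.
nodes = [
    CFGNode(node_id=1, lines=[1, 2, 3], successors=[2, 3]),
    CFGNode(node_id=2, lines=[4]),
    CFGNode(node_id=3, lines=[6]),
]

prompt = build_reference_prompt(source, nodes)
print(prompt)
```

In a multimodal setting, the resulting text would be sent together with the rendered CFG image, so the node tags give the model an explicit anchor in the image for every reasoning step.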

Paper Structure

This paper contains 25 sections, 5 equations, 12 figures, and 4 tables.

Figures (12)

  • Figure 1: Comparison of Program Execution Reasoning: CFG + CoT w/o Reference vs. CFG + CoT with Reference. With reference, LLM correctly identifies the unreachable node and critical termination point (highlighted in orange).
  • Figure 2: Qualitative comparison of reasoning outputs for buggy code using different prompt settings in Claude Sonnet 3.5. Red text indicates where the reasoning fails, green text highlights correctly identified critical points, and blue text in VisualCoder shows the referencing from the plain code to the corresponding nodes in the CFG.
  • Figure 3: Attention Heat Map in CFG Image for each CoT reasoning step.
  • Figure 4: Average Attention Score over Vision Token in CFG Image for each CoT reasoning step.
  • Figure 5: Plain code w/o CoT prompt
  • ...and 7 more figures

Theorems & Definitions (1)

  • Definition 1: Control Flow Graph (CFG)