If LLMs Would Just Look: Simple Line-by-line Checking Improves Vulnerability Localization

Yue Li, Xiao Li, Hao Wu, Yue Zhang, Xiuzhen Cheng, Yating Liu, Fengyuan Xu, Sheng Zhong

TL;DR

The paper tackles the challenge of locating vulnerabilities within large codebases where traditional methods are slow or language-specific. It proposes LOVA, a three-stage framework that leverages self-attention in decoder-only LLMs to identify vulnerable lines by analyzing differences in attention when lines are highlighted. Through extensive evaluation on multi-language benchmarks and smart contracts, LOVA delivers substantial gains in F1-score and Top-N accuracy, demonstrating robustness across LLM architectures and languages. The approach offers scalable, cross-language vulnerability localization without costly fine-tuning, potentially accelerating secure software remediation. The work also introduces a novel attention-based representation (VulnAttnMat) and a language-aware classifier for probabilistic line-level localization.

Abstract

The rapid expansion of software systems and the growing number of reported vulnerabilities have emphasized the importance of accurately identifying vulnerable code segments. Traditional methods for vulnerability localization, such as manual code audits or rule-based tools, are often time-consuming and limited in scope, typically focusing on specific programming languages or types of vulnerabilities. In recent years, the introduction of large language models (LLMs) such as GPT and LLaMA has opened new possibilities for automating vulnerability detection. However, while LLMs show promise in this area, they face challenges, particularly in maintaining accuracy over longer code contexts. This paper introduces LOVA, a novel framework leveraging the self-attention mechanisms inherent in LLMs to enhance vulnerability localization. Our key insight is that self-attention mechanisms assign varying importance to different parts of the input, making it possible to track how much attention the model focuses on specific lines of code. In the context of vulnerability localization, the hypothesis is that vulnerable lines of code will naturally attract higher attention weights because they have a greater influence on the model's output. By systematically tracking changes in attention weights and focusing on specific lines of code, LOVA improves the precision of identifying vulnerable lines across various programming languages. Through rigorous experimentation and evaluation, we demonstrate that LOVA significantly outperforms existing LLM-based approaches, achieving up to a 5.3x improvement in F1-scores. LOVA also demonstrated strong scalability, with up to a 14.6x improvement in smart contract vulnerability localization across languages like C, Python, Java, and Solidity. Its robustness was proven through consistent performance across different LLM architectures.
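The idea of tracking how much attention falls on each line of code can be illustrated with a small sketch. It assumes attention weights are already available as a head-averaged array of shape (layers, query tokens, key tokens) and that each token is mapped to a line index. The function name `line_attention`, its parameters, and the aggregation rule (mean over queries, sum over a line's tokens) are hypothetical choices for illustration, not necessarily the aggregation LOVA uses.

```python
import numpy as np

def line_attention(attn, token_lines, num_lines):
    """Aggregate token-level attention into a (num_lines, num_layers) matrix.

    attn        : array (num_layers, seq_len, seq_len); attn[l, q, k] is the
                  weight query token q puts on key token k in layer l
                  (assumed already averaged over heads).
    token_lines : length-seq_len sequence mapping each token to a line index.
    num_lines   : number of code lines in the prompt.
    """
    attn = np.asarray(attn, dtype=float)
    token_lines = np.asarray(token_lines)
    # Attention received by each key token, averaged over all query positions.
    received = attn.mean(axis=1)  # shape: (num_layers, seq_len)
    out = np.zeros((num_lines, attn.shape[0]))
    for line in range(num_lines):
        mask = token_lines == line
        # Sum the attention received by every token belonging to this line.
        out[line] = received[:, mask].sum(axis=1)
    return out

# Toy usage: 2 layers, 4 tokens, uniform attention; line 0 holds two tokens.
toy_attn = np.full((2, 4, 4), 0.25)
per_line = line_attention(toy_attn, [0, 0, 1, 2], num_lines=3)
```

The rows of the result index code lines and the columns index layers, which matches the layout of the attention maps described for Figure 3.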

Paper Structure

This paper contains 18 sections, 3 equations, 7 figures, 5 tables, 1 algorithm.

Figures (7)

  • Figure 1: Vulnerable code example
  • Figure 2: The relationship between Lines of Code (LoC) and Average Top-1 Accuracy, generated by Llama-3.1-8B-Instruct, on all datasets in the experiment setup subsection
  • Figure 3: Attention maps for four programs, where (a) represents a vulnerability-free program, (c) shows the program after injecting a vulnerability at line 70 of (a). (b) and (d) represent the programs after highlighting line 70 for (a) and (c), respectively. In each figure, each row corresponds to a line number in the code, while the columns indicate the attention from different decoder layers. Darker colors indicate stronger attention, and the black boxes on the line numbers mark the vulnerable lines identified by the LLM.
  • Figure 4: Overview of LOVA. For vulnerable code, the process begins by highlighting each line individually to generate a base prompt and a set of highlighted prompts. These prompts are then processed through the LLM's prefill phase to obtain their respective attention maps. For each highlighted attention map, the difference with the base attention map is computed, and the result is flattened to form an attention vector corresponding to each highlighted line. These attention vectors are then classified by a Bi-LSTM classifier to determine whether each line contains a vulnerability, ultimately yielding a suspicious score for each line.
  • Figure 5: One of the 10 highlighted prompts generated for a single code sample (here, highlighting line 8)
  • ...and 2 more figures
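
The Figure 4 caption describes subtracting the base attention map from each highlighted attention map and flattening the result into one attention vector per highlighted line. A minimal sketch of that step follows, assuming each attention map is already a (lines, layers) array. The function name `attention_difference_vectors` is hypothetical, and the Bi-LSTM classifier that consumes these vectors is deliberately left out.

```python
import numpy as np

def attention_difference_vectors(base_map, highlighted_maps):
    """Return one flattened difference vector per highlighted prompt.

    base_map         : array (num_lines, num_layers) from the base prompt.
    highlighted_maps : iterable of arrays with the same shape, one per
                       highlighted line.
    """
    base = np.asarray(base_map, dtype=float)
    vectors = []
    for hmap in highlighted_maps:
        # How highlighting this line shifted attention relative to the base.
        diff = np.asarray(hmap, dtype=float) - base
        # Flatten row by row so every line yields a fixed-length vector.
        vectors.append(diff.ravel())
    return np.stack(vectors)

# Toy usage: 3 code lines, 2 layers, two highlighted prompts.
base = np.zeros((3, 2))
highlighted = [base + 1.0, base.copy()]
vectors = attention_difference_vectors(base, highlighted)
```

Each row of `vectors` would then be scored by a downstream classifier to give a suspiciousness score per line.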