Enhanced LLM-Based Framework for Predicting Null Pointer Dereference in Source Code

Md. Fahim Sultan, Tasmin Karim, Md. Shazzad Hossain Shaon, Mohammad Wardat, Mst Shapna Akter

TL;DR

The paper tackles the detection of CWE-476 NULL pointer dereferences in source code by proposing DeLLNeuN, a fine-tuned LLM-based framework that leverages multi-layer CodeBERT representations to improve vulnerability prediction. By combining CodeBERT-derived features with a dropout, dense, and sigmoid classifier, DeLLNeuN outperforms several baselines (CodeBERT, GraphCodeBERT, RoBERTa, LSTM, GPT-2) on the Draper VDISC dataset, achieving around 0.87 accuracy and 0.88 precision. The study demonstrates the value of aggregating layer-wise CodeBERT information to enhance vulnerability detection in code, suggesting practical potential as an early vulnerability checker in software development. The work highlights both the benefits and limitations of current LLM-based code analysis approaches and emphasizes the need for scalable, resource-aware strategies to handle real-world, large-scale codebases for proactive cybersecurity.
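The classifier head described above (layer-wise features, then dropout, a dense layer, and a sigmoid) can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the averaging used to combine layers, the dropout rate, and the names `aggregate_layers`, `dropout`, `dense_sigmoid`, and `predict_npd` are all assumptions, and the input vectors are stand-ins for real CodeBERT outputs.

```python
import math
import random


def aggregate_layers(layer_vectors):
    """Average per-layer feature vectors into one vector.

    Averaging is an assumption; the paper may combine layers differently.
    """
    n_layers = len(layer_vectors)
    dim = len(layer_vectors[0])
    return [sum(layer[i] for layer in layer_vectors) / n_layers for i in range(dim)]


def dropout(vec, rate, training, rng):
    """Inverted dropout: during training, keep each unit with probability
    (1 - rate) and rescale kept units; at inference, return the input unchanged."""
    if not training or rate == 0.0:
        return list(vec)
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in vec]


def dense_sigmoid(vec, weights, bias):
    """Single-output dense layer followed by a sigmoid activation."""
    z = sum(w * x for w, x in zip(weights, vec)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def predict_npd(layer_vectors, weights, bias, rate=0.1, training=False, rng=None):
    """Return a score in (0, 1) for the 'contains CWE-476' class.

    `rate` is a hypothetical dropout rate, not a value taken from the paper.
    """
    rng = rng or random.Random(0)
    pooled = aggregate_layers(layer_vectors)
    dropped = dropout(pooled, rate, training, rng)
    return dense_sigmoid(dropped, weights, bias)


if __name__ == "__main__":
    # Two toy "layers" of 2-dimensional features; real CodeBERT vectors are much larger.
    layers = [[1.0, 2.0], [3.0, 4.0]]
    print(predict_npd(layers, weights=[0.5, -0.5], bias=0.5))
```

In practice the per-layer vectors would come from a pretrained encoder and the weights would be learned during fine-tuning; the sketch only shows how the three stages fit together.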

Abstract

Software security is crucial in any field where a breach can expose sensitive data and lead to financial losses. Vulnerability detection is therefore an essential part of the software development process, and a key step in maintaining software integrity is identifying vulnerabilities in source code before deployment. CWE-476, NULL pointer dereference (NPD), is an important weakness because it can cause software crashes, unpredictable behavior, and security vulnerabilities. Several vulnerability checkers exist, but previous tools often fall short in analyzing specific feature connections in the source code, which weakens them in real-world scenarios. In this study, we propose a novel approach using a fine-tuned Large Language Model (LLM) termed "DeLLNeuN". This model leverages various layers to reduce both overfitting and non-linearity, enhancing its performance and reliability. It also applies dropout and dimensionality reduction to streamline the model, making it faster and more efficient. Our model achieved 87% accuracy and 88% precision on the Draper VDISC dataset. As software becomes more complex and cyber threats continue to evolve, the need for proactive security measures will keep growing. In this context, the proposed model looks promising as an early vulnerability checker in software development.
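For readers less familiar with the two reported metrics, the sketch below shows how accuracy and precision are computed for a binary classifier. The label lists are made-up toy data, not results from the paper, and the function names are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = vulnerable)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Fraction of all samples that were classified correctly."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def precision(y_true, y_pred):
    """Fraction of samples flagged as vulnerable that really are vulnerable."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0


if __name__ == "__main__":
    # Hypothetical labels for five functions; purely illustrative.
    y_true = [1, 1, 0, 0, 1]
    y_pred = [1, 0, 0, 1, 1]
    print(accuracy(y_true, y_pred), precision(y_true, y_pred))
```

Precision matters for an early vulnerability checker because it reflects how many of the raised warnings are real; it says nothing about missed vulnerabilities, which recall would capture.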

Paper Structure

This paper contains 17 sections, 4 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: An illustration of null pointer dereference (NPD) vulnerabilities in source code. Red markings highlight where NULL pointers are present.
  • Figure 2: Overall methodology of the proposed approach.
  • Figure 3: Overview of CodeBERT strategies, illustrating the preprocessing steps and transformer encoder layers.
  • Figure 4: Architecture of the DeLLNeuN model, showing the integration of multiple layers on top of CodeBERT.