Implementing and Executing Static Analysis Using LLVM and CodeChecker

Gabor Horvath, Reka Kovacs, Richard Szalay, Zoltan Porkolab

TL;DR

This article presents a practical guide to implementing static analysis within the LLVM/Clang ecosystem and CodeChecker, focusing on two C++ problems: eliminating redundant pointers and detecting dangling pointers that arise from std::string usage. It contrasts syntax-tree-based analysis using AST matching in Clang-Tidy with symbolic execution in the Clang Static Analyzer, providing step-by-step implementations, code skeletons, and instrumentation details. It also demonstrates how CodeChecker orchestrates multiple analyzers and supports diagnostics and automatic fixes, and it shows how to extend analyses with custom data structures and inter-checker communication. Together, the material offers actionable guidance for tool architects and developers building scalable, path-sensitive and AST-based static analyses in real-world industrial codebases.

Abstract

Static analysis is a method of analyzing source code without executing it. It is widely used to find bugs and code smells in industrial software. Among the available techniques, the most important are those based on the abstract syntax tree and those performing symbolic execution. Both have found their role in modern software development, as they have different advantages and limitations. In this tutorial, we present two problems from the C++ programming language: the elimination of redundant pointers, and the reporting of dangling pointers originating from incorrect use of the std::string class. These two issues have different theoretical backgrounds, and finding them requires different implementation techniques. We provide a step-by-step guide to implementing the checkers (software that identifies the aforementioned problems): one based on abstract syntax tree analysis, the other exploring the possibilities of symbolic execution. The methods are explained in detail and supported by code examples. The intended audience for this tutorial is both architects of static analysis tools and developers who want to understand the advantages and constraints of the different methods.

Paper Structure

This paper contains 57 sections, 64 figures, 2 tables, and 1 algorithm.

Figures (64)

  • Figure 1: The abstract syntax tree of the expression $b = a + 1$.
  • Figure 2: Steps taken by a simple compiler.
  • Figure 3: The architecture of the LLVM Compiler Infrastructure project.
  • Figure 4: Input for const-ness check.
  • Figure 5: Example for division by zero.
  • ...and 59 more figures