Customizing Static Analysis using Codesearch
Avi Hayoun, Veselin Raychev, Jack Hair
TL;DR
The paper addresses the challenge of making static analyses customizable through a language that is both accessible and efficient. It introduces StarLang, a fast, monadic subset of Datalog with stratified negation and unary predicates, designed to enable real-time rule authoring for static analysis. A practical frontend, Codesearch, provides a lightweight interface and a standard library for expressing complex analyses (taint, dataflow, typestate, points-to) with templates and autocompletion. Case studies across tainting, dataflow, typestate, and linting demonstrate the approach's expressive power and real-time feedback, offering a practical path for integrating semantic analyses into large code repositories. Overall, StarLang and Codesearch form a user-friendly DSL for efficient, scalable static-analysis queries.
Abstract
Static analysis is a growing area of software engineering that underpins a range of essential security tools, bug-finding tools, and software verification. Recent years have seen an increase in universal static analysis tools that validate a range of properties and allow parts of the scanner to be customized to validate additional properties, or "static analysis rules". A language commonly used to describe a range of static analysis applications is Datalog. Unfortunately, the language is still non-trivial to use, leading to analyses that are difficult to implement in a precise but performant way. In this work, we aim to make building custom static analysis tools much easier for developers while, at the same time, providing a familiar framework for application security and static analysis experts. Our approach introduces a language called StarLang, a variant of Datalog that admits only programs with a fast runtime, because its decision procedure has low time complexity.
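To make the idea of a Datalog program built from unary predicates with stratified negation more concrete, the following is a minimal Python sketch of a taint-style query. It is not StarLang syntax and not the authors' implementation: the predicate names (`source`, `sanitizer`, `sink`, `tainted`, `alert`, `safe_sink`), the helper `reachable`, and the example facts are all assumptions made for illustration. Each derived predicate is a set of program nodes, and negation is applied only to predicates that are already fully computed, which is what stratification requires.

```python
from collections import deque


def reachable(seeds, edges, blocked):
    """Least fixpoint of the two (hypothetical) rules:
         r(x) :- seed(x).
         r(y) :- r(x), edge(x, y), not blocked(y).
    `blocked` is a base fact set, so the negation is stratified.
    """
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)

    result = set(seeds)
    worklist = deque(seeds)
    # Each node enters the worklist at most once, so the loop is
    # linear in the number of nodes plus edges.
    while worklist:
        node = worklist.popleft()
        for nxt in successors.get(node, []):
            if nxt not in result and nxt not in blocked:
                result.add(nxt)
                worklist.append(nxt)
    return result


# Hypothetical base facts for a tiny dataflow graph.
edges = [("input", "a"), ("a", "clean"), ("clean", "b"),
         ("a", "query"), ("b", "log")]
source = {"input"}
sanitizer = {"clean"}
sink = {"query", "log"}

# Stratum 1: tainted(x), derived by recursion over the edge relation.
tainted = reachable(source, edges, sanitizer)

# Stratum 2: rules that read tainted only after it is complete.
alert = tainted & sink        # alert(x)     :- tainted(x), sink(x).
safe_sink = sink - tainted    # safe_sink(x) :- sink(x), not tainted(x).

print(sorted(alert), sorted(safe_sink))
```

In this sketch the sanitizer node stops propagation, so only the `query` sink is flagged while `log` is reported as safe. The worklist evaluation visits each node and edge a bounded number of times, which hints at why restricting derived predicates to unary ones can keep evaluation fast; the precise complexity guarantees of StarLang are those stated in the paper, not ones established by this example.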
