A Static Analysis of Popular C Packages in Linux
Jukka Ruohonen, Mubashrah Saddiqa, Krzysztof Sierszecki
TL;DR
The paper applies GCC's compile-time static analyzer to a large set of C-based Gentoo packages to empirically assess software quality and security weaknesses mapped to CWEs. Using descriptive statistics and nonparametric tests, it finds a highly skewed distribution of CWE-mapped warnings, with 89% of packages emitting none and a few packages contributing the majority of warnings; uninitialized variables (CWE-457) and NULL pointer dereferences (CWE-476/690) dominate. The work highlights practical implications for coding practices and tool development, while acknowledging limitations such as false positives, kernel exclusion, and generalizability beyond Gentoo. It also sketches avenues for future work, including benchmarks, dashboards, and integration with vulnerability ecosystems to better prioritize and triage static-analysis findings.
Abstract
Static analysis is a classical technique for improving software security and software quality in general. Fairly recently, a new static analyzer was implemented in the GNU Compiler Collection (GCC). The present paper uses GCC's analyzer to empirically examine popular Linux packages. The dataset is based on those packages in the Gentoo Linux distribution that are either written in C or contain C code. In total, 3,538 such packages are covered. According to the results, uninitialized variables and NULL pointer dereferences are the most common problems reported by the analyzer. Classical memory management issues are relatively rare. The warnings also follow a long-tailed probability distribution across the packages; a few packages are highly warning-prone, whereas no warnings are present for as many as 89% of the packages. Furthermore, the warnings do not vary across different application domains. With these results, the paper contributes to the domain of large-scale empirical research on software quality and security. In addition, the practical implications of the results are discussed.
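To make the two dominant warning classes concrete, the following is a minimal C sketch of the patterns involved. It is not taken from any of the analyzed packages: the function names and the SHOW_WARNINGS macro are hypothetical and chosen for illustration only. The flawed variants are compiled only when the macro is defined, so a default build contains just the corrected versions.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#ifdef SHOW_WARNINGS /* hypothetical macro; enables the flawed variants */

/* CWE-457 pattern: 'value' is assigned on only one branch. */
int flag_to_int_flawed(const char *s)
{
    int value;
    if (s[0] == 'y')
        value = 1;
    return value; /* may read an uninitialized variable */
}

/* CWE-690 pattern: the result of malloc() is used without a NULL check. */
char *copy_string_flawed(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    p[0] = '\0'; /* possible NULL pointer dereference */
    memcpy(p, s, n);
    return p;
}

#endif /* SHOW_WARNINGS */

/* Corrected: 'value' has a defined default on every path. */
int flag_to_int(const char *s)
{
    int value = 0;
    if (s[0] == 'y')
        value = 1;
    return value;
}

/* Corrected: allocation failure is checked and propagated to the caller. */
char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p == NULL)
        return NULL;
    memcpy(p, s, n);
    return p;
}
```

Assuming the sketch is saved as example.c, compiling it with gcc -fanalyzer -DSHOW_WARNINGS -c example.c would be expected to produce analyzer diagnostics for the two flawed functions, typically under -Wanalyzer-use-of-uninitialized-value and -Wanalyzer-possible-null-dereference. The exact output, including the reported CWE identifiers, depends on the GCC version used.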
