A Data-Mining Based Study of Security Vulnerability Types and Their Mitigation in Different Languages

Gábor Antal; Balázs Mosolygó; Norbert Vándor; Péter Hegedüs

TL;DR

This study addresses how security vulnerability types and their remediation differ across programming languages by mining CVE/CWE signals from commit logs in nine languages. A data-mining pipeline built from CVE Manager, Git Log Parser, and CVE Miner collects CVEs mentioned in commits, estimates fix times, tracks contributor activity, and measures code changes to produce cross-language vulnerability statistics. Key findings reveal language-dependent CWE distributions and remediation patterns, such as CWE-119 in C++ and CWE-79 in Ruby, with larger projects often showing longer fix cycles and CVE reoccurrence, highlighting how ecosystem and scale shape security practices. The work demonstrates a scalable method to quantify cross-language security activity and provides a reference framework for researchers and developers, while acknowledging limitations due to sample size and reliance on commit-message indicators.
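The commit-mining step described above can be illustrated with a minimal sketch. This is not the authors' CVE Manager, Git Log Parser, or CVE Miner code. The function names and the `(timestamp, message)` input shape are hypothetical, and treating the first and last commit that mention a CVE as the "finding" and "fixing" commits is an assumption made only for this illustration.

```python
import re
from collections import defaultdict
from datetime import datetime

# CVE identifiers have the form CVE-<4-digit year>-<sequence of 4 or more digits>.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)


def extract_cve_ids(message):
    """Return the sorted, de-duplicated CVE IDs mentioned in a commit message."""
    return sorted({match.upper() for match in CVE_PATTERN.findall(message)})


def days_between_first_and_last_mention(commits):
    """Map each CVE ID to the whole days between its first and last mention.

    `commits` is an iterable of (iso_timestamp, message) pairs. This input
    shape is hypothetical; a real pipeline would read it from `git log`.
    """
    mentions = defaultdict(list)
    for timestamp, message in commits:
        when = datetime.fromisoformat(timestamp)
        for cve_id in extract_cve_ids(message):
            mentions[cve_id].append(when)
    # Assumption: first mention ~ "finding" commit, last mention ~ "fixing" commit.
    return {
        cve_id: (max(times) - min(times)).days
        for cve_id, times in mentions.items()
    }
```

Calling `days_between_first_and_last_mention` on a project's commit history gives one elapsed-time value per CVE, which could then be averaged per language. A CVE mentioned in only one commit yields zero days under this simplification.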

Abstract

The number of people accessing online services is increasing day by day, and with new users comes a greater need for effective and responsive cyber-security. Our goal in this study was to find out whether there are common patterns within the most widely used programming languages in terms of security issues and fixes. In this paper, we showcase some statistics based on the data we extracted for these languages. Analyzing the more popular ones, we found that the same security issues might appear differently in different languages, and as such the provided solutions may vary just as much. We also found that projects of similar size can produce extremely different results and have different common weaknesses, even if they provide a solution to the same task. These statistics may not be entirely indicative of the projects' standards when it comes to security, but they provide a good reference point for what one should expect. Given a larger sample size, they could be made even more precise, and a better understanding of the security-relevant activities within projects written in a given language could be achieved.

Paper Structure (19 sections, 6 figures, 2 tables)


Introduction
Approach
CVE Manager
Git Log Parser
CVE Miner
Approach Summary
Results
Time Based Statistics
Time elapsed between the finding and fixing commit.
Time elapsed between the publication and fixing of a CVE.
Correlation between time and severity.
Activity Based Statistics
Active contributors and commit count during the fixing of a CVE
Average File and Line Changes
Most Common CWEs by Language
...and 4 more sections

Figures (6)

Figure 1: A schematic representation of our miner
Figure 2: The average time elapsed in days between finding and fixing a CVE
Figure 3: The average time elapsed between the publication and fixing of a CVE, represented in days
Figure 4: The correlation between the base score (severity) and the time taken to fix the CVE
Figure 5: The average number of contributors between the finding and fixing commit
...and 1 more figure
