
CveBinarySheet: A Comprehensive Pre-built Binaries Database for IoT Vulnerability Analysis

Lingfeng Chen

TL;DR

The paper addresses the lack of comprehensive, multi-architecture vulnerability datasets for Binary Static Code Analysis (BSCA) in IoT and firmware contexts. It introduces CveBinarySheet, a database of 1033 CVEs with pre-built binaries across five architectures (x86-64, i386, MIPS, ARMv7, RISC-V64) and two optimization levels (O0, O3), complemented by CVE metadata and reproducible compilation scripts built on the Arch Linux AUR. The dataset organizes binaries and scripts in a hierarchical data model, which supports binary similarity analysis, vulnerability matching, and the collection of training data for large language model–based vulnerability repair. By providing realistic binaries for BSCA benchmarking and tool development, CveBinarySheet is intended to support work on IoT firmware security, vulnerability localization, and remediation workflows.

Abstract

Binary Static Code Analysis (BSCA) is a pivotal area in software vulnerability research, focusing on the precise localization of vulnerabilities within binary executables. Despite advancements in BSCA techniques, there is a notable scarcity of comprehensive and readily usable vulnerability datasets tailored for diverse environments such as IoT, UEFI, and MCU firmware. To address this gap, we present CveBinarySheet, a meticulously curated database containing 1033 CVE entries spanning from 1999 to 2024. Our dataset encompasses 16 essential third-party components, including busybox and curl, and supports five CPU architectures: x86-64, i386, MIPS, ARMv7, and RISC-V64. Each precompiled binary is available at two compiler optimization levels (O0 and O3), facilitating comprehensive vulnerability analysis under different compilation scenarios. By providing detailed metadata and diverse binary samples, CveBinarySheet aims to accelerate the development of state-of-the-art BSCA tools, binary similarity analysis, and vulnerability matching applications.
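
To make the build matrix described above concrete, the following is a minimal sketch of how the five architectures and two optimization levels combine into ten binary variants per component version. The path layout, the function name `build_variants`, and the version string are assumptions made for illustration only; they are not the dataset's actual directory structure or tooling.

```python
from itertools import product

# Architectures and optimization levels as listed in the abstract.
ARCHITECTURES = ["x86-64", "i386", "MIPS", "ARMv7", "RISC-V64"]
OPT_LEVELS = ["O0", "O3"]


def build_variants(component, version):
    """Return one hypothetical relative path per (architecture, optimization) pair.

    The "component/version/arch/opt" layout is an assumption for illustration,
    not the layout used by CveBinarySheet.
    """
    return [
        f"{component}/{version}/{arch}/{opt}"
        for arch, opt in product(ARCHITECTURES, OPT_LEVELS)
    ]


# "1.0.0" is a placeholder version string, not taken from the dataset.
variants = build_variants("busybox", "1.0.0")
print(len(variants))  # 5 architectures x 2 optimization levels = 10
```

Each vulnerable component version therefore corresponds to ten binaries under this reading, which is what allows the same CVE to be compared across architectures and compiler settings.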
