Mens Sana In Corpore Sano: Sound Firmware Corpora for Vulnerability Research

René Helmke, Elmar Padilla, Nils Aschenbruck

TL;DR

A comprehensive analysis of corpus creation practices confirms that there is currently no common ground in related work. The paper derives guidelines that help researchers nurture corpus replicability and representativeness, and it builds a new, replicable corpus for large-scale analyses on Linux firmware: LFwC.

Abstract

Firmware corpora for vulnerability research should be scientifically sound. Yet, several practical challenges complicate the creation of sound corpora: Sample acquisition, for example, is hard, and one must overcome the barrier of proprietary or encrypted data. As image contents are unknown prior to analysis, it is hard to select high-quality samples that can satisfy scientific demands. Ideally, we help each other out by sharing data. But here, sharing is problematic due to copyright laws. Instead, papers must carefully document each step of corpus creation: If a step is unclear, replicability is jeopardized. This has cascading effects on result verifiability, representativeness, and, thus, soundness. Despite all challenges, how can we maintain the soundness of firmware corpora? This paper thoroughly analyzes the problem space and investigates its impact on research: We distill practical binary analysis challenges that significantly influence corpus creation. We use these insights to derive guidelines that help researchers nurture corpus replicability and representativeness. We apply them to 44 top-tier papers and systematically analyze scientific corpus creation practices. Our comprehensive analysis confirms that there is currently no common ground in related work. It shows the added value of our guidelines, as they uncover methodological issues in corpus creation and reveal minuscule stumbling blocks in documentation. These blur the view on representativeness, hinder replicability, and, thus, negatively impact the soundness of otherwise excellent work. Finally, we show the feasibility of our guidelines and build a new, replicable corpus for large-scale analyses on Linux firmware: LFwC. We share rich meta data for good (and proven) replicability. We verify unpacking, deduplicate, identify contents, provide ground truth, and show LFwC's utility for research.

Paper Structure (25 sections, 12 figures, 5 tables)

This paper contains 25 sections, 12 figures, 5 tables.

Figures (12)

  • Figure 1: Firmware analysis challenges that can have an impact on scientific firmware corpus construction. We distill common challenges from the surveys of Wright et al. \cite{challenges_firmware_rehosting} and Qasem et al. \cite{qasem_vulnerability_detection_survey} and group them into eight descriptive classes. Items from the source surveys that show no clear impact on corpus creation are in grey: Accuracy, Test Case Generation, and Efficiency. We mark General and Method-Specific challenges.
  • Figure 2: A framework of guidelines for the creation of scientifically sound firmware corpora. It consists of three layers: The top layer holds the superordinate scientific goals Replicability, Representativeness, and Method-Orientation. They are associated with the identified problems in corpus creation from \ref{sec:challenges}. To achieve these goals, a corpus must fulfill the six requirements of the second layer: Ground Truth, Relevance, Clean Data, Rich Meta Data, Documentation, and Heterogeneity & Diversity. In layer three, we add a list of 16 unique and practical measures designated by an asterisk (*). They help to assess the fulfillment of the previously mentioned requirements. Note that measures can serve multiple requirements. Abstract measures are written in italics. We do not claim that the list is complete, as varying paper scenarios and method constraints may imply additional or substitute measures.
  • Figure 3: Out of an initial set of 263 peer-reviewed papers from the past ten years, we distilled 44 relevant ones. For each of the remaining papers, we collected data on our 16 corpus measures, which support the goals of Replicability and Representativeness (cf. \ref{sec:reqs}).
  • Figure 4: Aggregated results of all collected data points for each measure in \ref{tab:literature_results}. Data points that mark non-applicability of a measure are considered.
  • Figure 5: Aggregated results of all collected data points for the associated measures in \ref{tab:literature_results}. The associated measures are unweighted within a requirement, but weighted across requirements, because they can contribute towards multiple goals. All 44 papers are included, and data points that mark non-applicability of a measure are considered.
  • ...and 7 more figures