Table of Contents
Fetching ...

AssetHarvester: A Static Analysis Tool for Detecting Secret-Asset Pairs in Software Artifacts

Setu Kumar Basak, K. Virgil English, Ken Ogura, Vitesh Kambara, Bradley Reaves, Laurie Williams

TL;DR

The findings indicate that data flow analysis employed in AssetHarvester detects secret-asset pairs with 0 % false positives and aids in improving the recall of secret detection tools.

Abstract

GitGuardian monitored secrets exposure in public GitHub repositories and reported that developers leaked over 12 million secrets (database and other credentials) in 2023, indicating a 113% surge from 2021. Despite the availability of secret detection tools, developers ignore the tools' reported warnings because of false positives (25%-99%). However, each secret protects assets of different values accessible through asset identifiers (a DNS name and a public or private IP address). The asset information for a secret can aid developers in filtering false positives and prioritizing secret removal from the source code. However, existing secret detection tools do not provide the asset information, thus presenting difficulty to developers in filtering secrets only by looking at the secret value or finding the assets manually for each reported secret. The goal of our study is to aid software practitioners in prioritizing secrets removal by providing the assets information protected by the secrets through our novel static analysis tool. We present AssetHarvester, a static analysis tool to detect secret-asset pairs in a repository. Since the location of the asset can be distant from where the secret is defined, we investigated secret-asset co-location patterns and found four patterns. To identify the secret-asset pairs of the four patterns, we utilized three approaches (pattern matching, data flow analysis, and fast-approximation heuristics). We curated a benchmark of 1,791 secret-asset pairs of four database types extracted from 188 public GitHub repositories to evaluate the performance of AssetHarvester. AssetHarvester demonstrates precision of (97%), recall (90%), and F1-score (94%) in detecting secret-asset pairs. Our findings indicate that data flow analysis employed in AssetHarvester detects secret-asset pairs with 0% false positives and aids in improving recall of secret detection tools.

AssetHarvester: A Static Analysis Tool for Detecting Secret-Asset Pairs in Software Artifacts

TL;DR

The findings indicate that data flow analysis employed in AssetHarvester detects secret-asset pairs with 0 % false positives and aids in improving the recall of secret detection tools.

Abstract

GitGuardian monitored secrets exposure in public GitHub repositories and reported that developers leaked over 12 million secrets (database and other credentials) in 2023, indicating a 113% surge from 2021. Despite the availability of secret detection tools, developers ignore the tools' reported warnings because of false positives (25%-99%). However, each secret protects assets of different values accessible through asset identifiers (a DNS name and a public or private IP address). The asset information for a secret can aid developers in filtering false positives and prioritizing secret removal from the source code. However, existing secret detection tools do not provide the asset information, thus presenting difficulty to developers in filtering secrets only by looking at the secret value or finding the assets manually for each reported secret. The goal of our study is to aid software practitioners in prioritizing secrets removal by providing the assets information protected by the secrets through our novel static analysis tool. We present AssetHarvester, a static analysis tool to detect secret-asset pairs in a repository. Since the location of the asset can be distant from where the secret is defined, we investigated secret-asset co-location patterns and found four patterns. To identify the secret-asset pairs of the four patterns, we utilized three approaches (pattern matching, data flow analysis, and fast-approximation heuristics). We curated a benchmark of 1,791 secret-asset pairs of four database types extracted from 188 public GitHub repositories to evaluate the performance of AssetHarvester. AssetHarvester demonstrates precision of (97%), recall (90%), and F1-score (94%) in detecting secret-asset pairs. Our findings indicate that data flow analysis employed in AssetHarvester detects secret-asset pairs with 0% false positives and aids in improving recall of secret detection tools.
Paper Structure (11 sections, 8 figures, 6 tables)

This paper contains 11 sections, 8 figures, 6 tables.

Figures (8)

  • Figure 1: A secret can protect both real and non-sensitive assets, such as a public IP address (real) and localhost (non-sensitive).
  • Figure 2: The database asset identifier has three parts (host, port, and database name) that are defined separately in the same line, in separate lines, or together in the same string.
  • Figure 3: We identified four types of secret-asset co-location patterns in the source code.
  • Figure 4: The database credentials and server address are passed in a specific order in the database driver function.
  • Figure 5: The database credentials and server address are passed as keyword arguments in the database driver functions.
  • ...and 3 more figures