Toward Automated Security Risk Detection in Large Software Using Call Graph Analysis

Nicholas Pecka, Lotfi Ben Othmane, Renee Bryce

TL;DR

This work tackles the challenge of maintaining up-to-date, system-wide threat models in large cloud-native software by automating threat modeling through call graph analysis. By applying clustering algorithms (HDBSCAN and Leiden) to large call graphs, the authors identify dense and modular structures that signal potential security risks and surface actionable regions for further analysis. They evaluate four clustering methods on the Splunk Forwarder Operator (SFO) and develop CWE-based heuristics to map cluster characteristics to threat indicators, demonstrating that HDBSCAN and Leiden provide meaningful, scalable results with actionable insights such as hotspot and hub behaviors and notable nodes like (reflect.Value).Call. The study lays the groundwork for a semi-automated threat modeling pipeline suitable for evolving production systems, enabling more timely and scalable risk assessment in cloud-native environments.

Abstract

Threat modeling plays a critical role in the identification and mitigation of security risks; however, manual approaches are often labor-intensive and prone to error. This paper investigates the automation of software threat modeling through the clustering of call graphs using density-based and community detection algorithms, followed by an analysis of the threats associated with the identified clusters. The proposed method was evaluated through a case study of the Splunk Forwarder Operator (SFO), wherein selected clustering metrics were applied to the software's call graph to assess pertinent code-density security weaknesses. The results demonstrate the viability of the approach and underscore its potential to facilitate systematic threat assessment. This work contributes to the advancement of scalable, semi-automated threat modeling frameworks tailored for modern cloud-native environments.
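
To make the idea of a "hub" in a call graph concrete, the following is a minimal sketch rather than the authors' method. It does not run HDBSCAN or Leiden; it swaps in a much simpler fan-in count as a stand-in, and it assumes that a hub is a function called by many distinct callers. The toy graph, the function names, and the `min_fan_in` threshold are all hypothetical and are not taken from the SFO case study.

```python
from collections import defaultdict

# Hypothetical toy call graph (caller -> callees); names are illustrative only.
CALL_GRAPH = {
    "main": ["reconcile", "setupLogger"],
    "reconcile": ["buildConfig", "dispatch"],
    "buildConfig": ["dispatch"],
    "setupLogger": ["dispatch"],
    "dispatch": ["handlerA", "handlerB"],
    "handlerA": [],
    "handlerB": [],
}


def fan_in_out(graph):
    """Return {node: (fan_in, fan_out)} for a caller -> callees mapping."""
    fan_in = defaultdict(int)
    for callees in graph.values():
        for callee in callees:
            fan_in[callee] += 1
    # Include callees that never appear as callers.
    nodes = set(graph) | set(fan_in)
    return {n: (fan_in[n], len(graph.get(n, []))) for n in nodes}


def find_hubs(graph, min_fan_in=3):
    """Flag nodes with at least min_fan_in callers (assumed hub heuristic)."""
    degrees = fan_in_out(graph)
    return sorted(n for n, (fi, _) in degrees.items() if fi >= min_fan_in)


if __name__ == "__main__":
    print(find_hubs(CALL_GRAPH))
```

In a real pipeline, a degree-based flag like this would at most complement the clustering step; the cluster structure itself would come from the density-based or community detection algorithm.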

Paper Structure

This paper contains 13 sections, 4 equations, 1 figure, 2 tables.

Figures (1)

  • Figure 1: Comparison of top 10 clusters produced by HDBSCAN and Leiden algorithms for SFO.