Exploiting ftrace's function_graph Tracer Features for Machine Learning: A Case Study on Encryption Detection

Kenan Begovic, Abdulaziz Al-Ali, Qutaibah Malluhi

TL;DR

This work explores leveraging the Linux kernel ftrace function_graph tracer to generate rich, graph-based features for machine learning aimed at encryption detection and application identification. By constructing a dataset from thousands of traces and extracting centrality- and duration-based metrics, the authors demonstrate high classification accuracy, notably 99.28% in binary encryption detection and strong multi-label performance. The study includes feature selection, ablation, and robustness analyses, highlighting the pivotal role of graph-based features in capturing kernel-call dependencies. The findings establish a bridge between system-level tracing and ML, with implications for security analytics, anomaly detection, and real-time monitoring, while outlining challenges in scalability and real-time deployment.

Abstract

This paper proposes using the Linux kernel ftrace framework, particularly the function_graph tracer, to generate informative system-level data for machine learning (ML) applications. Experiments on a real-world encryption detection task demonstrate the efficacy of the proposed features across several learning algorithms. The learner faces the problem of detecting encryption activities across a large dataset of files, using function call traces and graph-based features. Empirical results show an accuracy of 99.28% on this task, underscoring the efficacy of features derived from the function_graph tracer. The results were further validated in an additional experiment targeting a multi-label classification problem, in which running programs were identified from trace data. This work provides methodologies for preprocessing raw trace data and extracting graph-based features, advancing the application of ML to system behavior analysis, program identification, and anomaly detection. By bridging the gap between system tracing and ML, this paper opens the way for new solutions in performance monitoring and security analytics.
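
To make the idea of graph-based and duration-based features concrete, the following is a minimal sketch of how function_graph output could be turned into a caller-to-callee graph with per-function durations and a degree-centrality score. It is not the authors' pipeline: the helper names (`parse_function_graph`, `degree_centrality`), the choice of degree centrality over an undirected view of the graph, and the hand-written sample trace are all assumptions made for illustration. The sample only imitates the layout the tracer usually prints (a CPU column, an optional duration in microseconds, then the indented call).

```python
import re
from collections import defaultdict

# Matches "<cpu>) <optional duration> | <call text>" lines; header lines
# starting with "#" do not match and are skipped.
LINE_RE = re.compile(r"^\s*\d+\)\s*(?P<dur>[^|]*)\|\s*(?P<body>.*)$")
DUR_RE = re.compile(r"(\d+(?:\.\d+)?)\s*us")


def parse_function_graph(text):
    """Return (edges, durations) from function_graph-style text.

    edges maps (caller, callee) to a call count; durations maps a
    function name to its summed reported time in microseconds.
    """
    stack = []                      # currently open (unfinished) calls
    edges = defaultdict(int)
    durations = defaultdict(float)
    for raw in text.splitlines():
        m = LINE_RE.match(raw)
        if not m:
            continue
        body = m.group("body").strip()
        d = DUR_RE.search(m.group("dur"))
        dur = float(d.group(1)) if d else None
        if body.startswith("}"):    # a nested call returns
            if stack:
                name = stack.pop()
                if dur is not None:
                    durations[name] += dur
            continue
        name = body.split("(")[0].strip()
        if not name:
            continue
        if stack:                   # the innermost open call is the caller
            edges[(stack[-1], name)] += 1
        if body.endswith("{"):      # call with children: duration comes later
            stack.append(name)
        elif dur is not None:       # leaf call: duration is on the same line
            durations[name] += dur
    return dict(edges), dict(durations)


def degree_centrality(edges):
    """Degree centrality on the undirected view: degree / (n - 1)."""
    neighbors = defaultdict(set)
    for caller, callee in edges:
        if caller != callee:
            neighbors[caller].add(callee)
            neighbors[callee].add(caller)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hand-written sample for illustration only, not captured from a real system.
SAMPLE = """\
# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 0)               |  vfs_read() {
 0)               |    rw_verify_area() {
 0)   0.250 us    |      security_file_permission();
 0)   0.750 us    |    }
 0)   0.500 us    |    fsnotify_parent();
 0)   2.000 us    |  }
"""

edges, durations = parse_function_graph(SAMPLE)
centrality = degree_centrality(edges)
```

In this sketch, each trace would yield one feature vector, for example the centrality and total duration of selected kernel functions. Which functions and which centrality measures to keep is a modeling decision that the feature selection analysis in the paper addresses.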

Paper Structure

This paper contains 32 sections, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Process Flow of Experiment 1 (Encryption Detection)
  • Figure 2: Chi-Squared Scores of Features
  • Figure 3: Experiment 1 results using Learning Curve analysis
  • Figure 4: Performance Comparison of Algorithms for Multi-Label Classification