Rethinking Provenance Completeness with a Learning-Based Linux Scheduler

Jinsong Mao, Benjamin E. Ujcich, Shiqing Ma

TL;DR

The paper tackles provenance completeness under the reference monitor model by addressing the super producer threat, which can cause provenance event loss. It introduces Aegis, a kernel-space scheduler for Linux that combines a multi-queue backbone with a reinforcement-learning component (a Deep Q-Network) to proactively allocate CPU resources to provenance tasks, achieving zero event loss while maintaining reasonable overhead. Implemented via eBPF and sched_ext, Aegis uses an exponential waiting-time design and a delta-based inference reduction to balance completeness, efficiency, and fairness, and it is trained to minimize event loss and maximize throughput. Empirical evaluation across Sysdig and eAudit shows that Aegis matches or outperforms state-of-the-art schedulers in provenance completeness while remaining competitive in performance.

Abstract

Provenance plays a critical role in maintaining traceability of a system's actions for root cause analysis of security threats and impacts. Provenance collection is often incorporated into the reference monitor of systems to ensure that an audit trail exists of all events, that events are completely captured, and that logging of such events cannot be bypassed. However, recent research has questioned whether existing state-of-the-art provenance collection systems can ensure the security guarantees of a true reference monitor in the face of the 'super producer threat', in which provenance generation overloads a system, forcing it to drop security-relevant events and allowing an attacker to hide their actions. One approach towards solving this threat is to enforce resource isolation, but that does not fully solve the problems resulting from hardware dependencies and performance limitations. In this paper, we show how an operating system's kernel scheduler can mitigate this threat, and we introduce Aegis, a learned scheduler for Linux specifically designed for provenance. Unlike conventional schedulers that ignore provenance completeness requirements, Aegis leverages reinforcement learning to learn provenance task behavior and to dynamically optimize resource allocation. We evaluate Aegis's efficacy and show that Aegis significantly improves both the completeness and efficiency of provenance collection systems compared to traditional scheduling, while maintaining reasonable overheads and even improving overall runtime in certain cases compared to the default Linux scheduler.

Paper Structure

This paper contains 24 sections, 12 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: Scheduling statistics and causal graphs of the Sysdig and EEVDF "super producer" (NoDrop) case study example.
  • Figure 2: NoDrop needs to perform synchronization in critical regions, causing kernel bugs, deadlocks, and potential kernel panics.
  • Figure 3: Overview of Aegis, using task and provenance features as input to ensure performant workloads with no event loss.
  • Figure 4: eBPF programs are subject to verification before being loaded into the kernel.
  • Figure 5: User-space and kernel-space Aegis implementation components.
  • ...and 4 more figures