Rethinking Provenance Completeness with a Learning-Based Linux Scheduler
Jinsong Mao, Benjamin E. Ujcich, Shiqing Ma
TL;DR
The paper addresses provenance completeness under the reference monitor model, focusing on the super producer threat, in which provenance generation overloads the system so that provenance events are dropped. It introduces Aegis, a kernel-space Linux scheduler that combines a multi-queue backbone with a reinforcement-learning component (a Deep Q-Network) to proactively allocate CPU resources to provenance tasks, achieving zero event loss at reasonable overhead. Aegis is implemented with eBPF and sched_ext, and uses an exponential waiting-time design and delta-based inference reduction to balance completeness, efficiency, and fairness; the model is trained to minimize event loss and maximize throughput. In evaluations with Sysdig and eAudit, Aegis matches or outperforms state-of-the-art schedulers on provenance completeness while keeping performance competitive.
Abstract
Provenance plays a critical role in maintaining traceability of a system's actions for root cause analysis of security threats and impacts. Provenance collection is often incorporated into the reference monitor of systems to ensure that an audit trail exists of all events, that events are completely captured, and that logging of such events cannot be bypassed. However, recent research has questioned whether existing state-of-the-art provenance collection systems ensure the security guarantees of a true reference monitor, due to the 'super producer threat', in which provenance generation can overload a system, forcing it to drop security-relevant events and allowing an attacker to hide their actions. One approach towards solving this threat is to enforce resource isolation, but that does not fully solve the problems resulting from hardware dependencies and performance limitations. In this paper, we show how an operating system's kernel scheduler can mitigate this threat, and we introduce Aegis, a learned scheduler for Linux specifically designed for provenance. Unlike conventional schedulers that ignore provenance completeness requirements, Aegis leverages reinforcement learning to learn provenance task behavior and to dynamically optimize resource allocation. We evaluate Aegis's efficacy and show that Aegis significantly improves both the completeness and efficiency of provenance collection systems compared to traditional scheduling, while maintaining reasonable overheads and even improving overall runtime in certain cases compared to the default Linux scheduler.
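
To make the super producer threat concrete, the following toy simulation may help. It is not Aegis and does not model the Linux scheduler: it only illustrates, under simplified assumptions, how a bounded event buffer loses events when the collector receives too little CPU time, and why giving the collector a larger share removes the loss. The function name `simulate` and all parameters (`produce_per_tick`, `consume_per_tick`, `capacity`) are hypothetical and chosen for illustration.

```python
from collections import deque


def simulate(ticks, produce_per_tick, consume_per_tick, capacity):
    """Toy model: return how many events a bounded buffer drops.

    Assumptions (illustrative only, not taken from the paper):
    - time advances in discrete ticks;
    - the workload emits `produce_per_tick` events each tick;
    - the provenance collector drains at most `consume_per_tick`
      events each tick, standing in for its CPU share;
    - events arriving while the buffer is full are dropped.
    """
    buffer = deque()
    dropped = 0
    next_id = 0
    for _ in range(ticks):
        # Producer side: the workload generates events.
        for _ in range(produce_per_tick):
            if len(buffer) < capacity:
                buffer.append(next_id)
            else:
                dropped += 1  # buffer full: the event is lost
            next_id += 1
        # Consumer side: the collector drains what its share allows.
        for _ in range(min(consume_per_tick, len(buffer))):
            buffer.popleft()
    return dropped


# Collector starved of CPU: the buffer fills and events are lost.
starved = simulate(ticks=100, produce_per_tick=10, consume_per_tick=4, capacity=50)

# Collector given enough CPU to keep pace: nothing is lost.
prioritized = simulate(ticks=100, produce_per_tick=10, consume_per_tick=10, capacity=50)

print("dropped when starved:", starved)
print("dropped when prioritized:", prioritized)
```

In this sketch the only lever is the collector's drain rate, which loosely corresponds to the CPU time a scheduler grants to provenance tasks. A real scheduler must also weigh fairness and overhead for the other tasks on the system, which this toy model ignores.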
