The Needle is a Thread: Finding Planted Paths in Noisy Process Trees
Maya Le, Paweł Prałat, Aaron Smith, François Théberge
TL;DR
The paper introduces the planted path problem in noisy labelled trees and proposes a polynomial-time fuzzy matching algorithm based on bottom-up dynamic programming to find high-scoring partial matchings between two trees. It formalizes a feature-based similarity s(𝒢,𝒟) and demonstrates how the matching results can serve as building blocks for unsupervised template discovery and classifier-enhancement workflows. Through synthetic planted-path models and experiments on the real ACME4 dataset, the authors show that planted-path signals can be recovered or amplified even in noisy, bushy trees, and that matches can be leveraged for downstream tasks such as clustering and threat detection. The work provides practical tools for extracting meaningful sequences in cybersecurity logs and related domains, with implications for signal aggregation and scalable analysis of large, noisy process trees.
Abstract
Motivated by applications in cybersecurity, such as finding meaningful sequences of malware-related events buried in large volumes of computer log data, we introduce the "planted path" problem and propose an algorithm to find fuzzy matchings between two trees. This algorithm can be used as a "building block" for more complicated workflows. We demonstrate the usefulness of a few such workflows in mining synthetically generated data as well as real-world ACME cybersecurity datasets.
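To give a feel for bottom-up dynamic programming on a labelled tree, here is a minimal sketch of a much simpler problem than the one the paper treats. It is not the authors' algorithm. It assumes the pattern is a single path (a list of labels), that labels match only when exactly equal, and that the score is the number of pattern labels found, in order, along one downward path in the tree. The function name `best_path_match`, the dictionary-based tree encoding, and the process names are all hypothetical choices made for illustration.

```python
from typing import Dict, List


def best_path_match(
    children: Dict[str, List[str]],
    labels: Dict[str, str],
    root: str,
    pattern: List[str],
) -> int:
    """Largest number of pattern labels that appear, in order, along a single
    downward path of the tree (a longest-common-subsequence score).

    Illustrative simplification: exact label equality and unit scores.
    """
    k = len(pattern)

    # Pre-order traversal; reversing it visits every node after its descendants.
    order: List[str] = []
    stack = [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))

    # g[v][i] = best score matching pattern[i:] against a downward path from v.
    g: Dict[str, List[int]] = {}
    for v in reversed(order):
        kids = children.get(v, [])
        row = [0] * (k + 1)  # row[k] = 0: nothing left to match
        for i in range(k - 1, -1, -1):
            skip_node = max((g[c][i] for c in kids), default=0)
            after_hit = max((g[c][i + 1] for c in kids), default=0)
            hit = 1 if labels[v] == pattern[i] else 0
            row[i] = max(
                row[i + 1],       # skip pattern[i]
                skip_node,        # skip node v (treat it as noise)
                hit + after_hit,  # match v to pattern[i], continue below
            )
        g[v] = row
    return g[root][0]


# Hypothetical process tree: node id -> child ids, and node id -> label.
children = {"r": ["a", "b"], "a": ["c", "d"], "c": ["e"], "e": ["f"]}
labels = {"r": "init", "a": "bash", "b": "cron", "c": "curl",
          "d": "ls", "e": "chmod", "f": "sh"}

full = best_path_match(children, labels, "r", ["bash", "curl", "sh"])
partial = best_path_match(children, labels, "r", ["bash", "wget", "sh"])
split = best_path_match(children, labels, "r", ["cron", "curl"])
```

In this toy tree the pattern `bash, curl, sh` is fully recovered despite the intervening `chmod` node, `bash, wget, sh` matches only two of its three labels, and `cron, curl` scores one because the two labels lie on different branches. The table has one entry per (node, pattern position) pair, and each node's row is filled by scanning its children once per position, so the running time is proportional to the number of nodes times the pattern length.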
