Honeyfile Camouflage: Hiding Fake Files in Plain Sight
Roelien C. Timmer, David Liebowitz, Surya Nepal, Salil S. Kanhere
TL;DR
This paper addresses how to name honeyfiles so that they blend in with the real files around them. Filenames are embedded in semantic vector spaces and scored with two camouflage metrics based on cosine distance: Simple Camouflage, which uses the distance to the directory mean, and Cluster Camouflage, which uses a von Mises-Fisher mixture model. Both are evaluated on a GitHub filesystem dataset. The results show that both metrics effectively distinguish locally sourced filenames from external samples, and that Simple Camouflage achieves comparable effectiveness at substantially lower computational cost. The metrics give practitioners quantitative tools for producing believable yet stealthy honeyfile names; the paper also discusses implications for deployment environments and future testing on more diverse datasets and LLM-generated content.
Abstract
Honeyfiles are a particularly useful type of honeypot: fake files deployed to detect and infer information from malicious behaviour. This paper considers the challenge of naming honeyfiles so they are camouflaged when placed amongst real files in a file system. Based on cosine distances in semantic vector spaces, we develop two metrics for filename camouflage: one based on simple averaging and one on clustering with mixture fitting. We evaluate and compare the metrics, showing that both perform well on a publicly available GitHub software repository dataset.
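To make the averaging-based metric concrete, the following is a minimal sketch of the idea: embed each filename in a directory, average the vectors, and take the cosine distance from a candidate honeyfile name to that mean. It is an illustration under stated assumptions, not the authors' implementation. In particular, `toy_embed` is a hypothetical stand-in (hashed character trigrams) for a real semantic embedding model, and the names `simple_camouflage`, `cosine_distance`, and `DIM` are invented for this example. Whether vectors are normalised before averaging is also an assumption here.

```python
import math
import zlib

DIM = 256  # size of the toy embedding space (assumption, not from the paper)


def toy_embed(filename):
    """Hypothetical stand-in for a semantic embedding: hashed character trigrams."""
    vec = [0.0] * DIM
    name = filename.lower()
    for i in range(len(name) - 2):
        bucket = zlib.crc32(name[i:i + 3].encode("utf-8")) % DIM
        vec[bucket] += 1.0
    return vec


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def simple_camouflage(candidate, directory_names, embed=toy_embed):
    """Cosine distance from the candidate's embedding to the directory mean."""
    if not directory_names:
        raise ValueError("directory_names must not be empty")
    vectors = [embed(name) for name in directory_names]
    # Component-wise mean of the directory's filename embeddings.
    mean = [sum(column) / len(vectors) for column in zip(*vectors)]
    return cosine_distance(embed(candidate), mean)


if __name__ == "__main__":
    directory = ["test_lexer.py", "test_utils.py", "test_cli.py"]
    for name in ["test_parser.py", "holiday_photo.jpg"]:
        print(name, round(simple_camouflage(name, directory), 3))
```

In this sketch a smaller distance suggests that the candidate name sits closer to the directory's typical filename, and so is plausibly better camouflaged. Swapping `toy_embed` for a genuine semantic embedding would be necessary before drawing any conclusions from the scores.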
