An Information-theoretic Security Analysis of Honeyword
Pengcheng Su, Haibo Cheng, Wenting Li, Ping Wang
TL;DR
This work presents an information-theoretic security analysis of honeyword systems. It derives exact mathematical expressions for two metrics, flatness and success-number, under the strongest possible attacker, and proves their equivalence to total variation (TV) distance. It establishes precise formulas and asymptotics (e.g., $\epsilon_k(1)=\Theta(M/k)$ when $f(+\infty)=0$) and provides a rigorous link between honeyword security and classical information theory. The authors develop fully polynomial-time approximation schemes to compute TV distances between password probability models (PCFGs and higher-order Markov models) and evaluate sample complexity on real-world datasets, showing that reducing the TV distance below 0.1 typically requires training data on the order of millions to tens of millions of samples. Together, these results yield actionable guidance for honeyword generation (e.g., uniform vs. PPM-based honeywords) and offer theoretical benchmarks and practical algorithms for assessing and improving honeyword security.
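For reference, the textbook definition of total variation distance is recalled here. The symbols $P$, $Q$, and $\mathcal{X}$ are illustrative assumptions (a real-password distribution, a honeyword distribution, and the set of candidate passwords); this is the standard definition, not the paper's expression for flatness or success-number.

$$\mathrm{TV}(P,Q) \;=\; \frac{1}{2}\sum_{x\in\mathcal{X}} \lvert P(x)-Q(x)\rvert \;=\; \max_{A\subseteq\mathcal{X}} \bigl(P(A)-Q(A)\bigr)$$

A standard consequence is that a distinguisher given a single sample drawn from $P$ or $Q$ with equal prior probability guesses the source correctly with probability at most $\tfrac{1}{2}+\tfrac{1}{2}\mathrm{TV}(P,Q)$, which is why a small TV distance limits even the strongest attacker in that simplified setting.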
Abstract
Honeyword is a representative "honey" technique that employs decoy objects to mislead adversaries and protect the real ones. To assess the security of a Honeyword system, two metrics, flatness and success-number, have been proposed and evaluated using various simulated attackers. Existing evaluations typically apply statistical learning methods to distinguish real passwords from decoys on real-world datasets. However, such evaluations may overestimate the system's security, because more effective distinguishing attacks may exist. In this paper, we analyze the security of Honeyword systems under the strongest theoretical attack, rather than under the specific, expert-crafted attacks evaluated in prior experimental studies. We first derive mathematical expressions for the flatness and success-number under the strongest attack. We then carry out analyses and computations for several typical scenarios, and determine the security of honeyword generation methods using a uniform distribution and the List model as examples. We further evaluate the security of existing honeyword generation methods based on password probability models (PPMs); this security depends on the sample size used for training. We investigate, for the first time, the sample complexity of several representative PPMs, introducing two novel polynomial-time approximation schemes for computing the total variation distance between PCFG models and between higher-order Markov models. Our experimental results show that for small-scale password distributions, sample sizes on the order of millions, and often tens of millions, are required to reduce the total variation distance below 0.1. Surprisingly, we also establish an equivalence between flatness and total variation distance, which bridges the theoretical study of Honeyword systems with classical information theory. Finally, we discuss the practical implications of our findings.
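The dependence on training-sample size can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the "true" password distribution is a synthetic Zipf-like distribution, the trained model is a plain frequency count over the sample, and the names `total_variation`, `zipf_distribution`, `empirical_model`, and `tv_after_training` are hypothetical. It is not the paper's approximation scheme for PCFG or Markov models, and it does not reproduce the paper's datasets or numbers.

```python
import random
from collections import Counter


def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    Both arguments are dicts mapping an item to its probability;
    items missing from a dict are treated as having probability 0.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def zipf_distribution(num_items, exponent=1.0):
    """Synthetic Zipf-like distribution (an assumption, not real password data)."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_items + 1)]
    total = sum(weights)
    return {f"pw{rank}": w / total for rank, w in enumerate(weights, start=1)}


def empirical_model(true_dist, sample_size, rng):
    """Draw `sample_size` items from `true_dist` and return their relative frequencies."""
    items = list(true_dist)
    probs = [true_dist[x] for x in items]
    sample = rng.choices(items, weights=probs, k=sample_size)
    counts = Counter(sample)
    return {x: c / sample_size for x, c in counts.items()}


def tv_after_training(true_dist, sample_size, seed=0):
    """TV distance between the true distribution and a frequency model fit to a sample."""
    rng = random.Random(seed)
    model = empirical_model(true_dist, sample_size, rng)
    return total_variation(true_dist, model)


if __name__ == "__main__":
    true_dist = zipf_distribution(num_items=10_000)
    for n in (1_000, 10_000, 100_000):
        print(n, round(tv_after_training(true_dist, n), 3))
```

Running the script prints the TV distance for each sample size. In this toy setting the distance shrinks as the sample grows, which is the qualitative effect the sample-complexity experiments quantify for real models and datasets.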
