An Information-theoretic Security Analysis of Honeyword

Pengcheng Su; Haibo Cheng; Wenting Li; Ping Wang

TL;DR

This work delivers an information-theoretic security analysis of honeyword systems. It derives exact mathematical expressions for two metrics, flatness and success-number, under the strongest possible attacker, and proves an equivalence between flatness and total variation (TV) distance, which gives a rigorous link between honeyword security and classical information theory. It also establishes precise formulas and asymptotics (e.g., $\epsilon_k(1)=\Theta(M/k)$ when $f(+\infty)=0$). The authors develop fully polynomial-time approximation schemes to compute TV distances between password probability models (PCFGs and higher-order Markov models) and evaluate sample complexity on real-world datasets, showing that reducing TV below 0.1 typically requires training data on the order of millions to tens of millions of samples. Together, these results yield actionable guidance for honeyword generation (e.g., uniform vs. PPM-based honeywords) and offer theoretical benchmarks and practical algorithms for assessing and improving honeyword security.

Abstract

Honeyword is a representative "honey" technique that employs decoy objects to mislead adversaries and protect the real ones. To assess the security of a Honeyword system, two metrics--flatness and success-number--have been proposed and evaluated using various simulated attackers. Existing evaluations typically apply statistical learning methods to distinguish real passwords from decoys on real-world datasets. However, such evaluations may overestimate the system's security, as more effective distinguishing attacks could potentially exist. In this paper, we aim to analyze the security of Honeyword systems under the strongest theoretical attack, rather than relying on specific, expert-crafted attacks evaluated in prior experimental studies. We first derive mathematical expressions for the flatness and success-number under the strongest attack. We conduct analyses and computations for several typical scenarios, and determine the security of honeyword generation methods using a uniform distribution and the List model as examples. We further evaluate the security of existing honeyword generation methods based on password probability models (PPMs), which depends on the sample size used for training. We investigate, for the first time, the sample complexity of several representative PPMs, introducing two novel polynomial-time approximation schemes for computing the total variation between PCFG models and between higher-order Markov models. Our experimental results show that for small-scale password distributions, sample sizes on the order of millions--often tens of millions--are required to reduce the total variation below 0.1. A surprising result is that we establish an equivalence between flatness and total variation, thus bridging the theoretical study of Honeyword systems with classical information theory. Finally, we discuss the practical implications of our findings.
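
As a small illustration of the quantity the abstract refers to, the total variation between two discrete distributions $P$ and $Q$ is $\frac{1}{2}\sum_x |P(x)-Q(x)|$. The sketch below computes it for a toy three-password distribution against uniform honeywords. The function name `total_variation` and the toy probabilities are assumptions made for this example; this is not the paper's approximation scheme for PCFG or Markov models, which handle distributions far too large to enumerate this way.

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    p and q are dicts mapping an outcome to its probability.
    Outcomes missing from a dict are treated as probability 0.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


# Toy data (hypothetical): a skewed "real password" distribution
# versus honeywords drawn uniformly from the same three strings.
real = {"123456": 0.5, "password": 0.3, "qwerty": 0.2}
uniform = {pw: 1.0 / len(real) for pw in real}

tv = total_variation(real, uniform)
print(f"TV(real, uniform) = {tv:.4f}")
```

A value of 0 means the two distributions are identical, and a value of 1 means their supports are disjoint.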

Paper Structure (40 sections, 10 theorems, 73 equations, 15 figures, 13 tables, 5 algorithms)

Introduction
Background
Detection Strategy
Honeyword Generation Techniques
Existing Theoretic Results
Theoretical Calculation of Flatness
Definition of Flatness Function
Continuous Case
Discrete Case
Further Discussion
Theoretical Calculation of Success-number
Definition
Theoretical Calculation
The equivalence between Flatness and Total Variation
Algorithm for Total Variation of PCFG Models and Markov Models
...and 25 more sections

Key Result

Theorem 1

When $f(+\infty)=0$, the flatness function in Definition 1 has, in the continuous case, the formula:

Figures (15)

Figure 1: The flatness function in Example 3.1
Figure 2: $\epsilon_k(1)$-$k$: Zipf password vs. uniform honeywords
Figure 3: Flatness function ($k=20$): Zipf password vs. uniform honeywords
Figure : List
Figure : List
...and 10 more figures

Theorems & Definitions (23)

Definition 1
Theorem 1
proof
Theorem 2
proof
Example 3.1
Remark 1
Theorem 3
Example 3.2
Definition 2
...and 13 more
