SE#PCFG: Semantically Enhanced PCFG for Password Analysis and Cracking

Yangde Wang; Weidong Qiu; Peng Tang; Hao Tian; Shujun Li

TL;DR

This work addresses the limited understanding of semantic patterns in user-generated passwords across languages by introducing SE#PCFG, a semantically enhanced PCFG framework that covers 43 semantic factor types and multiple languages. It formalizes a four-level password model, consisting of characters, semantic factors and their types (SFs/SFTs), SPs, and the semantic structure, together with a streamlined pipeline for semantic analysis that includes a new smoothing method for handling unobserved patterns. Building on this, the authors propose SEPCA, a semantically aware password-cracking architecture that outperforms three state-of-the-art baselines across 52 test cases, with significant improvements in coverage at both the user level and the unique-password level. The results yield new insights into cross-database semantic correlations, have practical implications for password policies, and provide methods for analyzing and auditing password security in multilingual settings.
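The TL;DR describes a grammar in which a password is generated from a semantic structure whose slots are filled by semantic factors. As a hedged illustration of how such a semantically enhanced PCFG could score a password, here is a minimal sketch under several assumptions: passwords are already segmented into (SFT, SF) pairs, the SFT labels (`FirstName`, `Year`, and so on) are hypothetical, the intermediate SP level is skipped, and plain add-k (Laplace) smoothing stands in for the paper's own smoothing method, which this section does not describe. The helper names `train` and `probability` are illustrative only.

```python
from collections import Counter

# Toy training data: each password is pre-segmented into (SFT, SF) pairs.
# The SFT labels are hypothetical stand-ins, not the paper's 43 types.
TRAINING = [
    [("FirstName", "john"), ("Year", "1990")],
    [("FirstName", "mary"), ("Year", "1985")],
    [("FirstName", "john"), ("Digits", "123")],
    [("Word", "ilove"), ("FirstName", "anna")],
]


def train(segmented):
    """Count semantic structures (SFT sequences) and SF terminals per SFT."""
    structures = Counter()
    terminals = {}
    for password in segmented:
        structures[tuple(sft for sft, _ in password)] += 1
        for sft, sf in password:
            terminals.setdefault(sft, Counter())[sf] += 1
    return structures, terminals


def probability(segments, structures, terminals, k=1.0, vocab_size=1000):
    """P(pw) = P(structure) * prod_i P(SF_i | SFT_i).

    Terminal probabilities use add-k smoothing with an assumed per-SFT
    vocabulary size, so unseen SFs get a small non-zero mass. Unseen
    structures still score 0 in this sketch.
    """
    structure = tuple(sft for sft, _ in segments)
    p = structures[structure] / sum(structures.values())
    for sft, sf in segments:
        counts = terminals.get(sft, Counter())
        p *= (counts[sf] + k) / (sum(counts.values()) + k * vocab_size)
    return p


structures, terminals = train(TRAINING)
p_seen = probability([("FirstName", "john"), ("Year", "1990")], structures, terminals)
p_unseen_sf = probability([("FirstName", "zoe"), ("Year", "1990")], structures, terminals)
print(p_seen, p_unseen_sf)
```

A cracker built on such a grammar would emit guesses in descending order of this probability; how SEPCA actually generates and orders guesses is not shown in this section.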

Abstract

Much research has been done on user-generated textual passwords. Surprisingly, semantic information in such passwords remains under-investigated: research has concentrated on passwords created by English- and/or Chinese-speaking users and has considered only limited semantics. This paper fills this gap by proposing a general framework based on semantically enhanced PCFG (probabilistic context-free grammars), named SE#PCFG. It allows us to consider 43 types of semantic information, the richest set considered so far, for password analysis. Applying SE#PCFG to 17 large leaked password databases of users speaking four languages (English, Chinese, German and French), we demonstrate its usefulness and report a wide range of new insights about password semantics at different levels, such as cross-website password correlations. Furthermore, based on SE#PCFG and a new systematic smoothing method, we propose the Semantically Enhanced Password Cracking Architecture (SEPCA) and compare its password coverage rate against three SOTA (state-of-the-art) benchmarks: two other PCFG variants and a neural network. Our experimental results show that SEPCA outperforms all three benchmarks consistently and significantly across 52 test cases, by up to 21.53%, 52.55% and 7.86%, respectively, at the user level (counting duplicate passwords). At the level of unique passwords, SEPCA also beats the three counterparts by up to 43.83%, 94.11% and 11.16%, respectively.
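The abstract reports coverage at two levels: the user level, where duplicate passwords are counted once per account, and the unique-password level, where each distinct password counts once. The distinction matters because popular passwords repeat: guessing one common password cracks many accounts but adds only one unique password. The sketch below assumes coverage means the fraction of the test set whose password appears among the generated guesses; the guess budgets and exact definitions used in the paper are not given in this section, and the `coverage` helper and test data are hypothetical.

```python
from collections import Counter


def coverage(test_passwords, guesses):
    """Return (user_level, unique_level) coverage of a test set by a guess list.

    test_passwords: one entry per account, duplicates kept.
    guesses: the passwords generated within some guess budget.
    """
    guessed = set(guesses)
    counts = Counter(test_passwords)
    # User level: every account whose password was guessed counts once.
    cracked_accounts = sum(n for pw, n in counts.items() if pw in guessed)
    # Unique level: each distinct password counts once, however popular.
    cracked_unique = sum(1 for pw in counts if pw in guessed)
    return cracked_accounts / len(test_passwords), cracked_unique / len(counts)


test_set = ["123456", "123456", "123456", "password", "zhang1990", "qwerty!"]
user_rate, unique_rate = coverage(test_set, ["123456", "zhang1990"])
print(user_rate, unique_rate)  # 0.6666666666666666 0.5
```

Here two guesses crack four of six accounts but only two of four distinct passwords, so the two rates diverge even for the same guess list.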

Paper Structure

This paper contains 26 sections, 4 equations, 6 figures and 10 tables.

Introduction
Related Work and Comparison
Related Work
Password modeling methods
Password semantic analysis
Comparison With Related Work
SE#PCFG and Password Semantic Analysis
Conceptual Model of SE#PCFG
Four Structural Levels
SFTs and SFs
A Streamlined Computational Process
Step 1 -- Pre-processing
Step 2a -- Identifying SFs in L-Segments
Step 2b -- Identifying SFs in D- and S-Segments
Step 3 -- Post-processing
...and 11 more sections

Figures (6)

Figure 1: Distribution of combined SFTs in the 17 databases. English, German and French databases have similar SFT-level distributions, except for database 10 (MyHeritage). Chinese databases have similar distributions to one another, but differ considerably from the other databases. All numbers labeled in each figure are averages.
Figure 2: Distributions of SPL in the 17 databases
Figure 3: Cross-database semantic correlation values at the SFT level and at the combined SF-SFT level, computed according to [Han-DM-book2012] and the paper's SF-SFT cosine-similarity equation, respectively. The x- and y-axes show the indices of the 17 databases listed in the paper's password-database table.
Figure 4: Performance using Monte Carlo (MC) estimation and real attacks (RA).
Figure 5: Performance comparison between SEPCA and DPG [Pasquini-SP-2021] over all test sets.
...and 1 more figure
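Figure 3 compares databases by their semantic composition, and its caption refers to a cosine-similarity equation for the combined SF-SFT level. The sketch below computes cosine similarity between two databases' frequency tables, assuming raw counts are used as vector components; the paper's exact weighting, and whether the SFT-level measure taken from [Han-DM-book2012] uses the same formula, are not stated in this section. All counts and labels below are hypothetical. The same function covers the combined SF-SFT level if the keys are (SFT, SF) pairs.

```python
import math
from collections import Counter


def cosine_similarity(freq_a, freq_b):
    """Cosine similarity between two frequency tables keyed by category."""
    # Only keys present in both tables contribute to the dot product.
    dot = sum(freq_a[key] * freq_b[key] for key in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical SFT-level counts for three databases.
sft_en = Counter({"FirstName": 30, "Year": 40, "Word": 30})
sft_de = Counter({"FirstName": 3, "Year": 4, "Word": 3})
sft_zh = Counter({"Pinyin": 50, "Digits": 50})

# Hypothetical combined SF-SFT counts, keyed by (SFT, SF) pairs.
pairs_a = Counter({("Year", "1990"): 5, ("FirstName", "john"): 5})
pairs_b = Counter({("Year", "1990"): 5, ("FirstName", "li"): 5})

print(cosine_similarity(sft_en, sft_de))    # proportional profiles, about 1.0
print(cosine_similarity(sft_en, sft_zh))    # disjoint SFTs
print(cosine_similarity(pairs_a, pairs_b))  # one shared pair out of two
```

Because cosine similarity ignores vector length, databases of very different sizes can still score close to 1.0 when their semantic profiles are proportional.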