PagPassGPT: Pattern Guided Password Guessing via Generative Pretrained Transformer

Xingyu Su, Xiaojie Zhu, Yang Li, Yong Li, Chi Chen, Paulo Esteves-Veríssimo

TL;DR

This work addresses the challenge of high-quality, pattern-consistent password guessing under large attack budgets. It introduces PagPassGPT, a GPT-2-based generator that conditions password production on explicit pattern information, and D&C-GEN, a divide-and-conquer algorithm that reduces duplicates by partitioning the generation task into non-overlapping subtasks. Together, they achieve a higher hit rate ($HR$) and a substantially lower repeat rate than prior deep-learning approaches: the abstract reports 12% more correctly guessed passwords and 25% fewer duplicates than the state-of-the-art model, with notable gains in pattern-guided guessing and cross-site generalization. The methods have practical implications for evaluating password strength and understanding attack surfaces, and they also expose limitations in pattern diversity and computational overhead that warrant further research.

Abstract

Amidst the surge in deep learning-based password guessing models, challenges of generating high-quality passwords and reducing duplicate passwords persist. To address these challenges, we present PagPassGPT, a password guessing model constructed on Generative Pretrained Transformer (GPT). It can perform pattern guided guessing by incorporating pattern structure information as background knowledge, resulting in a significant increase in the hit rate. Furthermore, we propose D&C-GEN to reduce the repeat rate of generated passwords, which adopts the concept of a divide-and-conquer approach. The primary task of guessing passwords is recursively divided into non-overlapping subtasks. Each subtask inherits the knowledge from the parent task and predicts succeeding tokens. In comparison to the state-of-the-art model, our proposed scheme exhibits the capability to correctly guess 12% more passwords while producing 25% fewer duplicates.
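
The divide-and-conquer idea in the abstract can be illustrated with a toy sketch. This is not the authors' D&C-GEN implementation: the function name `divide`, the `threshold` and `max_len` parameters, the proportional budget split, and the stand-in distribution `uniform_ab` are all assumptions made for illustration. The sketch only shows why subtasks defined by distinct prefixes cannot produce the same password twice.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: maps a prefix to a next-token probability table.
NextProbs = Callable[[str], Dict[str, float]]


def divide(prefix: str, budget: int, threshold: int, max_len: int,
           next_probs: NextProbs) -> List[Tuple[str, int]]:
    """Recursively split a guessing budget into (prefix, budget) subtasks.

    A task is split by extending its prefix with each possible next token,
    so every child inherits the parent's prefix. Leaves are never prefixes
    of one another, hence their sets of completions are disjoint.
    """
    # Stop splitting when the task is small enough or the prefix is complete.
    if budget <= threshold or len(prefix) >= max_len:
        return [(prefix, budget)]
    leaves: List[Tuple[str, int]] = []
    for token, prob in next_probs(prefix).items():
        share = int(budget * prob)  # assumed rule: budget proportional to probability
        if share > 0:
            leaves.extend(divide(prefix + token, share, threshold, max_len, next_probs))
    return leaves


def uniform_ab(prefix: str) -> Dict[str, float]:
    """Stand-in for a trained model: two tokens, equally likely."""
    return {"a": 0.5, "b": 0.5}


leaves = divide("", budget=8, threshold=2, max_len=3, next_probs=uniform_ab)
print(leaves)  # [('aa', 2), ('ab', 2), ('ba', 2), ('bb', 2)]
```

In a real system each leaf would be handed to the model to generate `budget` passwords starting from its prefix; here the leaves are simply listed.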

Paper Structure (42 sections, 7 equations, 11 figures, 6 tables, 1 algorithm)


Figures (11)

  • Figure 1: The process of pattern guided guessing. The pattern "L4N3S1" signifies a password comprising four letters, followed by three numbers, and ending with one special character. Pattern guided guessing refers to the process wherein a password guessing model generates passwords that adhere to such specific patterns.
  • Figure 2: The overview of the proposed solution. We utilize passwords and patterns extracted from known passwords to train PagPassGPT. Leveraging the ability of pattern guided guessing from PagPassGPT and with the assistance of D&C-GEN, PagPassGPT generates high-quality passwords for trawling attacks.
  • Figure 3: The training process (left) and the generation process (right) of PagPassGPT. The numbers in the figure are the token indexes produced by encoding, as presented in Fig. 5. Shaded numbers denote incorrect predictions, while numbers highlighted in red are the predicted indexes of a new password.
  • Figure 4: The preprocessing step of the PagPassGPT tokenizer. Left: during training, the password pattern is preprocessed together with the password, yielding their concatenation in a fixed format, called a rule. Right: during generation, the input pattern alone is preprocessed into a shorter rule that is ready to be embedded.
  • Figure 5: The tokenization process of the PagPassGPT tokenizer, which has two functions: encode and decode. encode takes a rule as input and produces token indexes, while decode reverses the process.
  • ...and 6 more figures
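
The pattern notation from the Figure 1 caption ("L4N3S1": four letters, three numbers, one special character) can be made concrete with a short sketch. The helper names `char_class`, `extract_pattern`, and `matches_pattern` are hypothetical, and the choice to treat only ASCII letters and digits as L and N (everything else as S) is an assumption; the paper's own preprocessing may differ.

```python
import string
from itertools import groupby


def char_class(ch: str) -> str:
    """Map a character to L (letter), N (number) or S (special)."""
    if ch in string.ascii_letters:
        return "L"
    if ch in string.digits:
        return "N"
    return "S"  # assumption: anything else counts as a special character


def extract_pattern(password: str) -> str:
    """Collapse consecutive characters of the same class into class+count segments."""
    return "".join(
        f"{cls}{len(list(run))}" for cls, run in groupby(password, key=char_class)
    )


def matches_pattern(password: str, pattern: str) -> bool:
    """Check whether a generated password adheres to the requested pattern."""
    return extract_pattern(password) == pattern


print(extract_pattern("pass123$"))            # L4N3S1
print(matches_pattern("abcd1234", "L4N3S1"))  # False
```

A check like `matches_pattern` is one way to verify that pattern guided guessing produced a password with the requested structure.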