
Universal Neural-Cracking-Machines: Self-Configurable Password Models from Auxiliary Data

Dario Pasquini, Giuseppe Ateniese, Carmela Troncoso

TL;DR

The concept of a "universal" password model is introduced: a password model that, once pre-trained, can automatically adapt its guessing strategy based on the target system.

Abstract

We introduce the concept of "universal password model": a password model that, once pre-trained, can automatically adapt its guessing strategy based on the target system. To achieve this, the model does not need to access any plaintext passwords from the target credentials. Instead, it exploits users' auxiliary information, such as email addresses, as a proxy signal to predict the underlying password distribution. Specifically, the model uses deep learning to capture the correlation between the auxiliary data of a group of users (e.g., users of a web application) and their passwords. It then exploits those patterns to create a tailored password model for the target system at inference time. No further training steps, targeted data collection, or prior knowledge of the community's password distribution is required. Besides improving over current password strength estimation techniques and attacks, the model enables any end-user (e.g., system administrators) to autonomously generate tailored password models for their systems without the often unworkable requirements of collecting suitable training data and fitting the underlying machine learning model. Ultimately, our framework enables the democratization of well-calibrated password models to the community, addressing a major challenge in the deployment of password security solutions at scale.
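To make the abstract's core idea concrete, here is a deliberately tiny, non-neural sketch: users' email addresses act as the auxiliary proxy signal, they are summarised into a "seed", and the seed configures a password model without the model ever seeing a target password. Everything below is an assumption for illustration only: the three base distributions and their probabilities are invented, and a hand-written rule over email top-level domains stands in for the learned configuration encoder. It is not the authors' model, which learns this mapping with deep learning.

```python
from collections import Counter

# Hypothetical, invented language-specific password distributions
# (password -> probability). Purely illustrative numbers.
BASE_MODELS = {
    "de": {"passwort": 0.5, "hallo123": 0.3, "123456": 0.2},
    "fr": {"motdepasse": 0.5, "soleil": 0.3, "123456": 0.2},
    "en": {"password": 0.5, "iloveyou": 0.3, "123456": 0.2},
}

# Hypothetical rule standing in for a learned encoder:
# email top-level domain -> language community.
TLD_TO_LANG = {"de": "de", "fr": "fr", "com": "en", "uk": "en"}


def configuration_seed(emails):
    """Summarise a group of email addresses into mixture weights (a toy 'seed')."""
    if not emails:
        # No auxiliary data: fall back to uniform weights.
        return {lang: 1 / len(BASE_MODELS) for lang in BASE_MODELS}
    counts = Counter()
    for email in emails:
        tld = email.rsplit(".", 1)[-1].lower()
        counts[TLD_TO_LANG.get(tld, "en")] += 1
    total = sum(counts.values())
    return {lang: counts[lang] / total for lang in BASE_MODELS}


def tailored_model(seed):
    """Mix the base distributions with the seed weights into one distribution."""
    mixed = Counter()
    for lang, weight in seed.items():
        for pwd, p in BASE_MODELS[lang].items():
            mixed[pwd] += weight * p
    return mixed


def guesses(seed, n):
    """Return the n most probable passwords under the tailored model."""
    model = tailored_model(seed)
    ranked = sorted(model.items(), key=lambda kv: (-kv[1], kv[0]))
    return [pwd for pwd, _ in ranked[:n]]


emails = ["anna@web.de", "jonas@gmx.de", "lena@posteo.de", "paul@mail.fr"]
seed = configuration_seed(emails)
print(seed)
print(guesses(seed, 3))
```

With three `.de` addresses and one `.fr` address, the seed weights the German-like distribution at 0.75, so `passwort` is ranked first; no target password was used to reach that configuration.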

Paper Structure

This paper contains 60 sections, 6 equations, 17 figures, 2 tables, and 1 algorithm.

Figures (17)

  • Figure 1: Depiction of a partial execution of a simplified attention mechanism for a single query vector $q_i$ and the set of values $\{v_1, v_2, v_3\}$.
  • Figure 2: Results of two guessing attacks performed with three password models [melichera], each trained to model a different language-specific password distribution.
  • Figure 3: Depiction of a UNCM and its internal working at deployment time.
  • Figure 4: Graphical representation of the configuration encoder. Panel (a): Sub-encoder. The sub-encoder is replicated for each available input. Panel (b): Oversimplified version of the mixing-encoder. Panel (c): The resulting configuration seed.
  • Figure 5: Simplified representation of the seeded password model $f_{\Theta | \psi}$ running on the partial input "password". We do not depict the final dense prediction layers.
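Figure 1 depicts a simplified attention mechanism for one query vector $q_i$ over the values $\{v_1, v_2, v_3\}$. The sketch below shows standard scaled dot-product attention for a single query in plain Python. It assumes, as a simplification not stated in the caption, that the values also serve as keys, and the vectors are arbitrary toy numbers rather than anything from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    Returns the attention weights (one per value) and the weighted
    sum of the values.
    """
    d = len(query)
    # Similarity between the query and each key, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Output is the attention-weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return weights, output


q_i = [1.0, 0.0]
values = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # v_1, v_2, v_3
weights, output = attend(q_i, values, values)  # keys = values (assumption)
print(weights, output)
```

Because $q_i$ is most similar to $v_1$, most of the attention mass goes to $v_1$, while $v_2$ and $v_3$ (both orthogonal to $q_i$) receive equal weight.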