
Password Strength Detection via Machine Learning: Analysis, Modeling, and Evaluation

Jiazhi Mo, Hailu Kuang, Xiaoqi Li

TL;DR

This work investigates password strength detection using machine learning on public password repositories. It develops a structured pipeline with feature engineering (length, digit/letter counts, and symbol usage), labels passwords as strong or weak, and trains six classifiers with hyperparameter tuning and validation. The results show that decision trees and stacking ensembles achieve the highest accuracy and recall, making them practical for password strength assessment and defense guidance. The study also provides a data-collection and preprocessing framework, along with defense recommendations such as multi-factor authentication (MFA), audits, and user training, to improve password security in real-world systems. Overall, the approach yields a data-driven, scalable tool for strengthening user-chosen passwords and guiding secure policy design.
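
The feature-engineering and labeling steps can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names `extract_features` and `label_strength` are hypothetical, "special characters" is assumed to mean any character that is not an ASCII letter or digit, and the strong/weak rule (at least 8 characters and at least three of the four character classes) is an assumed placeholder, since the paper's actual labeling criterion is not stated here.

```python
import string


def extract_features(password: str) -> dict[str, int]:
    """Hypothetical extractor for the character-count features listed in the paper."""
    digits = sum(ch in string.digits for ch in password)
    upper = sum(ch in string.ascii_uppercase for ch in password)
    lower = sum(ch in string.ascii_lowercase for ch in password)
    return {
        "length": len(password),
        "digits": digits,
        "uppercase": upper,
        "lowercase": lower,
        # Assumption: anything that is not an ASCII letter or digit counts as "special".
        "special": len(password) - digits - upper - lower,
    }


def label_strength(features: dict[str, int], min_length: int = 8, min_classes: int = 3) -> int:
    """Hypothetical labeling rule (NOT the paper's criterion): 1 = strong, 0 = weak."""
    classes = sum(features[k] > 0 for k in ("digits", "uppercase", "lowercase", "special"))
    return int(features["length"] >= min_length and classes >= min_classes)


if __name__ == "__main__":
    for pw in ("password", "Tr0ub4dor&3"):
        feats = extract_features(pw)
        print(pw, feats, "strong" if label_strength(feats) else "weak")
```

Each password becomes a fixed-length numeric vector, which is the form the six classifiers consume; swapping in the paper's real labeling rule only changes `label_strength`.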

Abstract

As network security issues continue to gain prominence, password security has become crucial for safeguarding personal information and network systems. This study first introduces various methods for cracking system passwords, outlines password defense strategies, and discusses the application of machine learning to password security. We then conduct a detailed analysis of public password databases, uncovering common features and patterns among passwords. We extract multiple password characteristics: length, the number of digits, the number of uppercase and lowercase letters, and the number of special characters. Next, we experiment with six machine learning algorithms (support vector machines, logistic regression, neural networks, decision trees, random forests, and stacked models), apply model validation and hyperparameter tuning, and evaluate each model on metrics including accuracy, recall, and F1 score. The evaluation results on the test set indicate that decision trees and stacked models excel in accuracy, recall, and F1 score, making them practical options for the strong/weak password classification task.
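
The evaluation metrics named above can be computed directly from true and predicted labels. The sketch below assumes binary labels with 1 for strong and 0 for weak, treats "strong" as the positive class (an assumption; recall on weak passwords could equally be the quantity of interest), and uses the hypothetical helper name `binary_metrics`. Precision is included because F1 is the harmonic mean of precision and recall.

```python
def binary_metrics(y_true: list[int], y_pred: list[int], positive: int = 1) -> dict[str, float]:
    """Hypothetical helper: accuracy, precision, recall, and F1 for one positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall (0 when both are 0).
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    print(binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1]))
```

In practice, `accuracy_score`, `recall_score`, and `f1_score` from `sklearn.metrics` compute the same quantities; the hand-rolled version only makes the formulas explicit.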

Paper Structure

This paper contains 29 sections, 1 equation, 2 figures, 7 tables.

Figures (2)

  • Figure 1: Distribution of Password Lengths for Six Typical Datasets.
  • Figure 2: Distribution of Password Character Composition for Six Typical Datasets.