
Memorization in deep learning: A survey

Jiaheng Wei, Yanjun Zhang, Leo Yu Zhang, Ming Ding, Chao Chen, Kok-Leong Ong, Jun Zhang, Yang Xiang

TL;DR

This survey systematically analyzes memorization in deep learning, distinguishing memorization learning from pattern learning and framing it within generalization and security/privacy contexts. It presents a taxonomy of memorization definitions, along with comprehensive evaluation methods at both the example and model levels, and examines how data, architecture, and optimization factors shape memorization dynamics during training. It then surveys the security/privacy and forgetting literature, linking memorization to privacy risks such as membership inference and data extraction, as well as to defenses like differential privacy and data augmentation. The paper also explores practical applications of memorization and forgetting, including noisy-label learning, example enhancement, privacy auditing, and model editing, and discusses open challenges and directions for future research. Overall, the work provides a structured, multi-level understanding of memorization in DNNs and highlights its implications for generalization, security, privacy, and trustworthy AI.

Abstract

Deep Learning (DL) powered by Deep Neural Networks (DNNs) has revolutionized various domains, yet understanding the intricacies of DNN decision-making and learning processes remains a significant challenge. Recent investigations have uncovered an interesting memorization phenomenon in which DNNs tend to memorize specific details from examples rather than learning general patterns, affecting model generalization, security, and privacy. This raises critical questions about the nature of generalization in DNNs and their susceptibility to security breaches. In this survey, we present a systematic framework to organize memorization definitions based on the generalization and security/privacy domains and summarize memorization evaluation methods at both the example and model levels. Through a comprehensive literature review, we explore DNN memorization behaviors and their impacts on security and privacy. We also describe the privacy vulnerabilities caused by memorization, introduce the phenomenon of forgetting, and explore its connection with memorization. Furthermore, we spotlight various applications leveraging memorization and forgetting mechanisms, including noisy label learning, privacy preservation, and model enhancement. This survey offers a first-of-its-kind understanding of memorization in DNNs, providing insights into its challenges and opportunities for enhancing AI development while addressing critical ethical concerns.

Paper Structure

This paper contains 68 sections, 2 equations, 6 figures, and 5 tables.

Figures (6)

  • Figure 1: The Direct Memorization Effect. In (a), an image generator is used to illustrate memorization: the upper part demonstrates the memorization effect and the lower part shows typical generation. In (b), the memorization effect is shown at two levels: Example Memorization and Model Memorization.
  • Figure 2: Paper Structure.
  • Figure 3: Memorization Definitions and Evaluations.
  • Figure 4: Demonstration of the Long-tailed Examples.
  • Figure 5: Underlying Risks of Memorization.
  • ...and 1 more figure

Theorems & Definitions (11)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6
  • Definition 7
  • Definition 8
  • Definition 9
  • Definition 10
  • ...and 1 more