
How To Think About End-To-End Encryption and AI: Training, Processing, Disclosure, and Consent

Mallory Knodel, Andrés Fábrega, Daniella Ferrari, Jacob Leiken, Betty Li Hou, Derek Yen, Sam de Alfaro, Kyunghyun Cho, Sunoo Park

TL;DR

This paper interrogates whether end-to-end encryption (E2EE) can coexist with contemporary AI integrations in messaging platforms. It develops a framework that jointly analyzes cryptographic guarantees, practical deployment, and legal/regulatory considerations, concluding that training on E2EE data is incompatible with E2EE, while inference can be compatible only under strict, endpoint-local or tightly controlled setups. The authors catalog real-world deployments (Apple, Samsung, Meta) to illustrate current practices and gaps, and they propose four core recommendations: preserve E2EE in processing, avoid unqualified E2EE claims when third parties access data, require opt-in consent for AI features, and provide rigorous, transparent disclosures. The work emphasizes that unlocking responsible AI in E2EE contexts demands coordinated technical design and robust regulatory and consumer-protection frameworks to prevent systemic erosion of confidentiality and user trust.

Abstract

End-to-end encryption (E2EE) has become the gold standard for securing communications, bringing strong confidentiality and privacy guarantees to billions of users worldwide. However, the current push to integrate artificial intelligence (AI) models widely, including into E2EE systems, raises serious security concerns. This work critically examines the (in)compatibility of AI models and E2EE applications. We explore this on two fronts: (1) the integration of AI "assistants" within E2EE applications, and (2) the use of E2EE data for training AI models. We analyze the potential security implications of each and identify conflicts with the security guarantees of E2EE. Then, we analyze the legal implications of integrating AI models in E2EE applications, given how AI integration can undermine the confidentiality that E2EE promises. Finally, we offer detailed recommendations based on our technical and legal analyses, including: technical design choices that must be prioritized to uphold E2EE security; how service providers must accurately represent E2EE security; and best practices for the default behavior of AI features and for requesting user consent. We hope this paper catalyzes an informed conversation about the tensions between the brisk deployment of AI and the security offered by E2EE, and guides the responsible development of new AI features.
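To make the framework's core distinction concrete, the following is a minimal Python sketch that maps an AI-assistant deployment onto the green/yellow/red scheme described for the example analyses in the Figures list. It is an illustration under stated assumptions, not the authors' method: the `AssistantDeployment` fields and the `classify` rule are hypothetical simplifications of the TL;DR's conclusions (training on E2EE data is incompatible with E2EE; inference is compatible only in endpoint-local or tightly controlled setups).

```python
from dataclasses import dataclass
from enum import Enum


class Tag(Enum):
    GREEN = "compatible with E2EE"
    YELLOW = "not compatible with E2EE, but with additional protections"
    RED = "most vulnerable"


@dataclass(frozen=True)
class AssistantDeployment:
    """Hypothetical, simplified description of an AI-assistant deployment."""

    inference_on_device: bool  # the model runs on the user's own endpoint
    used_for_training: bool  # message content (or data derived from it) trains a model
    additional_protections: bool  # assumed: e.g. a tightly controlled server environment


def classify(d: AssistantDeployment) -> Tag:
    """Map a deployment onto a green/yellow/red tag (illustrative simplification)."""
    # Off-device inference exposes plaintext beyond the endpoints, and training on
    # E2EE data is treated as incompatible with E2EE; either one rules out GREEN.
    compatible = d.inference_on_device and not d.used_for_training
    if compatible:
        return Tag.GREEN
    if d.additional_protections:
        return Tag.YELLOW
    return Tag.RED
```

Under this rule, `AssistantDeployment(inference_on_device=True, used_for_training=False, additional_protections=False)` maps to `Tag.GREEN`, while the same assistant with `used_for_training=True` drops to `Tag.RED`. The rule files every non-compatible setup with extra protections under yellow; the paper's per-deployment analyses may draw finer distinctions.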

Paper Structure

This paper contains 122 sections, 10 figures, and 2 tables.

Figures (10)

  • Figure 1: A sender $S$ and receiver $R$ communicate using an end-to-end encrypted application hosted by a company (middle). Solid lines represent plaintexts, and dashed lines represent ciphertexts. $S$ and $R$ can read their messages on their devices; however, while a message is "in transit" between their devices, it is encrypted so that it cannot be read by the intermediary platform $P$ handling it on its servers (or indeed by anyone but the intended recipient, whether a network eavesdropper, an employee at $P$, a hacker who compromises $P$, or a wrong recipient to whom the message was delivered by mistake).
  • Figure 1: Example analyses of different implementations of AI assistants under our framework. Green tags denote an implementation compatible with E2EE, yellow tags denote an implementation that is not compatible with E2EE but has additional protections, and red tags denote implementations that are most vulnerable.
  • Figure 2: Inference and training of AI assistants happen continuously as a feedback loop. As users continue to query the AI assistant, this generates data that is stored and can be used to further train and improve the model.
  • Figure 3: Generating an output from a user's input in a cloud-based versus a local architecture, distinguishing what happens on-device from what happens on cloud servers.
  • Figure 4: Apple's three-tiered approach
  • ...and 5 more figures
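The setting of Figure 1 (the E2EE diagram) can be illustrated with a toy, standard-library-only Python sketch. $S$ and $R$ agree on keys through a Diffie-Hellman exchange whose public values may pass through the platform, then exchange messages protected by a toy stream cipher (HMAC-SHA256 in counter mode) with encrypt-then-MAC, so $P$ only ever relays authenticated ciphertext. Everything here is an assumption for illustration: the function names, the `Platform` class, and the deliberately weak parameters are hypothetical. This is not a secure construction and not how any deployed messenger is implemented.

```python
import hashlib
import hmac
import secrets

# TOY PARAMETERS, NOT SECURE: 2**127 - 1 is a Mersenne prime chosen only so this
# sketch runs on the standard library. Real E2EE apps use vetted primitives.
P = 2**127 - 1
G = 3


def keypair():
    """Return (private, public) Diffie-Hellman values for one endpoint."""
    priv = secrets.randbelow(P - 3) + 2
    return priv, pow(G, priv, P)


def shared_key(my_priv, their_pub):
    """Derive separate encryption and MAC keys from the DH shared secret."""
    secret = pow(their_pub, my_priv, P).to_bytes(16, "big")
    enc_key = hashlib.sha256(b"enc" + secret).digest()
    mac_key = hashlib.sha256(b"mac" + secret).digest()
    return enc_key, mac_key


def _keystream(enc_key, nonce, length):
    """HMAC-SHA256 in counter mode, used here as a toy stream cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(enc_key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt(keys, plaintext):
    """Encrypt-then-MAC; returns nonce || body || tag."""
    enc_key, mac_key = keys
    nonce = secrets.token_bytes(16)
    stream = _keystream(enc_key, nonce, len(plaintext))
    body = bytes(b ^ s for b, s in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def decrypt(keys, ciphertext):
    """Verify the tag, then decrypt; raises ValueError on a wrong key or tampering."""
    enc_key, mac_key = keys
    nonce, body, tag = ciphertext[:16], ciphertext[16:-32], ciphertext[-32:]
    expected = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    stream = _keystream(enc_key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


class Platform:
    """The intermediary P: it stores and forwards ciphertexts and never holds a key."""

    def __init__(self):
        self.stored = []

    def relay(self, ciphertext):
        self.stored.append(ciphertext)
        return ciphertext
```

For example, the keys from `shared_key(s_priv, r_pub)` and `shared_key(r_priv, s_pub)` match, so $R$ can decrypt what $P$ delivers, while a key derived by a party holding neither private value fails authentication with `ValueError`. The sketch also shows where the tension with AI assistants arises: a model running on $P$ would see only the ciphertexts in `Platform.stored`, so reading message content requires either processing at an endpoint after `decrypt` or giving a third party access to plaintext.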