Chunking Attacks on File Backup Services using Content-Defined Chunking

Boris Alexeev, Colin Percival, Yan X Zhang

TL;DR

This work investigates the security of content-defined chunking (CDC) in file backup services, focusing on the per-user secret parameters used in rolling-hash CDC schemes. It introduces a two-part attack framework (parameter-extraction attacks and post-parameterization attacks) and demonstrates concrete attacks on Tarsnap, Borg, and Restic, along with protocol-agnostic leakage pathways. The authors provide practical attack constructions, complexity analyses, and defenses (e.g., enlarging the keyspace, employing multiple independent hashes), and they highlight how compression interacts with leakage. They show that chunk boundaries and compressed chunk sizes can reveal nontrivial information about user data, with real-world implications for privacy in cloud backups. The work emphasizes the need for stronger CDC designs that resist both parameter leakage and post-parameter leakage to preserve data confidentiality in deduplicating backup services.

Abstract

Systems such as file backup services often use content-defined chunking (CDC) algorithms, especially those based on rolling hash techniques, to split files into chunks in a way that allows for data deduplication. These chunking algorithms often depend on per-user parameters in an attempt to avoid leaking information about the data being stored. We present attacks to extract these chunking parameters and discuss protocol-agnostic attacks and the loss of security once the parameters are breached (including when these parameters are not set up at all, which is often available as an option). Our parameter-extraction attacks themselves are protocol-specific, but their ideas are generalizable to many potential CDC schemes.
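
To make the idea of rolling-hash CDC with a per-user parameter concrete, here is a minimal sketch in Python. It is not the algorithm used by Tarsnap, Borg, or Restic; the function name `chunk`, the constants `WINDOW`, `BASE`, `MOD`, and `MASK`, and the way `secret` is mixed into the boundary test are all illustrative assumptions. It shows only the general shape: a hash over a sliding window of bytes decides where chunks end, so boundaries follow the content rather than fixed offsets.

```python
# Illustrative sketch only: a generic polynomial rolling hash, NOT the
# chunker of any real backup tool. All constants below are assumptions.
WINDOW = 16            # number of bytes covered by the rolling window
BASE = 257             # polynomial base
MOD = (1 << 61) - 1    # prime modulus for the hash
MASK = (1 << 8) - 1    # test the low 8 bits: roughly 256-byte average chunks


def chunk(data: bytes, secret: int) -> list:
    """Split data wherever the keyed window hash has all masked bits zero."""
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        if i >= WINDOW:
            # Remove the contribution of the byte that slides out.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + b) % MOD
        # Hypothetical keying: XOR the per-user secret in before the test.
        if i + 1 >= WINDOW and ((h ^ secret) & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks
```

Because each boundary depends only on the bytes inside the window, inserting a byte at the front of a file changes the first chunk but leaves later chunks identical, which is what makes deduplication work. In this toy keying only the low 8 bits of `secret` matter, so it should not be read as a secure parameterization.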

Paper Structure

This paper contains 24 sections, 12 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: A visualization of data flow from the client to the server.
  • Figure 2: The geometric distribution drawn in red is the distribution of uncompressed chunk sizes for most chunking algorithms, such as in Borg or Restic. The unimodal distribution drawn in green is the distribution of uncompressed chunk sizes in Tarsnap.

Theorems & Definitions (4)

  • Example 4.1: Single Known File
  • Example 4.2: Music
  • Example 4.3: DNA
  • Example 4.4: Citizenship / binary choices