Can we spot a fake?

Shahar Mendelson, Grigoris Paouris, Roman Vershynin

Abstract

The problem of detecting fake data inspires the following seemingly simple mathematical question. Sample a data point $X$ from the standard normal distribution in $\mathbb{R}^n$. An adversary observes $X$ and corrupts it by adding a vector $rt$, where they can choose any vector $t$ from a fixed set $T$ of the adversary's "tricks", and where $r>0$ is a fixed radius. The adversary's choice of $t=t(X)$ may depend on the true data $X$. The adversary wants to hide the corruption by making the fake data $X+rt$ statistically indistinguishable from the real data $X$. What is the largest radius $r=r(T)$ for which the adversary can create an undetectable fake? We show that for highly symmetric sets $T$, the detectability radius $r(T)$ is approximately twice the scaled Gaussian width of $T$. The upper bound actually holds for arbitrary sets $T$ and generalizes to arbitrary, non-Gaussian distributions of real data $X$. The lower bound may fail for sets $T$ that are not highly symmetric, but we conjecture that this problem can be solved by considering the focused version of the Gaussian width of $T$, which emphasizes the most important directions of $T$.
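
To make the setup above concrete, here is a small Monte Carlo sketch. All specifics are our assumptions, not taken from the paper: the trick set $T$ is the hypothetical signed basis $\{\pm e_1, \dots, \pm e_n\}$ of unit vectors, $w(T) = \mathbb{E}\max_{t \in T}\langle g, t\rangle$ is the plain (unscaled) Gaussian width, the adversary greedily picks the $t \in T$ that minimizes $\langle X, t\rangle$, and the detector is the statistic $S(y) = \max_{s \in T}\langle y, s\rangle$. The helper names are hypothetical. The sketch compares the average of $S$ on real and fake data at $r = 2w(T)$ and $r = 3w(T)$.

```python
import random


def signed_basis(n):
    """Hypothetical trick set T = {+e_1, -e_1, ..., +e_n, -e_n} of unit vectors."""
    T = []
    for i in range(n):
        for sign in (1.0, -1.0):
            t = [0.0] * n
            t[i] = sign
            T.append(t)
    return T


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sup_stat(y, T):
    """Detector statistic S(y) = max over s in T of <y, s>."""
    return max(dot(y, s) for s in T)


def gaussian_width(T, n, trials, rng):
    """Monte Carlo estimate of w(T) = E max_{t in T} <g, t>, with g ~ N(0, I_n)."""
    return sum(sup_stat([rng.gauss(0.0, 1.0) for _ in range(n)], T)
               for _ in range(trials)) / trials


def make_fake(x, T, r):
    """Greedy (hypothetical) adversary: choose t in T minimising <x, t>, return x + r*t."""
    t = min(T, key=lambda s: dot(x, s))
    return [a + r * b for a, b in zip(x, t)]


def mean_stats(T, n, r, samples, rng):
    """Average of S over real samples X and over the fakes X + r*t(X)."""
    real = faked = 0.0
    for _ in range(samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        real += sup_stat(x, T)
        faked += sup_stat(make_fake(x, T, r), T)
    return real / samples, faked / samples


if __name__ == "__main__":
    rng = random.Random(0)
    n = 30
    T = signed_basis(n)
    w = gaussian_width(T, n, 500, rng)
    print(f"estimated w(T) = {w:.2f}")
    for k in (2.0, 3.0):
        real, faked = mean_stats(T, n, k * w, 200, rng)
        print(f"r = {k:.0f} w(T): mean S(real) = {real:.2f}, mean S(fake) = {faked:.2f}")
```

With these choices one expects the two averages to be close at $r = 2w(T)$ and clearly separated at $r = 3w(T)$, in line with the abstract's claim that the threshold sits near twice the width. The greedy adversary and the statistic $S$ are illustrative choices only.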

Paper Structure

This paper contains 20 sections, 8 theorems, 75 equations.

Key Result

Theorem 1.2

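For any set $T \subset \mathbb{R}^n$, the detectability radius $r(T)$ is at most approximately twice the scaled Gaussian width of $T$. Moreover, for any highly symmetric set $T$, a matching lower bound holds, so $r(T)$ is approximately twice the scaled Gaussian width of $T$.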
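
One elementary way to see where a factor of 2 can come from in the upper bound is sketched below. It is our own illustration, not necessarily the argument behind Theorem 2.1, and it assumes that $T$ lies in the unit sphere $S^{n-1}$ and that the relevant width is the plain Gaussian width $w(T) = \mathbb{E}\sup_{t \in T}\langle X, t\rangle$; the paper's scaled width may be normalized differently. Two facts are used: since $t \in T$, the statistic $S(y) = \sup_{s \in T}\langle y, s\rangle$ picks up the added vector $rt$; and $S$ is 1-Lipschitz, so Gaussian concentration applies to $S(X)$ and, because $-X$ is also standard normal, to $S(-X)$.

```latex
% Assumptions (ours, for illustration): T \subset S^{n-1}, X \sim N(0, I_n),
% S(y) := \sup_{s \in T} \langle y, s \rangle, \quad w(T) := \mathbb{E}\, S(X).
\begin{align*}
S(X + r\,t) &\ge \langle X + r\,t,\ t \rangle = r + \langle X, t \rangle
            \ge r - S(-X) \qquad \text{for every } t \in T, \\
\mathbb{P}\{ S(X) \ge w(T) + u \} &= \mathbb{P}\{ S(-X) \ge w(T) + u \} \le e^{-u^2/2}
            \qquad (u > 0).
\end{align*}
```

Hence the test that declares $y$ fake when $S(y) \ge w(T) + u$ has false-alarm probability at most $e^{-u^2/2}$, and if $r \ge 2w(T) + 2u$ it misses a fake only when $S(-X) \ge w(T) + u$, again with probability at most $e^{-u^2/2}$, whatever rule $t = t(X)$ the adversary uses.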

Theorems & Definitions (17)

  • Theorem 1.2: Detectability radius, informal
  • Theorem 2.1: When the fake is detectable
  • proof
  • Remark 2.2: The error term is small
  • Definition 3.1: Highly symmetric set
  • Theorem 3.2: Where the fake is undetectable
  • proof: Proof of Theorem 3.2
  • Lemma 3.3: Sign flipping
  • proof
  • Theorem 4.1: Where the fake is detectable: general distributions
  • ...and 7 more
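
The list above names a sign-flipping lemma. Its statement is not reproduced here, so the following is only our guess at the flavor of the idea, for the hypothetical signed-basis set $T = \{\pm e_1, \dots, \pm e_n\}$: negating the largest-magnitude coordinate of $X$ adds a vector $rt$ with $t \in T$ and $r = 2\max_i |X_i|$. This map is an involution that preserves the standard Gaussian distribution, so the fake $X + rt$ has exactly the same distribution as $X$, and the radius has mean exactly $2w(T)$ for this $T$. The catch is that $r$ is data-dependent rather than fixed. The function names are hypothetical, and the numerical moment checks are sanity checks, not a proof.

```python
import random


def sign_flip_fake(x):
    """Negate the largest-magnitude coordinate of x.

    This equals x + r*t with t = -sign(x_i) e_i, a vector of the hypothetical
    signed-basis trick set, and the data-dependent radius r = 2|x_i|, where
    i = argmax_j |x_j|. Returns (fake, r).
    """
    i = max(range(len(x)), key=lambda j: abs(x[j]))
    y = list(x)
    y[i] = -x[i]
    return y, 2.0 * abs(x[i])


def simulate(n, samples, rng):
    """Collect the radii used and the first coordinate of each fake."""
    radii, first_coords = [], []
    for _ in range(samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y, r = sign_flip_fake(x)
        radii.append(r)
        first_coords.append(y[0])
    return radii, first_coords


if __name__ == "__main__":
    rng = random.Random(0)
    radii, ys = simulate(30, 4000, rng)
    mean_r = sum(radii) / len(radii)
    sd_r = (sum((r - mean_r) ** 2 for r in radii) / len(radii)) ** 0.5
    # Fakes should look standard normal coordinate-wise (mean ~0, second moment ~1).
    print(f"mean of Y_1 = {sum(ys) / len(ys):.3f}, "
          f"mean of Y_1^2 = {sum(z * z for z in ys) / len(ys):.3f}")
    # E[r] = 2 w(T) for this T; the spread of r is plausibly one reason a
    # fixed-radius adversary can only match this approximately.
    print(f"radius: mean = {mean_r:.2f}, sd = {sd_r:.2f}")
```

For this small set the radius concentrates around $2w(T)$ only slowly as $n$ grows, which is consistent with the abstract describing the threshold as approximate.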