MAC Address Anonymization for Crowd Counting

Jean-François Determe, Sophia Azzagnuni, François Horlin, Philippe De Doncker

TL;DR

The paper tackles privacy-preserving crowd counting using WiFi probe requests by hashing MAC addresses after prepending time-varying peppers, producing 64-bit SA identifiers that cannot be linked across time. It presents a rigorous collision-rate analysis, deriving exact and approximate formulas with analytical error bounds, and shows that with $m=2^{64}$ buckets and up to $10^7$ MAC addresses, the expected collision rate is about $10^{-12.5}$, well below the $10^{-9}$ target, while time-synchronization errors of around $10$ ms have a negligible impact on counts. The contributions include a novel two-part pepper scheme, formal privacy properties (intractability and non-tracking), and a tractable mathematical framework for collision analysis, along with a discussion of practical validation and limitations. These results enable scalable, privacy-preserving crowd counting at large events and offer techniques that could generalize to other domains requiring hashed identifiers with time-varying salts.
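The peppered-hash idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's construction: SHA-256, the byte layout, and the function name `anonymize_mac` are hypothetical choices, and the pepper is treated as an opaque secret byte string (how the paper assembles it from its two parts is not reproduced here). The identifier is the first 64 bits of the digest.

```python
import hashlib
import secrets

def anonymize_mac(mac: str, pepper: bytes) -> int:
    """Return a 64-bit identifier for `mac` under the given secret pepper.

    Hypothetical sketch: SHA-256 and the byte layout are assumptions,
    not the construction specified in the paper.
    """
    # "AA:BB:CC:DD:EE:FF" or "aa-bb-cc-dd-ee-ff" -> 6 raw bytes
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    # Prepend the pepper to the MAC address, then hash
    digest = hashlib.sha256(pepper + mac_bytes).digest()
    # Truncate to 64 bits (first 8 bytes of the digest)
    return int.from_bytes(digest[:8], "big")

if __name__ == "__main__":
    pepper_t1 = secrets.token_bytes(32)  # pepper for one time period
    pepper_t2 = secrets.token_bytes(32)  # pepper for the next period
    mac = "AA:BB:CC:DD:EE:FF"
    print(hex(anonymize_mac(mac, pepper_t1)))
    print(hex(anonymize_mac(mac, pepper_t2)))  # unrelated to the first value
```

Within one pepper period, repeated probe requests from the same device map to the same identifier, which is what counting requires; after the pepper changes, the new identifiers cannot be linked to the old ones without knowing the peppers.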

Abstract

Research has shown that counting WiFi packets called probe requests (PRs) implicitly provides a proxy for the number of people in an area. In this paper, we discuss a crowd counting system involving WiFi sensors detecting PRs over the air, then extracting and anonymizing their media access control (MAC) addresses using a hash-based approach. This paper discusses an anonymization procedure and shows time-synchronization inaccuracies among sensors and hashing collision rates to be low enough to prevent anonymization from interfering with counting algorithms. In particular, we derive an approximation of the collision rate of uniformly distributed identifiers, with analytical error bounds.

Paper Structure

This paper contains 21 sections, 4 theorems, 31 equations, 4 figures.

Key Result

Theorem 1

For $n$ inserts into $m$ buckets, the collision rate $\mathbb{E} \lbrack Y^{(n,m)}\rbrack/n$ is given by an exact closed-form expression, derived under the assumption that the hashed identifiers are uniformly distributed over the buckets.
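To make Theorem 1 concrete, the sketch below assumes that $Y^{(n,m)}$ counts the inserts that land in an already-occupied bucket. Under the uniform assumption this gives $\mathbb{E}\lbrack Y^{(n,m)}\rbrack = n - m\left(1 - (1 - 1/m)^n\right)$. Both this definition and the leading-order approximation $(n-1)/(2m)$ are assumptions made here and may not match the paper's equations term by term. The exact form is evaluated with Python's `decimal` module at 80 significant digits and compared with the approximation and with a naive double-precision evaluation. The function names are hypothetical.

```python
import math
from decimal import Decimal, localcontext

def exact_rate(n: int, m: int, digits: int = 80) -> float:
    """E[Y]/n assuming Y = n - (number of occupied buckets) after n inserts."""
    with localcontext() as ctx:
        ctx.prec = digits  # extra digits absorb the catastrophic cancellation
        m_d = Decimal(m)
        p_empty = (1 - 1 / m_d) ** n    # P(a given bucket stays empty)
        occupied = m_d * (1 - p_empty)  # expected number of occupied buckets
        return float((n - occupied) / n)

def approx_rate(n: int, m: int) -> float:
    """Leading-order term (n - 1) / (2m) of the series expansion."""
    return (n - 1) / (2 * m)

def naive_double_rate(n: int, m: int) -> float:
    """Same exact formula, evaluated naively in double precision."""
    return 1.0 - (m / n) * (1.0 - (1.0 - 1.0 / m) ** n)

if __name__ == "__main__":
    n, m = 10**7, 2**64
    print(math.log10(exact_rate(n, m)))   # about -12.57
    print(math.log10(approx_rate(n, m)))  # about -12.57
    print(naive_double_rate(n, m))        # 1.0: double precision breaks down
```

For $n = 10^7$ and $m = 2^{64}$, the two high-precision values agree to a relative difference of about $10^{-13}$ and give roughly $2.7 \times 10^{-13}$, in line with the $10^{-12.5}$ quoted in the TL;DR. In double precision, $1 - 2^{-64}$ rounds to exactly 1.0, so `naive_double_rate` returns 1.0. This is an extreme case of cancellation, plausibly of the same kind as the instabilities shown in Figure 3.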

Figures (4)

  • Figure 1: Scheme of the PR sensing procedure. Three WiFi sensors with overlapping ranges detect WiFi probe requests emitted by the smartphones of individuals. The shaded ellipses and the associated cones depict sensor detection ranges. Each sensor uses HTTPS links to periodically retrieve server peppers from the central server and uses another HTTPS link to upload anonymized PRs. Time synchronization is achieved by calibration with NTP servers. Communication links are depicted for only one sensor, to avoid clutter.
  • Figure 2: (From determe2022monitoring) Scheme of the anonymization procedure executed by sensors.
  • Figure 3: Value of $\log_{10} (\mathbb{E} \left\lbrack Y^{(n,m)} \right\rbrack / n)$ computed numerically in Matlab R2019a from the exact collision-rate expression, as a function of the number of inserts $n$ and the number of buckets $m$. For $\log_{10}(n) \geq 3$, numerical instabilities appear for values of $\log_{10}(m)$ as low as $9$.
  • Figure 4: Level sets of the approximate collision-rate expression as a function of the number of inserts $n$ and the number of buckets $m$.
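The claim that clock errors of around 10 ms are negligible can be illustrated with a simple model. Assume, hypothetically, that every sensor picks its pepper from its local clock using a fixed rotation period $T$ (the paper's actual schedule is not reproduced here). A sensor whose clock is off by $\delta$ then uses the neighbouring pepper for a fraction $\min(|\delta|, T)/T$ of emission times. Such probe requests could be hashed differently by two sensors and counted as distinct devices. The names `epoch_index`, `period`, and `offset` are illustrative, and the simulation only checks the formula.

```python
import math
import random

def epoch_index(t: float, period: float) -> int:
    # HYPOTHETICAL rule: the pepper in force at time t is number floor(t / period)
    return math.floor(t / period)

def expected_mismatch_fraction(offset: float, period: float) -> float:
    # With emission times spread uniformly, a clock offset moves a slice of
    # length |offset| of every epoch into the neighbouring epoch.
    return min(abs(offset), period) / period

def simulated_mismatch_fraction(offset: float, period: float,
                                samples: int = 100_000, seed: int = 0) -> float:
    rng = random.Random(seed)
    horizon = 144 * period  # simulate over a whole number of epochs
    mismatches = 0
    for _ in range(samples):
        t = rng.uniform(0.0, horizon)  # true emission time of one probe request
        # A sensor whose clock is off by `offset` picks the pepper for t + offset
        if epoch_index(t, period) != epoch_index(t + offset, period):
            mismatches += 1
    return mismatches / samples

if __name__ == "__main__":
    # 10 ms offset with an arbitrary one-hour pepper period
    print(expected_mismatch_fraction(0.010, 3600.0))  # about 2.8e-6
    # Sanity check of the formula with an exaggerated offset
    print(simulated_mismatch_fraction(60.0, 600.0))   # close to 0.1
```

With $\delta = 10$ ms and an arbitrary $T$ of one hour, only about $2.8 \times 10^{-6}$ of emission times fall in the mismatched slice, which is consistent with the paper's conclusion that such offsets barely affect counts.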

Theorems & Definitions (8)

  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Lemma A1
  • proof
  • Lemma A2
  • proof