Fair-Count-Min: Frequency Estimation under Equal Group-wise Approximation Factor

Nima Shahbazi, Stavros Sintos, Abolfazl Asudeh

TL;DR

This paper tackles fairness in streaming frequency estimation: the additive error of Count-Min (CM) sketches disproportionately harms low-frequency elements. It introduces Fair-Count-Min (FCM), a group-fair sketch that guarantees equal expected multiplicative approximation factors across predefined element groups, using group-aware semi-uniform hashing together with column partitioning. The authors prove fairness, analyze the price of fairness (PoF), and develop exact and practical algorithms for computing the number of buckets allocated to each group. They show that fairness adds only negligible additive error (often negative for $d=1$) while retaining CM's space and time efficiency. Experiments on real and synthetic datasets confirm that FCM achieves group fairness across diverse settings with minimal overhead, making it a practical, theoretically grounded approach to fair frequency estimation on streams.

Abstract

Frequency estimation in streaming data often relies on sketches like Count-Min (CM) to provide approximate answers with sublinear space. However, CM sketches introduce additive errors that disproportionately impact low-frequency elements, creating fairness concerns across different groups of elements. We introduce Fair-Count-Min, a frequency estimation sketch that guarantees equal expected approximation factors across element groups, thus addressing the unfairness issue. We propose a column partitioning approach with group-aware semi-uniform hashing to eliminate collisions between elements from different groups. We provide theoretical guarantees for fairness, analyze the price of fairness, and validate our theoretical findings through extensive experiments on real-world and synthetic datasets. Our experimental results show that Fair-Count-Min achieves fairness with minimal additional error and maintains competitive efficiency compared to standard CM sketches.
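To make the additive-error concern concrete, here is a minimal sketch of a standard Count-Min structure. It is an illustration only, not the authors' implementation: the class name `CountMin`, the salted use of Python's built-in `hash`, and the toy stream are all assumptions. The ratio of estimate to true count is used here as an assumed stand-in for the paper's approximation factor.

```python
import random
from collections import Counter


class CountMin:
    """Toy Count-Min sketch with d rows and w columns (illustrative only)."""

    def __init__(self, w, d, seed=0):
        self.w, self.d = w, d
        rng = random.Random(seed)
        # One salt per row stands in for d independent hash functions.
        self.salts = [rng.getrandbits(64) for _ in range(d)]
        self.table = [[0] * w for _ in range(d)]

    def _col(self, row, x):
        # Assumption: salted built-in hash as a stand-in for a uniform hash.
        return hash((self.salts[row], x)) % self.w

    def update(self, x, count=1):
        for r in range(self.d):
            self.table[r][self._col(r, x)] += count

    def query(self, x):
        # The minimum over rows never underestimates the true count.
        return min(self.table[r][self._col(r, x)] for r in range(self.d))


# Toy stream: one heavy element, one light element, and 300 singletons.
stream = [1] * 1000 + [2] * 10 + list(range(100, 400))
cm = CountMin(w=16, d=2)
for x in stream:
    cm.update(x)

true = Counter(stream)
# Assumed approximation factor: estimate divided by true count.
ratios = {x: cm.query(x) / true[x] for x in (1, 2)}
```

Because collisions add roughly the same absolute amount to every counter, an element with a small true count tends to see a larger ratio than a heavy one, which is the imbalance the abstract describes.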

Paper Structure

This paper contains 36 sections, 1 theorem, 45 equations, 79 figures, 1 table.

Key Result

Theorem 1

A Count-Min sketch with a group-aware semi-uniform hash function $\mathsf{h}(\cdot)$ is group-fair if the number of bins $w_l$ allocated to each group $\mathbf{g}_l$ is proportional to that group's share of the element types. That is, $w_l = \frac{n_l}{n} w, \forall \mathbf{g}_l\in \mathcal{G}$.
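The allocation rule in Theorem 1 can be sketched in a few lines. The code below is a hedged illustration, not the paper's algorithm. It reads $n_l$ as the number of element types in group $\mathbf{g}_l$, $n$ as the total number of element types, and $w$ as the total number of columns. The names `allocate_columns` and `FairCountMin` are hypothetical, the largest-remainder rounding is an assumption for cases where $\frac{n_l}{n} w$ is not an integer, and a salted built-in `hash` stands in for the group-aware semi-uniform hash function.

```python
import random


def allocate_columns(group_sizes, w):
    """Split w columns so that w_l is (nearly) n_l / n * w.

    Assumption: fractional shares are rounded with the largest-remainder rule.
    """
    n = sum(group_sizes)
    widths = [(n_l * w) // n for n_l in group_sizes]
    remainders = [(n_l * w) % n for n_l in group_sizes]
    leftover = w - sum(widths)
    order = sorted(range(len(group_sizes)), key=lambda l: remainders[l], reverse=True)
    for l in order[:leftover]:
        widths[l] += 1
    return widths


class FairCountMin:
    """Toy column-partitioned sketch: each group hashes into its own columns."""

    def __init__(self, group_sizes, w, d, seed=0):
        self.widths = allocate_columns(group_sizes, w)
        # Assumes every group receives at least one column.
        assert all(w_l > 0 for w_l in self.widths)
        self.offsets = [sum(self.widths[:l]) for l in range(len(self.widths))]
        rng = random.Random(seed)
        self.salts = [rng.getrandbits(64) for _ in range(d)]
        self.table = [[0] * w for _ in range(d)]

    def _col(self, row, x, group):
        # Elements of different groups can never share a counter.
        local = hash((self.salts[row], x)) % self.widths[group]
        return self.offsets[group] + local

    def update(self, x, group, count=1):
        for r, row in enumerate(self.table):
            row[self._col(r, x, group)] += count

    def query(self, x, group):
        return min(row[self._col(r, x, group)] for r, row in enumerate(self.table))


# Hypothetical setup: 300 element types in group 0, 100 in group 1, w = 64.
fcm = FairCountMin(group_sizes=[300, 100], w=64, d=3)
for x in range(300):
    fcm.update(x, group=0, count=5)
for x in range(300, 400):
    fcm.update(x, group=1, count=1)
```

With these sizes the split is 48 and 16 columns, so both groups have the same number of element types per column, which is the proportionality condition the theorem states.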

Figures (79)

  • Figure 1: Illustration of a column-partitioning-based group-fair Count-Min sketch with one row, i.e., one hash function $\mathsf{h}(\cdot)$.
  • Figure 2: Illustration of a column-partitioning-based group-fair Count-Min sketch with $d$ rows.
  • Figure 3: Illustration of the row-partitioning baseline.
  • Figure 4: Effect of varying disadvantaged group size $n_l$ on unfairness, Google n-grams, $w=65536, d=5$.
  • Figure 5: Effect of varying disadvantaged group size $n_l$ on unfairness, synthetic, $n=20K, w=512, d=10$.
  • ...and 74 more figures

Theorems & Definitions (4)

  • Definition 1: Approximation Factor
  • Definition 2: Group-Fair Count-Min
  • Definition 3
  • Theorem 1