A Fair and Memory/Time-efficient Hashmap

Abolfazl Asudeh, Nima Shahbazi, Stavros Sintos

TL;DR

FairHash introduces group fairness into hashmap design, defining single and pairwise fairness notions and presenting three algorithmic families (ranking-based, cut-based, discrepancy-based) to achieve fair, memory-aware hashing. Ranking-based methods preserve similar time/memory to CDF-based hashing while reducing unfairness; cut-based approaches guarantee 0-unfairness with additional memory, and discrepancy-based methods offer a tunable trade-off between fairness and memory. The framework is complemented by necklace-splitting-based reductions for low-memory 0-unfair hashing and by dynamic programming and local-search techniques for practical fairness improvements. Experimental results on real-world datasets show substantial unfairness reductions with minimal performance overhead compared to baselines. Overall, FairHash provides a principled, scalable toolkit for equitable data distribution in hash-based structures with concrete theoretical guarantees and empirical validation.

Abstract

The hashmap is a fundamental data structure in computer science. There has been extensive research on constructing hashmaps that minimize the number of collisions, leading to efficient lookup query time. Recently, data-dependent approaches have constructed hashmaps tailored to a target data distribution that are guaranteed to distribute data uniformly across buckets and hence minimize collisions. Still, to the best of our knowledge, none of the existing techniques guarantees group fairness among the different groups of items stored in the hashmap. Therefore, in this paper, we introduce FairHash, a data-dependent hashmap that guarantees uniform distribution at the group level across hash buckets and hence satisfies the statistical parity notion of group fairness. We formally define three notions of fairness and, unlike existing work, FairHash satisfies all three of them simultaneously. We propose three families of algorithms for designing fair hashmaps, suitable for different settings. Our ranking-based algorithms reduce the unfairness of data-dependent hashmaps without any memory overhead. The cut-based algorithms guarantee zero unfairness in all cases, irrespective of how the data is distributed, but they introduce extra memory overhead. Last but not least, the discrepancy-based algorithms enable trading off between various fairness notions. In addition to the theoretical analysis, we perform extensive experiments to evaluate the efficiency and efficacy of our algorithms on real datasets. Our results verify the superiority of FairHash over the baselines on fairness at almost no performance cost.
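
To make the group-level claim concrete, here is a minimal Python sketch of how one might measure deviation from statistical parity for a given bucket assignment. It is not the authors' implementation: the function names (`rank_buckets`, `group_bucket_shares`, `max_parity_gap`), the toy data, and the choice of "largest deviation from $1/m$" as the measure are all assumptions made for illustration, and the paper's formal unfairness definitions may differ.

```python
from collections import Counter

def rank_buckets(values, m):
    """Assign each item to one of m buckets by its rank in sorted order
    (a simple stand-in for a data-dependent hashmap with equal-size buckets)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    buckets = [0] * n
    for rank, i in enumerate(order):
        buckets[i] = rank * m // n
    return buckets

def group_bucket_shares(buckets, groups, m):
    """For each group, the fraction of its items that land in each bucket."""
    totals = Counter(groups)
    counts = Counter(zip(groups, buckets))
    return {g: [counts[(g, j)] / totals[g] for j in range(m)] for g in totals}

def max_parity_gap(shares, m):
    """Largest absolute deviation of any group's bucket share from uniform 1/m
    (an illustrative measure, not the paper's formal definition)."""
    return max(abs(s - 1 / m) for row in shares.values() for s in row)

# Toy data (hypothetical): eight keys, two groups of four items each.
values = [1, 2, 3, 4, 5, 6, 7, 8]
groups = ["red", "red", "red", "blue", "red", "blue", "blue", "blue"]

buckets = rank_buckets(values, m=2)
shares = group_bucket_shares(buckets, groups, m=2)
gap = max_parity_gap(shares, m=2)
print(buckets, shares, gap)
```

In this toy input both buckets hold the same number of items, yet each group is concentrated in one bucket, which is the kind of gap between overall uniformity and group-level uniformity that the abstract describes.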

Paper Structure

This paper contains 42 sections, 13 theorems, 13 equations, 65 figures, 4 tables, 3 algorithms.

Key Result

Proposition 1

Collision probability is satisfied if and only if all buckets contain exactly the same number of points, i.e., for each bucket $b_j$, $|b_j| = \frac{n}{m}$.
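
A small numeric check can illustrate why equal bucket sizes matter. The sketch below assumes that collision probability means the chance that two points drawn uniformly at random, with replacement, land in the same bucket, which gives $\sum_j (|b_j|/n)^2$; the paper's exact definition may differ. Under this assumption the sum is at least $\frac{1}{m}$ by the Cauchy-Schwarz inequality, with equality only when every $|b_j| = \frac{n}{m}$. The function name `collision_probability` and the bucket sizes are hypothetical.

```python
from fractions import Fraction

def collision_probability(bucket_sizes):
    """Probability that two points sampled uniformly with replacement
    fall into the same bucket: sum over buckets of (|b_j| / n)^2."""
    n = sum(bucket_sizes)
    return sum(Fraction(size, n) ** 2 for size in bucket_sizes)

# n = 100 points, m = 4 buckets (hypothetical sizes).
balanced = collision_probability([25, 25, 25, 25])  # equals 1/m = 1/4
skewed = collision_probability([40, 30, 20, 10])    # equals 3/10 > 1/4

print(balanced, skewed)
```

Exact fractions are used so that the comparison with $\frac{1}{m}$ does not depend on floating-point rounding.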

Figures (65)

  • Figure 1: Distribution of 100 random integers in [0,9].
  • Figure 2: Distribution of 100 points belonging to two groups blue and red in 10 buckets.
  • Figure 3: Effect of varying dataset size $n$ on unfairness, Adult, sex.
  • Figure 4: Effect of varying dataset size $n$ on unfairness, Compas, sex.
  • Figure 5: Effect of varying minority-to-majority ratio on unfairness, Adult, sex.
  • ...and 60 more figures

Theorems & Definitions (14)

  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Definition 1: $(\varepsilon,\alpha)$-hashmap
  • Lemma 1
  • Theorem 1
  • Theorem 2
  • Lemma 2
  • Theorem 3
  • Lemma 3
  • ...and 4 more