SAT-Based Search for Minwise Independent Families

Enrico Iurlano, Günther R. Raidl

TL;DR

This work tackles the problem of constructing the smallest possible $k$-restricted minwise independent families within $S_n$ to enable efficient MinHash-style similarity estimation. It develops a pure SAT model that represents permutations via incidence matrices, enforces permutation and minwise-independence constraints, and implements symmetry breaking while leveraging a Mathon–van Trung-inspired left-coset decomposition to reduce the search space. Computational experiments indicate that many optimal constructions arise as unions of left cosets of subgroups, with the left-coset approach generally outperforming the right-coset variant in practice. A key theoretical contribution is a bijection between derangements with $k$ fixed points and waste-index permutations, providing a succinct alternative route to Bargachev’s lower bounds on the size of such families. The framework thus enables provably optimal or near-optimal constructions at modest scales and suggests directions for scaling via ILP and incremental SAT with clause learning.
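To make the incidence-matrix idea concrete, here is a minimal sketch of how the permutation constraints for a single family member could be written as CNF clauses. The pairwise "at most one" encoding, the variable numbering, and the helper names `permutation_cnf` and `count_models` are assumptions chosen for illustration; this summary does not show the paper's actual encoding. The brute-force model counter stands in for a SAT solver and is only feasible for tiny $n$.

```python
from itertools import combinations, product


def permutation_cnf(n):
    """Return DIMACS-style clauses forcing an n x n 0/1 matrix to be a permutation matrix.

    Variable r * n + c + 1 is true iff the matrix has a 1 in row r, column c.
    (Illustrative numbering, not taken from the paper.)
    """
    def var(r, c):
        return r * n + c + 1

    # Every row and every column must contain exactly one true variable.
    lines = [[var(r, c) for c in range(n)] for r in range(n)]
    lines += [[var(r, c) for r in range(n)] for c in range(n)]

    clauses = []
    for line in lines:
        clauses.append(list(line))            # at least one entry is 1
        for a, b in combinations(line, 2):
            clauses.append([-a, -b])          # no two entries are both 1
    return clauses


def count_models(clauses, num_vars):
    """Count satisfying assignments by exhaustive enumeration (tiny instances only)."""
    count = 0
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            count += 1
    return count


if __name__ == "__main__":
    n = 3
    cnf = permutation_cnf(n)
    print(len(cnf), count_models(cnf, n * n))
```

For $n = 3$ the satisfying assignments correspond exactly to the $3! = 6$ permutation matrices. A model for a whole family would repeat such a block once per member and add the minwise-independence constraints on top.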

Abstract

Proposed for rapid document similarity estimation in web search engines, the celebrated property of minwise independence imposes highly symmetric constraints on a family $\mathcal{F}$ of permutations of $\{1,\ldots, n\}$: The property is fulfilled by $\mathcal{F}$ if for each $j\in \{1,\ldots,n\}$, any cardinality-$j$ subset $X\subseteq \{1,\ldots,n\}$, and any fixed element $x^\ast\in X$, it occurs with probability $1/j$ that a randomly drawn permutation $\pi$ from $\mathcal{F}$ satisfies $\pi(x^\ast)=\min \{\pi(x) : x\in X\}$. The central interest is to find a family with the fewest possible members meeting the stated constraints. We provide a framework that, firstly, is realized as a pure SAT model and, secondly, generalizes a heuristic of Mathon and van Trung to the search for these families. Originally, the latter enforces an underlying group-theoretic decomposition to achieve a significant speed-up for the computer-aided search for structures which can be identified with so-called rankwise independent families. We observe that this approach is suitable for finding provably optimal new representatives of minwise independent families while yielding a decisive speed-up, too. As the problem has a naive search space of size at least $(n!)^n$, we also carefully address symmetry breaking. Finally, we add a bijective proof for a problem encountered by Bargachev when deriving a lower bound on the number of members in a minimal rankwise independent family.
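The definition above can be checked directly by brute force on small instances. The following sketch is an illustration only, not the paper's SAT model: the function name `is_minwise_independent`, the 0-based element labels, and the reading of "$k$-restricted" as "the condition is required only for subsets of size at most $k$" are assumptions made here.

```python
from itertools import combinations, permutations


def is_minwise_independent(family, n, k=None):
    """Check (k-restricted) minwise independence of a family of permutations.

    family: non-empty list of tuples; perm[x] is the image of element x (0-based).
    k:      largest subset size to test; defaults to n (full minwise independence).
    """
    if not family:
        raise ValueError("family must be non-empty")
    k = n if k is None else k
    size = len(family)
    for j in range(1, k + 1):
        for subset in combinations(range(n), j):
            for x_star in subset:
                # Count members that map x_star to the minimum image over the subset.
                hits = sum(
                    1 for perm in family
                    if perm[x_star] == min(perm[x] for x in subset)
                )
                # Required probability is 1/j, i.e. hits / size == 1 / j.
                if hits * j != size:
                    return False
    return True


if __name__ == "__main__":
    full_s3 = list(permutations(range(3)))
    identity_and_reverse = [(0, 1, 2), (2, 1, 0)]
    print(is_minwise_independent(full_s3, 3))
    print(is_minwise_independent(identity_and_reverse, 3, k=2))
    print(is_minwise_independent(identity_and_reverse, 3, k=3))
```

The full symmetric group $S_3$ passes the check. The two-element family consisting of the identity and its reversal passes for subsets of size at most 2 but fails for $k = 3$, because the middle element is never mapped to the minimum.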

Paper Structure

This paper contains 5 sections, 6 theorems, 28 equations, 1 table.

Key Result

Theorem 4

Let $\mathcal{F}\subseteq S_n$ be $k$-restricted minwise independent and $\mathcal{G}\subseteq S_n$ be $k$-rankwise independent. Then, the following estimates apply:

Theorems & Definitions (19)

  • Example 1
  • Remark 2
  • Remark 3
  • Theorem 4
  • Remark 5
  • Theorem 6 (Bargachev, 2006)
  • Remark 7
  • Remark 8
  • Example 9
  • Proposition 10
  • ...and 9 more