Clone-Robust Weights in Metric Spaces: Handling Redundancy Bias for Benchmark Aggregation

Damien Berriaud, Roger Wattenhofer

TL;DR

A theoretical framework for dealing with clone-proof weighting functions, which distribute importance across elements of a set such that similar objects share (some of) their weights, thus avoiding a potential bias introduced by their multiplicity.

Abstract

We are given a set of elements in a metric space. The distribution of the elements is arbitrary, possibly adversarial. Can we weigh the elements in a way that is resistant to such (adversarial) manipulations? This problem arises in various contexts. For instance, the elements could represent data points, requiring robust domain adaptation. Alternatively, they might represent tasks to be aggregated into a benchmark, or questions about personal political opinions in voting advice applications. This article introduces a theoretical framework for dealing with such problems. We propose clone-proof weighting functions as a solution concept. These functions distribute importance across elements of a set such that similar objects ("clones") share (some of) their weights, thus avoiding a potential bias introduced by their multiplicity. Our framework extends the maximum uncertainty principle to accommodate general metric spaces and includes a set of axioms (symmetry, continuity, and clone-proofness) that guide the construction of weighting functions. Finally, we address the existence of weighting functions satisfying our axioms in the significant case of Euclidean spaces and propose a general method for their construction.

Paper Structure

This paper contains 27 sections, 19 theorems, 34 equations, 8 figures, 4 algorithms.

Key Result

Theorem 1

For $r>0$, the weighting function $g_r$ is well-defined and belongs to $\mathcal{R}_{2r}(\mathbb{R}^n, d_2)$.

Figures (8)

  • Figure 1: Which weight should we give to each individual point? By symmetry, one would expect the areas in blue, red and green to sum up to the same value, even though they contain different numbers of points. How to deal with the addition of cyan though?
  • Figure 2: Visualization of the neighborhoods of $d_\Pi$, and of the divergence of individual weightings under Axioms \ref{axi:sym} and \ref{axi:class_cont}. The edges in \ref{fig:visualization_power_two} highlight the symmetries of $S_{\alpha, \alpha, \gamma}$ in the limit $\gamma \to 0$; the equilateral triangle in \ref{fig:visualization_equilateral} displays the symmetry of $S_{\alpha, \alpha/2, \sqrt{3} \alpha/ 2 }$.
  • Figure 3: Computation of $g_r(S)(x)$ in the two-dimensional Euclidean space $(\mathbb{R}^2, d_2)$, where the set $S = \{w,x,y,z\}$ contains four elements. A cell $A_r(U)$ is uniquely defined by the subset $U\subseteq S$ as the possibly empty intersection of the balls around each element of $U$ and the complements of the balls around each element of $S$ absent from $U$ (cf. Appendix \ref{sec:proof}). For each subset $U$ containing $x$, the grading function $g_{r,S,x}$ is constant on the cell $A_r(U)$ and equal to the inverse depth of the cell, i.e., $g_{r,S,x}(z) = 1 / |U|$ for all $z$ in $A_r(U)$. The weight of $x$ in $S$ is then equal to the weighted average of $g_{r,S,x}$ on the ball centered at $x$, where the weight of each cell corresponds to its area normalized by the total area of the balls' union. We estimated the value $g_r(S)(x) \simeq 0.19$ via Monte Carlo sampling, cf. Algorithm \ref{alg:gr_estimate}.
  • Figure 4: Key steps in demonstrating that $g_r$ satisfies Axioms \ref{axi:clone_fair_uni}, \ref{axi:indiv_cont} and \ref{axi:alpha_clone_locality}.
  • Figure 5: The weighting function $g_r$ does not satisfy Axiom \ref{axi:sym} in $(\mathbb{R}^2, d_1)$. As illustrated by the dashed $L^1$ ball centered at $x$, points $y$ and $z$ are at the same distance from $x$; they thus belong to a common isometry class in $S=\{x,y,z\}$ and should receive similar weights under Axiom \ref{axi:sym}. Note however that the Lebesgue measure, i.e., the area, of the intersection between the red and the green ball differs from that of the intersection between the red and the blue ball, hence $g_r(S)(y) \neq g_r(S)(z)$.
  • ...and 3 more figures
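
The computation described in the caption of Figure 3 can be sketched in a few lines of Python. This is an illustrative sketch only, not the paper's Algorithm \ref{alg:gr_estimate}, whose details are not shown here. It assumes closed Euclidean balls of radius $r$ and uses simple rejection sampling from a bounding box to draw points uniformly from the union of the balls. The function name `estimate_gr_weights` and its parameters are hypothetical. Each accepted sample lying in a cell of depth $|U|$ contributes $1/|U|$ to every element of $U$, so the returned weights sum to one.

```python
import numpy as np

def estimate_gr_weights(points, r, n_samples=20000, seed=0):
    """Monte Carlo sketch of the weights g_r(S)(x) for every x in S.

    Assumptions (not taken from the paper): closed Euclidean balls,
    rejection sampling from the axis-aligned bounding box of the union.
    """
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, dtype=float)
    # Bounding box that contains every ball of radius r.
    lo = pts.min(axis=0) - r
    hi = pts.max(axis=0) + r
    totals = np.zeros(len(pts))
    accepted = 0
    while accepted < n_samples:
        batch = rng.uniform(lo, hi, size=(n_samples, pts.shape[1]))
        # inside[i, j] is True when sample i lies in the ball around point j.
        dists = np.linalg.norm(batch[:, None, :] - pts[None, :, :], axis=2)
        inside = dists <= r
        depth = inside.sum(axis=1)  # number of balls covering each sample
        # Keep only samples in the union, up to the requested sample count.
        keep_idx = np.flatnonzero(depth > 0)[: n_samples - accepted]
        # A sample at depth k adds 1/k to each of the k covering elements.
        totals += (inside[keep_idx] / depth[keep_idx, None]).sum(axis=0)
        accepted += len(keep_idx)
    return totals / accepted

# Usage: two exact clones at the origin and one distant point.
weights = estimate_gr_weights([(0.0, 0.0), (0.0, 0.0), (5.0, 0.0)], r=1.0)
print(weights)
```

In this usage example the two clones cover the same ball, so they split its share equally, while the distant point keeps the share of its own ball. This is the redundancy-sharing behaviour the TL;DR describes. Bounding-box rejection becomes inefficient when the points are far apart relative to $r$ or when the dimension is high; a different sampler would be preferable in those cases.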

Theorems & Definitions (21)

  • Definition 1: Weighting functions of $(E, d)$
  • Example: Diverging Individual Weights.
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4: Naive Monte-Carlo Estimation for $f_\nu$
  • Lemma 1: From christensen_measures_1970
  • Theorem 5: From federer_geometric_1996, Thm 3.2.39
  • Theorem \ref{thm:local_vote_rep_func}
  • Theorem \ref{thm:cont_convex_combi_gr}
  • ...and 11 more