Robustness of Generalized Median Computation for Consensus Learning in Arbitrary Spaces

Andreas Nienkötter, Sandro Vega-Pons, Xiaoyi Jiang

TL;DR

This work establishes that generalized median (GM) computation exhibits a breakdown point of at least 0.5 under metric distances in arbitrary spaces, and it derives tight bounds on GM displacement under various outlier scenarios, including added or replaced objects and weighted GM. It extends the robustness analysis to non-metric distances, providing conditions under which non-metric GM can be fragile, and offers practical insights via proofs, illustrations, and applicability to 3D rotations and ranking averages. The paper also discusses methods to obtain metric distances (kernel-based and function-transform) and provides guidance for practitioners to avoid non-robust distance choices. Overall, the results give fundamental, broadly applicable guarantees and guidance for robust consensus learning in diverse domains.

Abstract

Robustness in terms of outliers is an important topic and has been formally studied for a variety of problems in machine learning and computer vision. Generalized median computation is a special instance of consensus learning and a common approach to finding prototypes. Related research can be found in numerous problem domains with a broad range of applications. So far, however, robustness of generalized median has only been studied in a few specific spaces. To our knowledge, there is no robustness characterization in a general setting, i.e. for arbitrary spaces. We address this open issue in our work. The breakdown point >=0.5 is proved for generalized median with metric distance functions in general. We also study the detailed behavior in case of outliers from different perspectives. In addition, we present robustness results for weighted generalized median computation and non-metric distance functions. Given the importance of robustness, our work contributes to closing a gap in the literature. The presented results have general impact and applicability, e.g. providing deeper understanding of generalized median computation and practical guidance to avoid non-robust computation.
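To make the robustness claim concrete, the following minimal sketch computes a median object as the candidate that minimizes the sum of distances to a set, using the absolute difference on real numbers as the metric. The names `sum_of_distances`, `set_median`, and `abs_dist` are hypothetical helpers introduced for illustration, and restricting the search to a finite candidate list is a simplifying assumption; it is not the paper's method for arbitrary spaces.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def sum_of_distances(candidate: T, objects: Sequence[T],
                     dist: Callable[[T, T], float]) -> float:
    """Sum of distances from one candidate to every object in the set."""
    return sum(dist(candidate, o) for o in objects)


def set_median(objects: Sequence[T], candidates: Sequence[T],
               dist: Callable[[T, T], float]) -> T:
    """Return the candidate with the smallest sum of distances.

    Assumption: the search is limited to a finite candidate list,
    which only approximates a median over the whole space.
    """
    return min(candidates, key=lambda c: sum_of_distances(c, objects, dist))


def abs_dist(a: float, b: float) -> float:
    """Absolute difference, a metric on the real line."""
    return abs(a - b)


clean = [1.0, 2.0, 3.0, 4.0, 5.0]
# Two of the five objects are replaced by far-away outliers (fewer than half).
corrupted = [1.0, 2.0, 3.0, 1e6, 1e6]
candidates = sorted(set(clean + corrupted))

median_clean = set_median(clean, candidates, abs_dist)
median_corrupted = set_median(corrupted, candidates, abs_dist)
mean_corrupted = sum(corrupted) / len(corrupted)

print(median_clean, median_corrupted, mean_corrupted)
```

In this toy run the median object stays at 3.0 after the corruption, while the arithmetic mean is dragged far toward the outliers. This matches the intuition behind a breakdown point of at least 0.5, but it is a single example and not evidence for the general result.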

Paper Structure

This paper contains 26 sections, 10 theorems, 47 equations, 8 figures.

Key Result

Theorem 1

Let $\mathcal{D}$ be an arbitrary space with a metric $\delta: \mathcal{D} \times \mathcal{D} \rightarrow \mathbb{R}_0^+$, and $O = \{o_1, ..., o_n\}$ be a multi-set in $\mathcal{D}$. Then, the GM $\bar{o}$ of $O$ has a breakdown point $\epsilon^* \geq \lfloor (n+1)/2 \rfloor/n$ with $\lim_{n\righta
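As a small worked illustration, the lower bound $\lfloor (n+1)/2 \rfloor / n$ from the theorem statement can be evaluated for a few set sizes. The function name `breakdown_lower_bound` is a hypothetical helper, not something defined in the paper. By simple arithmetic, the expression equals $1/2$ for even $n$ and $(n+1)/(2n)$ for odd $n$, so it never falls below 0.5.

```python
def breakdown_lower_bound(n: int) -> float:
    """Evaluate floor((n + 1) / 2) / n for a set of n objects."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Integer division implements the floor for positive n.
    return ((n + 1) // 2) / n


sizes = (1, 2, 3, 4, 5, 10, 11, 101, 1001)
bounds = {n: breakdown_lower_bound(n) for n in sizes}

for n, value in bounds.items():
    print(f"n = {n:4d}  bound = {value:.4f}")
```

For odd $n$ the bound sits slightly above 0.5 and shrinks toward it as $n$ grows; for even $n$ it is exactly 0.5.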

Figures (8)

  • Figure 1: Illustration of the corrupted set $Q$, where objects of the original set $O$ are corrupted (here: moved to $P$). The original median object $\bar{o}$ is displaced to $\bar{q}$. In the proof of fletcher2009geometric, the maximum displacement $\delta(\bar{o},\bar{q})$ is bounded by $2R + \gamma$ (dashed), where $R$ is the maximum distance of any object in $O$ to $\bar{o}$. In discrete spaces, we instead show that $\delta(\bar{o}, \bar{q})$ is bounded by $2R + c + \gamma'$ (solid), where $c$ is realized by two objects $a$ and $b$, because an object on the radius $2R$ may not exist.
  • Figure 2: Illustration of the original set $O$ and added objects $P$. The original median object $\bar{o}$ is displaced to $\bar{q}$ by the inclusion of the added objects in $P$. Note that $\bar{o}$ and $\bar{q}$ are not part of the sets but median objects, minimizing $\Omega_O$ and $\Omega_Q$, respectively.
  • Figure 3: $\delta(\bar{o}, \bar{q})$ (solid) compared with the upper bounds resulting from Theorem \ref{theorem:replaced_bound} (ours, dashed) and Eq. (\ref{eq:riemann_bound}) (dotted) for normally distributed datasets, depending on the number (left) and displacement (right) of outliers. On the left, the displacement is 1000; on the right, $k=50$.
  • Figure 4: The upper bound for $\delta(\bar{o},\bar{q})$ in relation to $n_2$ for the example set shown in Section \ref{sec:example-set}. As shown in (\ref{eq:example_bound}), this relation holds for all $n_1, k$ with $n_1 > n_2$ and $k = n_1 - n_2 + 1$.
  • Figure 5: Proof of Theorem \ref{theorem:riemann_bound}. Illustration of the breakdown of Inequality (\ref{eq:riemannian_inequality2}) in the case of discrete spaces. In this example, objects lie only on a regular grid. In this case, point $a$ in the definition of $\gamma$ is strictly inside ball $B$. This can lead to the distance $\delta(\bar{q}, o_i)$ being shorter than $R+\gamma$.
  • ...and 3 more figures

Theorems & Definitions (10)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Theorem 6
  • Lemma 1
  • Lemma 2
  • Theorem 7
  • Corollary 1