Hardness and Approximability of Dimension Reduction on the Probability Simplex

Roberto Bruno

TL;DR

This paper studies dimension reduction on the probability simplex: an $n$-dimensional probability distribution $p$ is aggregated into an $m$-dimensional distribution $q$, with $m < n$, so as to minimize the Kullback-Leibler divergence $D(q \Vert p)$. It proves that this optimization problem is strongly NP-hard via a polynomial-time reduction from the 3-Partition problem, constructing a specific $p$ for which the optimum equals $\log\frac{m+1}{m}$ if and only if the 3-Partition instance is solvable. It then presents an approximation algorithm, GreedyApprox, which returns an aggregation $\overline{q} \in \mathcal{A}_m(p)$ with $D(\overline{q} \Vert p) < \mathrm{OPT} + 1$ in $O(n \log m)$ time, using a bin-packing interpretation with capacities given by $lb(p)$. The results connect to the minimum cross-entropy principle and establish both the hardness of dimension reduction on the simplex and a practical near-optimal method. Open questions include multiplicative approximation guarantees and extensions to other divergence measures.

Abstract

Dimension reduction is a technique used to transform data from a high-dimensional space into a lower-dimensional space, aiming to retain as much of the original information as possible. This approach is crucial in many disciplines like engineering, biology, astronomy, and economics. In this paper, we consider the following dimensionality reduction instance: Given an n-dimensional probability distribution p and an integer m<n, we aim to find the m-dimensional probability distribution q that is the closest to p, using the Kullback-Leibler divergence as the measure of closeness. We prove that the problem is strongly NP-hard, and we present an approximation algorithm for it.
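
To make the objective concrete, the following is a minimal Python sketch of what an aggregation and its divergence from $p$ might look like. It is an illustration under stated assumptions, not the paper's algorithm: the helper names `aggregate` and `kl_divergence` are hypothetical, the natural logarithm is used because the summary does not state a base, and the $m$-dimensional $q$ is compared with the first $m$ components of $p$ (equivalently, $q$ is zero-padded to length $n$), which is an assumed convention that this summary does not spell out.

```python
import math


def aggregate(p, groups):
    """Sum the components of p inside each index group (one entry per group).

    Hypothetical helper: `groups` is a list of index lists that must
    partition range(len(p)).
    """
    flat = sorted(i for g in groups for i in g)
    if flat != list(range(len(p))):
        raise ValueError("groups must partition the indices of p")
    return [sum(p[i] for i in g) for g in groups]


def kl_divergence(q, p):
    """Return D(q || p) in nats.

    ASSUMPTION: q (length m) is compared with the first m entries of p,
    i.e. q is treated as zero-padded to len(p); terms with q_i = 0
    contribute nothing. p is assumed positive wherever q is positive.
    """
    total = 0.0
    for q_i, p_i in zip(q, p):  # zip stops after the m entries of q
        if q_i > 0:
            total += q_i * math.log(q_i / p_i)
    return total


# Example: reduce a 4-dimensional distribution to 2 dimensions.
p = [0.4, 0.3, 0.2, 0.1]
groups = [[0, 3], [1, 2]]      # merge p_1 with p_4, and p_2 with p_3
q = aggregate(p, groups)       # q is again a probability distribution
d = kl_divergence(q, p)
print(q, d)
```

Different choices of `groups` give different aggregations $q$; the optimization problem asks for the grouping with the smallest divergence, and it is this search over partitions that the paper shows to be strongly NP-hard.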

Paper Structure

This paper contains 5 sections, 5 theorems, 17 equations, 1 algorithm.

Key Result

Lemma 1

For each $p\in \mathcal{P}_n$ and $q\in \mathcal{P}_m$, $m<n$, it holds that where

Theorems & Definitions (5)

  • Lemma 1
  • Theorem 1
  • Lemma 2
  • Lemma 3
  • Theorem 2