Submodular Information Selection for Hypothesis Testing with Misclassification Penalties

Jayanth Bhargav, Mahsa Ghasemi, Shreyas Sundaram

TL;DR

This paper develops a framework for selecting a subset of information sources to identify the true hypothesis under misclassification penalties in a centralized Bayesian setting. It introduces two problems, MCIS and MPIS, and proves that their objectives or constraints are weakly submodular, which yields high-probability performance guarantees for greedy algorithms; it also proposes an alternate, fully submodular metric (total penalty) with near-optimal greedy guarantees. The authors provide finite-sample convergence results for Bayesian beliefs and validate the theory with simulations on a 10-class aerial vehicle classification task, where greedy information-set selection performs near-optimally under budget and cost constraints. The results apply to resource-aware information acquisition in autonomous systems, including sensor/feature selection and hypothesis testing under asymmetric penalties.

Abstract

We consider the problem of selecting an optimal subset of information sources for a hypothesis testing/classification task where the goal is to identify the true state of the world from a finite set of hypotheses, based on finite observation samples from the sources. In order to characterize the learning performance, we propose a misclassification penalty framework, which enables nonuniform treatment of different misclassification errors. In a centralized Bayesian learning setting, we study two variants of the subset selection problem: (i) selecting a minimum cost information set to ensure that the maximum penalty of misclassifying the true hypothesis is below a desired bound and (ii) selecting an optimal information set under a limited budget to minimize the maximum penalty of misclassifying the true hypothesis. Under certain assumptions, we prove that the objective (or constraints) of these combinatorial optimization problems are weak (or approximate) submodular, and establish high-probability performance guarantees for greedy algorithms. Further, we propose an alternate metric for information set selection which is based on the total penalty of misclassification. We prove that this metric is submodular and establish near-optimal guarantees for the greedy algorithms for both the information set selection problems. Finally, we present numerical simulations to validate our theoretical results over several randomly generated instances.
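Variant (ii) above, selection under a limited budget, can be illustrated with a generic cost-benefit greedy rule. The sketch below is not the paper's algorithm: the function name `greedy_budgeted_selection`, the set function `gain` (standing in for a reduction in misclassification penalty), and the toy coverage data are all hypothetical and chosen only to make the example runnable. Costs are assumed to be strictly positive.

```python
from typing import Callable, Dict, FrozenSet, Set


def greedy_budgeted_selection(
    costs: Dict[str, float],
    budget: float,
    gain: Callable[[FrozenSet[str]], float],
) -> Set[str]:
    """Cost-benefit greedy: repeatedly add the affordable source with the
    largest marginal gain per unit cost, until nothing affordable helps.

    `gain` is a hypothetical set function (larger is better); it is an
    assumption of this sketch, not the paper's penalty metric.
    """
    selected: Set[str] = set()
    spent = 0.0
    while True:
        base = gain(frozenset(selected))
        best, best_ratio = None, 0.0
        for src, cost in costs.items():
            # Skip sources already chosen or no longer affordable.
            if src in selected or spent + cost > budget:
                continue
            ratio = (gain(frozenset(selected | {src})) - base) / cost
            if ratio > best_ratio:
                best, best_ratio = src, ratio
        if best is None:
            return selected
        selected.add(best)
        spent += costs[best]


# Toy instance (hypothetical data): each source "covers" some items, and the
# gain of a set is the number of distinct items covered (a submodular function).
coverage = {"radar": {1, 2, 3}, "camera": {3, 4}, "lidar": {4, 5, 6, 7, 8}}
costs = {"radar": 2.0, "camera": 1.0, "lidar": 3.0}


def toy_gain(sources: FrozenSet[str]) -> float:
    return float(len(set().union(*(coverage[s] for s in sources))))


chosen = greedy_budgeted_selection(costs, budget=4.0, gain=toy_gain)
```

On this toy instance the rule first picks the cheapest high-ratio source and then spends the remaining budget on the source with the best marginal gain per unit cost. Swapping `toy_gain` for a penalty-based objective would follow the same pattern, though the guarantees discussed in the abstract depend on the properties of that objective.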

Paper Structure

This paper contains 25 sections, 17 theorems, 49 equations, 4 figures, 2 algorithms.

Key Result

Theorem 2

Let the true state of the world be $\theta_p$ and let $\mu_0(\theta) = \frac{1}{m} \ \forall \theta \in \Theta$ (i.e., a uniform prior). Under Assumption 1, for any $\delta, \epsilon \in [0,1]$, with $L$ as defined in Equation (eq:kld_ratio), and for an information set $\mathcal{I} \subseteq$ [...] where $K(\theta_p, \theta_q) = D_{KL}(\ell_{\mathcal{I}}(\cdot \mid \theta_p) \,\|\, \ell_{\mathcal{I}}(\cdot \mid \theta_q))$ [...]
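The setting of Theorem 2 (a uniform prior over $m$ hypotheses, updated from finitely many observations) can be illustrated with a small simulation. This is a minimal sketch under assumptions not taken from the paper: observations are i.i.d. draws from a finite alphabet, all likelihood values are strictly positive, and the likelihood table `lik` is made-up data. The function name `bayesian_beliefs` is hypothetical.

```python
import math
import random
from typing import List, Sequence


def bayesian_beliefs(
    likelihoods: Sequence[Sequence[float]],
    observations: Sequence[int],
) -> List[float]:
    """Posterior over hypotheses after i.i.d. observations, uniform prior.

    likelihoods[i][o] is the probability of observation o under hypothesis i
    (assumed strictly positive). Log-space accumulation avoids underflow.
    """
    m = len(likelihoods)
    log_post = [-math.log(m)] * m  # uniform prior mu_0(theta) = 1/m
    for o in observations:
        for i in range(m):
            log_post[i] += math.log(likelihoods[i][o])
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical likelihood table: 3 hypotheses, 3 possible observations.
lik = [
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
]
true_idx = 0  # index of the true state theta_p in this toy example
rng = random.Random(0)
obs = rng.choices(range(3), weights=lik[true_idx], k=200)
belief = bayesian_beliefs(lik, obs)
```

With well-separated likelihoods like these, the belief on the true hypothesis approaches 1 as the number of samples grows; the rate is governed by the KL divergences between the true hypothesis and its alternatives, which is the quantity $K(\theta_p, \theta_q)$ appearing in the theorem.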

Figures (4)

  • Figure 1: (a) Penalty Matrix for the Aerial Vehicle Classification (AVC) task, (b) Performance of Algorithm \ref{alg:greedy} (for Problem \ref{prob:dsrc}), (c) Performance of Algorithm \ref{alg:greedy2} (for Problem \ref{prob:mpis}).
  • Figure 2: Finite Sample Convergence of Bayesian Beliefs
  • Figure 3: Performance of greedy algorithm for information set selection for varying submodularity ratios (a) Algorithm \ref{alg:greedy} (Problem \ref{prob:dsrc}), (b) Algorithm \ref{alg:greedy2} (Problem \ref{prob:mpis}).
  • Figure 4: Performance of Greedy (a) Algorithm \ref{alg:greedy} (Problem \ref{prob:mmcis}), (b) Algorithm \ref{alg:greedy2} (Problem \ref{prob:mmpis}).

Theorems & Definitions (22)

  • Definition 1: Observationally Equivalent Set
  • Theorem 2
  • Corollary 3
  • Definition 4: Monotonicity
  • Definition 5: Submodularity Ratio
  • Remark 6
  • Lemma 7
  • Lemma 8
  • Theorem 9
  • Corollary 10
  • ...and 12 more