Dempster-Shafer P-values: Thoughts on an Alternative Approach for Multinomial Inference

Kentaro Hoffman, Kai Zhang, Tyler McCormick, Jan Hannig

TL;DR

A new measure of evidence, the Dempster-Shafer p-value, is demonstrated. It allows for insights and interpretations that retain most of the structure of the p-value while addressing some of the disadvantages that traditional p-values face.

Abstract

In this paper, we demonstrate that a new measure of evidence we developed, called the Dempster-Shafer p-value, allows for insights and interpretations that retain most of the structure of the p-value while addressing some of the disadvantages that traditional p-values face. Moreover, we show through classical large-sample bounds and simulations that there is a close connection between our form of DS hypothesis testing and the classical frequentist testing paradigm. We also demonstrate how our approach gives unique insights into the dimensionality of a hypothesis test and how it models the effects of adversarial attacks on multinomial data. Finally, we demonstrate how these insights can be used to analyze text data for public health through an analysis of the Population Health Metrics Research Consortium dataset for verbal autopsies.

Paper Structure (16 sections, 5 theorems, 45 equations, 8 figures, 1 table)

This paper contains 16 sections, 5 theorems, 45 equations, 8 figures, and 1 table.

Key Result

Theorem 1

Given data from a $k$-dimensional multinomial, $(n_1, \ldots, n_k)$, the posterior random set $\hat{\mathcal{P}}$ is distributed according to:

Figures (8)

  • Figure 1: A) 100 randomly chosen posterior random sets from a 3-dimensional test of uniformity. Note how the polytopes are centered around the point estimate (red). Only 100 sets are shown for visibility. B) 1000 simulations of the upper and lower test statistic and a vertical line representing $\mathcal{P}_0$. For these data, $\pi_{lower}$ is 0.12 and $\pi_{upper}$ is 0.034.
  • Figure 2: A) 100 randomly chosen posterior random sets. Note how the polytopes are more tightly centered around the point estimate than in Figure 1. B) 1000 simulations of the upper and lower test statistic and a vertical line representing $\mathcal{P}_0$. With the larger sample size, both $\pi_{lower}$ and $\pi_{upper}$ are $< 0.0001$.
  • Figure 3: Results of frequentist and DS hypothesis tests of uniformity when the null is true. At low sample sizes, the frequentist test gives a deceptively low rejection rate, while the DS test properly indicates the difficulty of the problem by assigning a large probability to an "Uncertain" result.
  • Figure 4: Results of frequentist and DS hypothesis tests of uniformity when the null is false. Once again, at low sample sizes the frequentist test leaves it unclear whether the alternative is correct or the sample size is lacking, while the DS test makes clear through the unknown class that the sample size is lacking.
  • Figure 5: Effect of weakening on a DS test of uniformity when the null is false. As the number of adversarial samples (represented by alpha) increases, the conclusions become increasingly muddled.
  • ...and 3 more figures
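
The frequentist side of the comparison described in Figures 3 and 4 can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes the frequentist test is Pearson's chi-square test of uniformity, fixes $k = 3$ so that the asymptotic p-value has a closed form, and uses hypothetical function names and an arbitrary 0.05 level. It does not implement the DS test or the posterior random sets.

```python
import math
import random


def pearson_statistic(counts):
    """Pearson chi-square statistic against the uniform null."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def uniformity_p_value(counts):
    """Asymptotic p-value for k = 3 categories (2 degrees of freedom).

    A chi-square variable with 2 degrees of freedom is exponential with
    mean 2, so its survival function is exp(-x / 2).
    """
    if len(counts) != 3:
        raise ValueError("the closed form used here only holds for k = 3")
    return math.exp(-pearson_statistic(counts) / 2.0)


def sample_counts(n, probs, rng):
    """Draw one multinomial count vector of size n with cell probabilities probs."""
    counts = [0] * len(probs)
    for idx in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[idx] += 1
    return counts


def rejection_rate(n, probs, level=0.05, reps=2000, seed=0):
    """Monte Carlo estimate of how often the test rejects uniformity."""
    rng = random.Random(seed)
    rejections = sum(
        uniformity_p_value(sample_counts(n, probs, rng)) < level
        for _ in range(reps)
    )
    return rejections / reps
```

For example, `rejection_rate(10, [1/3, 1/3, 1/3])` estimates the rejection rate under a true null at a small sample size, and `rejection_rate(10, [0.6, 0.2, 0.2])` does the same for one hypothetical alternative. A low rate in the second case does not by itself distinguish "the null is plausible" from "the sample is too small", which is the ambiguity the captions attribute to the frequentist test.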

Theorems & Definitions (12)

  • Theorem 1
  • Theorem 2
  • proof
  • Theorem 3
  • proof
  • Definition 5.1
  • Example 5.1
  • Theorem 4
  • proof
  • Theorem 5
  • ...and 2 more