A Closed-Form EVSI Expression for a Multinomial Data-Generating Process

Adam Fleischhacker, Pak-Wing Fok, Mokshay Madiman, Nan Wu

Abstract

This paper derives analytic expressions for the expected value of sample information (EVSI), the expected value of distribution information (EVDI), and the optimal sample size when data consist of independent draws from a bounded sequence of integers. Because tractable EVSI expressions are difficult to derive, most existing work values data in one of three ways: 1) analytically, through closed-form expressions for an upper bound on the value of data; 2) numerically, by comparing decisions made using simulated data with optimal decisions made when the underlying data distribution is known; or 3) by using variance reduction as a proxy for the uncertainty reduction that accompanies more data. For the very flexible case of modelling integer-valued observations using a multinomial data-generating process with a Dirichlet prior, this paper develops expressions that 1) generalize existing beta-binomial computations, 2) do not require prior knowledge of some underlying "true" distribution, and 3) can be computed prior to the collection of any sample data.
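
To make the idea of valuing samples before collecting them concrete, the sketch below works through one simple case. It is not the paper's EVSI expression, which is not reproduced on this page. It assumes the decision is to estimate the full multinomial probability vector under summed squared-error loss, with the posterior mean as the estimate. Under that assumption, standard Dirichlet-multinomial variance identities give an expected loss reduction of $\frac{\sum_i p_i(1-p_i)}{A+1}\cdot\frac{n}{A+n}$, where $A=\sum_i \alpha_i$ and $p_i=\alpha_i/A$. The function names `closed_form_value` and `simulated_value` are hypothetical, and the simulation exists only to check the formula.

```python
import random


def closed_form_value(alpha, n):
    """Expected drop in summed squared error from n draws (assumed loss)."""
    total = sum(alpha)
    probs = [a / total for a in alpha]
    # Prior expected loss of the prior-mean estimate: sum of Dirichlet variances.
    prior_risk = sum(p * (1 - p) for p in probs) / (total + 1)
    # The expected posterior risk shrinks by the factor total / (total + n).
    return prior_risk * n / (total + n)


def simulated_value(alpha, n, trials=20000, seed=0):
    """Monte Carlo estimate of the same quantity, used only as a check."""
    rng = random.Random(seed)
    total = sum(alpha)
    prior_mean = [a / total for a in alpha]
    categories = range(len(alpha))
    gain = 0.0
    for _ in range(trials):
        # Draw T ~ Dirichlet(alpha) by normalising independent gamma variates.
        gammas = [rng.gammavariate(a, 1.0) for a in alpha]
        scale = sum(gammas)
        t = [g / scale for g in gammas]
        # Draw n observations from T and tally them by category.
        counts = [0] * len(alpha)
        for x in rng.choices(categories, weights=t, k=n):
            counts[x] += 1
        # Conjugate update: the posterior is Dirichlet(alpha + counts).
        post_mean = [(a + c) / (total + n) for a, c in zip(alpha, counts)]
        prior_loss = sum((m - ti) ** 2 for m, ti in zip(prior_mean, t))
        post_loss = sum((m - ti) ** 2 for m, ti in zip(post_mean, t))
        gain += prior_loss - post_loss
    return gain / trials
```

Calling `closed_form_value([2, 3, 5], 10)` needs only the prior parameters and the planned sample size, and `simulated_value([2, 3, 5], 10)` should agree with it up to Monte Carlo error. The value is zero at $n = 0$ and increases with $n$ toward the prior risk, so under this assumed loss each additional sample adds less value than the one before it.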

Paper Structure

This paper contains 14 sections, 2 theorems, 45 equations, 2 figures, 1 table.

Key Result

Proposition 2.1

Suppose the data distribution $T \equiv (T_0,\ldots,T_M)$ is drawn from a given prior $\pi$. Assume further that a decision maker (DM) is given $n$ samples $X \equiv (X_1,\ldots,X_n)$ and updates the prior to the posterior $\pi_X$. Then, under quadratic loss, the expected value of these $n$ samples is non-negative,

Figures (2)

  • Figure 1: Graphical depiction of the Dirichlet prior parameters, potential realizations for that prior (i.e., the multinomial parameters), and the EVSI/EVDI calculations as a function of $n$ samples for the given prior. Top row: concentration parameter $\alpha = 10$; bottom row: concentration parameter $\alpha = 50$.
  • Figure 2: Comparing the sample average approximation (SAA) updating procedure to the known Bayesian (BAYES) optimal updating procedure.

Theorems & Definitions (4)

  • Proposition 2.1
  • proof
  • Theorem 3.1
  • proof