Transformer-based Parameter Estimation in Statistics

Xiaoxin Yin, David S. Yin

Abstract

Parameter estimation is one of the most important tasks in statistics and is key to understanding the distribution behind a sample of observations. Traditionally, parameter estimation is done either with a closed-form solution (e.g., maximum likelihood estimation for the Gaussian distribution) or, when no closed-form solution exists (e.g., for the beta distribution), with an iterative numerical method such as Newton-Raphson. In this paper we propose a transformer-based approach to parameter estimation. Compared with existing solutions, our approach does not require a closed-form solution or any mathematical derivation. It does not even require knowing the probability density function, which numerical methods need. Once the transformer model is trained, a single inference pass suffices to estimate the parameters of the underlying distribution from a sample of observations. In our empirical study we compare our approach with maximum likelihood estimation on commonly used distributions such as the normal, exponential, and beta distributions. Our approach achieves similar or better accuracy as measured by mean squared error.
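To make the closed-form baseline concrete, the following is a minimal sketch of maximum likelihood estimation for the normal and exponential distributions, together with the mean-squared-error metric mentioned in the abstract. It is an illustration under stated assumptions, not the authors' evaluation code: the function names, the rate parameterization of the exponential distribution, the sample size (100), and the number of repetitions (200) are all chosen here for demonstration.

```python
import math
import random


def gaussian_mle(sample):
    """Closed-form MLE for a normal distribution: returns (mean, std)."""
    n = len(sample)
    mu = sum(sample) / n
    # The MLE of the variance divides by n (not n - 1).
    var = sum((x - mu) ** 2 for x in sample) / n
    return mu, math.sqrt(var)


def exponential_mle(sample):
    """Closed-form MLE for the rate of an exponential distribution.

    Assumes the rate parameterization, so the estimate is 1 / sample mean.
    """
    return len(sample) / sum(sample)


def mse(estimates, truth):
    """Mean squared error of repeated estimates against the true value."""
    return sum((e - truth) ** 2 for e in estimates) / len(estimates)


# Illustrative experiment (all settings are assumptions, not from the paper):
# draw 200 samples of 100 observations each and score the MLE of the rate.
rng = random.Random(0)
true_rate = 2.0
estimates = [
    exponential_mle([rng.expovariate(true_rate) for _ in range(100)])
    for _ in range(200)
]
exp_mse = mse(estimates, true_rate)
print(f"MSE of the exponential-rate MLE: {exp_mse:.4f}")
```

A learned estimator would be scored the same way: replace `exponential_mle` with the model's prediction for each sample and compare the two mean squared errors.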

Paper Structure

This paper contains 18 sections, 3 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: Converting a sample into a sequence of $L$ embeddings ($L=1024$), each of size $K$ ($K=384$). The sample contains 6 observations, ranging from 0 to 1 (after normalization). The leftmost observation is mapped to the first dimension of the first embedding. The rightmost observation is mapped to the last dimension of the last embedding. The observation 0.500001 maps to somewhere between the first and second dimensions of the 512th embedding, and thus its weight is distributed between these two dimensions.
  • Figure 2: Architecture of our transformer model
  • Figure 3: (a) Mean squared error versus the number of training examples for exponential distributions with known parameter ranges. The horizontal lines show the mean squared error of MLE, and the curves show that of our approach. (b) The same for exponential distributions with unknown parameter ranges.
  • Figure 4: Mean squared error versus the number of training examples for normal distributions with known parameter ranges. The horizontal lines show the mean squared error of MLE, and the curves show that of our approach.
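
The sample-to-embedding conversion described in the Figure 1 caption can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function name `sample_to_embeddings`, the min-max normalization, and the scaling `position = u * (L * K - 1)` are assumptions made here, and the paper's exact mapping may place observations slightly differently. The sketch keeps the two properties the caption states: the extreme observations land on the first and last dimensions, and an observation falling between two adjacent dimensions splits its unit weight between them.

```python
import math


def sample_to_embeddings(sample, L=1024, K=384):
    """Map a sample to L embeddings of size K (assumed linear scaling)."""
    lo, hi = min(sample), max(sample)
    span = hi - lo
    flat = [0.0] * (L * K)  # all L * K dimensions laid out end to end
    for x in sample:
        # Normalize to [0, 1]; a constant sample maps to 0 (assumption).
        u = (x - lo) / span if span > 0 else 0.0
        pos = u * (L * K - 1)  # fractional position among the L * K slots
        left = int(math.floor(pos))
        frac = pos - left
        # Split the unit weight between the two neighbouring dimensions.
        flat[left] += 1.0 - frac
        if frac > 0.0:
            flat[left + 1] += frac
    # Cut the flat vector into L consecutive embeddings of size K.
    return [flat[i * K:(i + 1) * K] for i in range(L)]


# Tiny configuration so the result is easy to inspect by hand.
emb = sample_to_embeddings([0.0, 0.25, 0.5, 1.0], L=4, K=3)
for row in emb:
    print(row)
```

With this scaling, every observation contributes a total weight of 1, so the entries of all embeddings sum to the sample size regardless of `L` and `K`.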