Multiple output samples per input in a single-output Gaussian process
Jeremy H. M. Wong, Huayun Zhang, Nancy F. Chen
TL;DR
This work extends Gaussian Process (GP) regression to exploit multiple output samples per input, enabling explicit modeling of inter-rater uncertainty in subjective tasks such as spoken language assessment without duplicating latent variables. By introducing a non-redundant latent-variable formulation, the authors derive a joint marginal likelihood that scales as in a standard GP, and obtain a test posterior that incorporates the empirical mean of the multiple raters while decoupling the test covariance from the training outputs. Empirical results on speechocean762 show that the proposed GPjoint approach better matches the distribution of rater labels (lower KL divergence) and offers computational advantages over naive repetition methods. The approach facilitates uncertainty-calibrated feedback in subjective domains and can generalize to other tasks with multi-source labels.
Abstract
The standard Gaussian Process (GP) considers only a single output sample per input in the training set. Datasets for subjective tasks, such as spoken language assessment, may be annotated with output labels from multiple human raters per input. This paper proposes to generalise the GP to allow for these multiple output samples in the training set, and thus make use of the available output uncertainty information. This differs from a multi-output GP, as all output samples here are from the same task. The output density function is formulated as the joint likelihood of observing all output samples, and latent variables are not repeated, which reduces computational cost. Test set predictions are inferred similarly to a standard GP, with a difference being in the optimised hyper-parameters. The approach is evaluated on speechocean762, showing that it allows the GP to compute a test set output distribution that is more similar to the collection of reference outputs from the multiple human raters.
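
The idea of not repeating latent variables can be illustrated with a minimal NumPy sketch. This is not the paper's derivation or code: it assumes every input has the same number of raters R, i.i.d. Gaussian noise with a known variance, and fixed kernel hyper-parameters (no optimisation). The function names `rbf_kernel`, `predict_compact`, and `predict_repeated` are hypothetical. Under these assumptions, conditioning on all N x R repeated samples gives the same latent test posterior as an N x N system built from the per-input mean of the raters with the noise variance divided by R.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)


def predict_compact(x_train, y_samples, x_test, noise_var):
    """Latent test posterior from an N x N system (one latent per input).

    y_samples has shape (N, R): R rater outputs for each of N inputs.
    Assumption: equal R for all inputs and i.i.d. Gaussian noise.
    """
    n, r = y_samples.shape
    y_mean = y_samples.mean(axis=1)  # empirical mean over raters
    k_train = rbf_kernel(x_train, x_train) + (noise_var / r) * np.eye(n)
    k_star = rbf_kernel(x_test, x_train)
    mean = k_star @ np.linalg.solve(k_train, y_mean)
    cov = rbf_kernel(x_test, x_test) - k_star @ np.linalg.solve(k_train, k_star.T)
    return mean, cov


def predict_repeated(x_train, y_samples, x_test, noise_var):
    """Same posterior via naive repetition: an (N*R) x (N*R) system."""
    n, r = y_samples.shape
    x_rep = np.repeat(x_train, r)   # x0, x0, ..., x1, x1, ...
    y_rep = y_samples.reshape(-1)   # row-major order matches x_rep
    k_train = rbf_kernel(x_rep, x_rep) + noise_var * np.eye(n * r)
    k_star = rbf_kernel(x_test, x_rep)
    mean = k_star @ np.linalg.solve(k_train, y_rep)
    cov = rbf_kernel(x_test, x_test) - k_star @ np.linalg.solve(k_train, k_star.T)
    return mean, cov


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_train = np.linspace(0.0, 4.0, 5)
    # Synthetic "rater" scores: 3 noisy samples per input (illustrative data only).
    y_samples = np.sin(x_train)[:, None] + 0.5 * rng.standard_normal((5, 3))
    x_test = np.array([1.5, 3.2])
    mean_c, cov_c = predict_compact(x_train, y_samples, x_test, noise_var=0.25)
    mean_r, cov_r = predict_repeated(x_train, y_samples, x_test, noise_var=0.25)
    print("means agree:", np.allclose(mean_c, mean_r))
    print("covariances agree:", np.allclose(cov_c, cov_r))
```

In this sketch the test mean depends on the training outputs only through the per-input rater mean, and the test covariance does not depend on the outputs at all. The compact form solves an N x N system rather than an (N*R) x (N*R) one, which is the kind of saving that avoiding repeated latent variables is meant to provide.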
