Weighted-Sum of Gaussian Process Latent Variable Models
James Odgers, Ruby Sedgwick, Chrysoula Kappatou, Ruth Misener, Sarah Filippi
TL;DR
This work addresses signal separation when each observation is a weighted sum of unknown pure component signals that vary non-parametrically. It introduces the Weighted-Sum GPLVM (WS-GPLVM), a Bayesian non-parametric model that places a GP prior on each pure component $f_c$, uses latent variables $h_i$ to capture unobserved conditions, and constrains the mixture weights $r_{i\cdot}$ so that $\sum_c r_{ic}=1$. Inference uses inducing points and variational inference, yielding an ELBO for posterior estimation. Key contributions include the formulation of the WS-GPLVM and WS-GPLVM-ind, a tractable variational framework with an analytically computable ELBO, and a detailed training procedure that encourages balanced use of the latent variables and weights. The model is demonstrated on spectroscopy and other domains, with favorable results reported against baseline methods such as ILMC, CLS-GP, and PLS. The approach allows flexible, non-linear variation in the pure component signals under limited labeled data, and extends beyond spectroscopy to signal-processing problems where linear mixture assumptions are insufficient.
Abstract
This work develops a Bayesian non-parametric approach to signal separation where the signals may vary according to latent variables. Our key contribution is to augment Gaussian Process Latent Variable Models (GPLVMs) for the case where each data point comprises the weighted sum of a known number of pure component signals, observed across several input locations. Our framework allows arbitrary non-linear variations in the signals while being able to incorporate useful priors for the linear weights, such as summing-to-one. Our contributions are particularly relevant to spectroscopy, where changing conditions may cause the underlying pure component signals to vary from sample to sample. To demonstrate the applicability to both spectroscopy and other domains, we consider several applications: a near-infrared spectroscopy dataset with varying temperatures, a simulated dataset for identifying flow configuration through a pipe, and a dataset for determining the type of rock from its reflectance.
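As a rough illustration of the generative structure described above, the following NumPy sketch draws synthetic observations in which each data point is a weighted sum of pure component signals that vary with a per-sample latent variable. It is not the authors' implementation and performs no inference. The squared-exponential kernel, the scalar latent variable, the Dirichlet draw for the weights, the shared kernel hyperparameters, and the names `rbf_kernel` and `sample_weighted_sum` are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)


def sample_weighted_sum(n_samples=5, n_inputs=20, n_components=2,
                        noise_std=0.01, seed=0):
    """Draw synthetic data d[i, j] = sum_c r[i, c] * f_c(x_j, h_i) + noise."""
    rng = np.random.default_rng(seed)

    x = np.linspace(0.0, 1.0, n_inputs)   # observed input locations
    h = rng.standard_normal(n_samples)    # one scalar latent per sample (assumed)
    # Mixture weights; each row is non-negative and sums to one (assumed prior).
    r = rng.dirichlet(np.ones(n_components), size=n_samples)

    # Every (x_j, h_i) pair becomes one GP input, ordered sample-major.
    xx, hh = np.meshgrid(x, h)            # both have shape (n_samples, n_inputs)
    gp_inputs = np.column_stack([xx.ravel(), hh.ravel()])

    # Jitter on the diagonal keeps the Cholesky factorisation stable.
    K = rbf_kernel(gp_inputs, gp_inputs, lengthscale=0.3)
    K += 1e-6 * np.eye(len(gp_inputs))
    chol = np.linalg.cholesky(K)

    # One independent GP draw per pure component, all sharing the same kernel.
    f = chol @ rng.standard_normal((len(gp_inputs), n_components))
    f = f.reshape(n_samples, n_inputs, n_components)

    # Weighted sum over components plus i.i.d. Gaussian observation noise.
    d = np.einsum("ic,ijc->ij", r, f)
    d = d + noise_std * rng.standard_normal(d.shape)
    return d, r, f


if __name__ == "__main__":
    d, r, f = sample_weighted_sum()
    print("observations:", d.shape)
    print("weights sum to one:", np.allclose(r.sum(axis=1), 1.0))
```

Because the latent value $h_i$ enters the GP input alongside the input location, each sample sees a slightly different version of every pure component signal, which is the behaviour a purely linear mixture model cannot represent.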
