Incremental Structure Discovery of Classification via Sequential Monte Carlo

Changze Huang, Di Wang

TL;DR

This work tackles online and incremental Gaussian Process (GP) classification by automatically discovering kernel structures and parameters with a Bayesian approach. It extends the AutoGP framework to the binary classification setting, using a domain-specific language over kernels defined by a probabilistic context-free grammar (PCFG) and a Sequential Monte Carlo (SMC) inference scheme that jointly evolves kernel structure and hyperparameters. The method supports both online data arrival and offline batch processing. Within a reweight-resample-rejuvenate loop, it uses involutive MCMC (IMCMC) for structure updates and Hamiltonian Monte Carlo (HMC) for continuous parameters. It demonstrates kernel-structure discovery and competitive accuracy on both toy and real-world datasets. The results indicate improved adaptability to pattern shifts and better performance than the baselines in both online and offline regimes, highlighting the practical impact of automatic kernel discovery in GP-based classification.
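To make the reweight-resample-rejuvenate loop concrete, here is a minimal sketch on a deliberately simplified problem. It is not the paper's implementation: each particle is a single scalar slope `theta` of a logistic classifier rather than a kernel structure with hyperparameters, and one random-walk Metropolis-Hastings move stands in for the IMCMC and HMC rejuvenation steps. All function names, the prior, the resampling threshold, and the step size are assumptions chosen for illustration.

```python
import numpy as np

def log_lik(theta, x, y):
    """Bernoulli log-likelihood under a logistic link, one value per particle."""
    logits = np.outer(theta, x)            # shape: (particles, points)
    signs = 2 * y - 1                      # +1 for label 1, -1 for label 0
    return -np.logaddexp(0.0, -signs * logits).sum(axis=1)

def log_prior(theta):
    """Assumed N(0, 5^2) prior, up to an additive constant."""
    return -0.5 * theta**2 / 25.0

def normalize(log_w):
    w = np.exp(log_w - log_w.max())
    return w / w.sum()

def smc_step(theta, log_w, x_new, y_new, x_seen, y_seen, rng, step=0.5):
    n = len(theta)
    # 1. Reweight: fold the new batch's likelihood into the particle weights.
    log_w = log_w + log_lik(theta, x_new, y_new)
    x_all = np.concatenate([x_seen, x_new])
    y_all = np.concatenate([y_seen, y_new])
    # 2. Resample when the effective sample size drops below half the particles.
    w = normalize(log_w)
    if 1.0 / np.sum(w**2) < 0.5 * n:
        theta = theta[rng.choice(n, size=n, p=w)]
        log_w = np.zeros(n)
    # 3. Rejuvenate: one Metropolis-Hastings move per particle that leaves the
    #    posterior given all data seen so far invariant (stand-in for IMCMC/HMC).
    prop = theta + step * rng.standard_normal(n)
    log_acc = (log_prior(prop) + log_lik(prop, x_all, y_all)
               - log_prior(theta) - log_lik(theta, x_all, y_all))
    accept = np.log(rng.uniform(size=n)) < log_acc
    theta = np.where(accept, prop, theta)
    return theta, log_w, x_all, y_all

# Usage: four batches of synthetic data generated with a true slope of 2.
rng = np.random.default_rng(0)
n_particles = 500
theta = 5.0 * rng.standard_normal(n_particles)   # draws from the assumed prior
log_w = np.zeros(n_particles)
x_seen, y_seen = np.empty(0), np.empty(0, dtype=int)
for _ in range(4):
    x_new = rng.standard_normal(50)
    p_new = 1.0 / (1.0 + np.exp(-2.0 * x_new))
    y_new = (rng.uniform(size=50) < p_new).astype(int)
    theta, log_w, x_seen, y_seen = smc_step(
        theta, log_w, x_new, y_new, x_seen, y_seen, rng)
post_mean = float(np.sum(normalize(log_w) * theta))
print("weighted posterior mean of theta:", post_mean)
```

Because each batch only multiplies the existing weights by the likelihood of the new points, earlier data need not be re-processed in the reweight step; only the rejuvenation move revisits the full data set.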

Abstract

Gaussian Processes (GPs) provide a powerful framework for making predictions and quantifying uncertainty in classification, combining kernels with Bayesian non-parametric learning. Building such models typically requires strong prior knowledge to preselect kernels, which can be ineffective for online classification applications that process data sequentially, because the features of the data may shift during the process. To reduce the prior knowledge required by GPs and to learn new features from data that arrive successively, this paper presents a novel method that automatically discovers classification models on complex data with little prior knowledge. Our method adapts a recently proposed technique for GP-based time-series structure discovery, which integrates GPs and Sequential Monte Carlo (SMC). We extend the technique to handle the extra latent variables in GP classification, so that our method can effectively and adaptively learn a priori unknown classification structures from continuous input. In addition, our method adapts to new batches of data by updating the model structures. Our experiments show that our method automatically incorporates various kernel features on synthesized and real-world data for classification. In the experiments on real-world data, our method outperforms various classification methods in both online and offline settings, achieving a 10% accuracy improvement on one benchmark.

Paper Structure

This paper contains 20 sections, 16 equations, 4 figures, 1 table, 1 algorithm.

Figures (4)

  • Figure 1: Contour plots comparing our method with Linear and SquaredExponential GPs. From left to right: the original data points, the Linear GP result, the SquaredExponential GP result, and the result of our method. The comparison shows that our method can discover different kernel structures.
  • Figure 2: Contour plots of different particles after learning a linearly separable dataset. The first panel shows the result with all particles. The second panel shows the particle with kernel $\textsc{SquaredExponential} \times (\textsc{Linear} + \textsc{SquaredExponential})$. The third panel shows the particle with kernel $\textsc{Linear}$. The fourth panel shows the particle with kernel $\textsc{Linear} \times \textsc{Linear}$.
  • Figure 3: Contour plots of our method under pattern shifts. The left panel shows the result of initial learning. The right panel shows the result after incorporating a second batch containing the remaining data.
  • Figure : SMC Learning for Classification
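
The composite kernels named in the Figure 2 caption, such as $\textsc{SquaredExponential} \times (\textsc{Linear} + \textsc{SquaredExponential})$, are sums and products of base kernels. The sketch below shows one way such an expression could be assembled and evaluated as a Gram matrix. The helper names (`linear`, `squared_exponential`, `add`, `mul`) and the hyperparameter values are hypothetical and do not come from the paper or from AutoGP.

```python
import numpy as np

def linear(bias=0.0):
    """Linear kernel k(a, b) = (a - bias) . (b - bias); `bias` is an assumed parameter."""
    return lambda A, B: (A - bias) @ (B - bias).T

def squared_exponential(lengthscale=1.0):
    """Squared-exponential kernel with an assumed scalar lengthscale."""
    def k(A, B):
        sq = (np.sum(A**2, axis=1)[:, None]
              + np.sum(B**2, axis=1)[None, :]
              - 2.0 * A @ B.T)
        # Clip tiny negative values caused by floating-point round-off.
        return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)
    return k

def add(k1, k2):
    """Sum of two kernels, itself a valid kernel."""
    return lambda A, B: k1(A, B) + k2(A, B)

def mul(k1, k2):
    """Elementwise product of two kernels, itself a valid kernel."""
    return lambda A, B: k1(A, B) * k2(A, B)

# SquaredExponential x (Linear + SquaredExponential), with illustrative values.
kernel = mul(squared_exponential(1.0),
             add(linear(), squared_exponential(0.5)))

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 2))      # six 2-D inputs
K = kernel(X, X)                     # 6 x 6 Gram matrix
min_eig = float(np.linalg.eigvalsh(K).min())
print("Gram matrix shape:", K.shape)
```

Since sums and products of positive semi-definite kernels remain positive semi-definite, any expression built from these combinators yields a valid GP covariance, which is what allows a structure search to range freely over such expressions.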