Aggregated Multi-output Gaussian Processes with Knowledge Transfer Across Domains

Yusuke Tanaka, Toshiyuki Tanaka, Tomoharu Iwata, Takeshi Kurashima, Maya Okawa, Yasunori Akagi, Hiroyuki Toda

TL;DR

A multi-output Gaussian process (MoGP) model that infers functions for attributes from multiple aggregate datasets of differing granularities; its variational Bayes inference allows the model parameters to be learned from aggregate datasets in multiple domains.

Abstract

Aggregate data appear in many fields, such as socio-economics and public security. Such data are associated not with points but with supports (e.g., spatial regions in a city). Because the supports can differ in granularity across attributes (e.g., poverty rate and crime rate), modeling such data is not straightforward. This article proposes a multi-output Gaussian process (MoGP) model that infers a function for each attribute from multiple aggregate datasets, each with its own granularity. In the proposed model, the function for each attribute is modeled as a dependent GP, namely a linear mixture of independent latent GPs. We design an observation model in which each attribute is generated by an aggregation process, defined as the integral of the GP over the corresponding support. We also introduce a prior distribution over the mixing weights; sharing this prior enables knowledge transfer across domains (e.g., cities). This is advantageous when the spatially aggregated dataset in one city is too coarse to interpolate: the proposed model can still make accurate predictions of attributes by utilizing aggregate datasets from other cities. Inference is based on variational Bayes, which allows the model parameters to be learned from aggregate datasets in multiple domains. Experiments show that the proposed model outperforms the compared methods in refining coarse-grained aggregate data on real-world datasets: time series of air pollutants in Beijing and various spatial datasets from New York City and Chicago.
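The covariance structure described above (latent GPs mixed linearly, then averaged over each support) can be illustrated with a small sketch. This is not the authors' implementation; it assumes 1-D supports (time bins), squared-exponential latent kernels, aggregation as the uniform average over the support approximated on a midpoint grid, fixed mixing weights `W`, and exact GP conditioning in place of the paper's variational Bayes. It also omits the shared prior over mixing weights and learning across domains. The function names and the toy data are hypothetical.

```python
import numpy as np

# Hypothetical sketch (not the authors' code): 1-D supports, RBF latent kernels,
# aggregation = uniform average over the support, fixed mixing weights W,
# and exact GP conditioning instead of the paper's variational Bayes.


def rbf(x1, x2, lengthscale):
    """Squared-exponential kernel matrix between 1-D point sets x1 (n,) and x2 (m,)."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def support_points(support, density=10):
    """Midpoint grid (density points per unit length) approximating an average over [a, b)."""
    a, b = support
    n = max(1, int(round((b - a) * density)))
    step = (b - a) / n
    return a + step * (np.arange(n) + 0.5)


def aggregated_cov(sup1, out1, sup2, out2, W, lengthscales, density=10):
    """Covariance between support averages of f_s(x) = sum_q W[s, q] * g_q(x).

    With independent latent GPs g_q ~ GP(0, k_q):
    cov(y_i, y_j) = sum_q W[s_i, q] * W[s_j, q] * mean_{x in R_i, x' in R_j} k_q(x, x').
    """
    K = np.zeros((len(sup1), len(sup2)))
    for i, (r1, s1) in enumerate(zip(sup1, out1)):
        x1 = support_points(r1, density)
        for j, (r2, s2) in enumerate(zip(sup2, out2)):
            x2 = support_points(r2, density)
            for q, ell in enumerate(lengthscales):
                K[i, j] += W[s1, q] * W[s2, q] * rbf(x1, x2, ell).mean()
    return K


def predict_aggregates(train_sup, train_out, y, test_sup, test_out, W, lengthscales, noise=1e-2):
    """GP posterior mean and variance of support averages given aggregate observations y."""
    K = aggregated_cov(train_sup, train_out, train_sup, train_out, W, lengthscales)
    K_s = aggregated_cov(train_sup, train_out, test_sup, test_out, W, lengthscales)
    K_ss = aggregated_cov(test_sup, test_out, test_sup, test_out, W, lengthscales)
    L = np.linalg.cholesky(K + noise * np.eye(len(y)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, K_s)
    return K_s.T @ alpha, np.diag(K_ss) - np.sum(v ** 2, axis=0)


# Toy usage with synthetic data: two attributes on a 24-hour timeline.
W = np.array([[1.0, 0.3],   # attribute 0 loads mostly on latent GP 0
              [0.8, 0.5]])  # attribute 1 shares latent GP 0 with attribute 0
lengthscales = [3.0, 1.0]
coarse = [(6.0 * k, 6.0 * (k + 1)) for k in range(4)]  # attribute 0 observed in 6-hour bins
fine = [(float(h), float(h + 1)) for h in range(24)]    # attribute 1 observed in 1-hour bins
train_sup = coarse + fine
train_out = [0] * len(coarse) + [1] * len(fine)

# Draw placeholder observations from the model prior (not real data).
rng = np.random.default_rng(0)
K_train = aggregated_cov(train_sup, train_out, train_sup, train_out, W, lengthscales)
y = np.linalg.cholesky(K_train + 1e-2 * np.eye(len(train_sup))) @ rng.standard_normal(len(train_sup))

# Refine attribute 0 from 6-hour to 1-hour bins.
mean, var = predict_aggregates(train_sup, train_out, y, fine, [0] * len(fine), W, lengthscales)
```

Because the midpoint grid of each 6-hour bin is the union of the grids of its six 1-hour bins, averaging the six refined posterior means recovers the posterior mean for the coarse bin, which mirrors the aggregation constraint in the observation model.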

Paper Structure (12 sections, 25 equations, 12 figures, 3 tables, 1 algorithm)

Figures (12)

  • Figure 1: The problem setting when using aggregate datasets defined on two-dimensional domains. Darker hues represent higher attribute values. Assume that we obtain aggregate datasets in multiple domains, where each attribute value is given by aggregating point-referenced data over the corresponding support. Note that we do not use point-referenced data in either the training or the test phase. The goal is to infer a function for the attribute using aggregate datasets in multiple domains.
  • Figure 2: Schematic diagram of A-MoGP: Generative process of multiple aggregated attributes in two spatial domains. Covariance functions and prior distributions of weight parameters are shared among domains.
  • Figure 3: MAPE and standard errors for the prediction of fine-grained aggregated time-series datasets. Each row shows the results for each of the monitoring stations, and each column shows the results for each of the pollutants.
  • Figure 4: Prediction result of A-GP for the attributes in Aotizhongxin. In the first row, the black and red lines are the true and predicted values, respectively; the red shaded area denotes twice the standard deviation in prediction at each fine-grained bin. The blue and red lines in the second row are the training data and their prediction, respectively; the predictive variance is calculated on a continuous timeline.
  • Figure 5: Prediction result of SLFM for the attributes in Aotizhongxin. Further figure details are the same as Figure 4.