Target inductive methods for zero-shot regression
Miriam Fdez-Díaz, José Ramón Quevedo, Elena Montañés
TL;DR
This work addresses predicting pollutant concentrations at new meteorological stations for which no training observations are available, by using information about each station's surroundings as privileged side information. It introduces two inductive zero-shot regression strategies: a relationship-based method that transfers knowledge from observed stations by aggregating their models with inverse-distance weights, and a correspondence-based method (MPLC) that learns a mapping from side information to model parameters for unobserved targets. A simple baseline that concatenates features and side information is also evaluated. Across artificial and real-world datasets (Communities and Crime, Air pollution), both proposed methods outperform the baseline, with MPLC often performing best, particularly as the number of observed targets grows; the differences are statistically significant in many settings. The findings advance zero-shot regression by enabling generalization to unseen station locations and suggest directions for future work, including deep learning extensions and joint multivariate modeling of related pollutants.
Abstract
This research arises from the need to predict the amount of air pollutants at meteorological stations. Air pollution depends on the location of the stations (weather conditions and activities in the surroundings). Frequently, the surrounding information is not considered in the learning process. This information is known beforehand, even before any weather conditions have been observed, and remains constant for the same station. Treating the surrounding information as side information facilitates generalization to new stations, leading to a zero-shot regression scenario. Existing zero-shot methods mostly target classification and are not easily extended to regression. This paper proposes two zero-shot methods for regression. The first is a similarity-based approach that learns models from features and aggregates them using side information. However, potential knowledge contained in the feature models may be lost in the aggregation. The second method overcomes this drawback by replacing the aggregation procedure with a learned correspondence between side information and feature-induced models. Both proposals are compared with a baseline procedure on artificial datasets, the Communities and Crime datasets from the UCI repository, and the air pollutant data. Both approaches outperform the baseline, and the parameter-learning approach proves superior to the similarity-based method.
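The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes linear per-target models, Euclidean distance between side-information vectors, and a plain least-squares map from side information to model parameters. All function names and the toy data are hypothetical.

```python
import numpy as np

def add_bias(X):
    """Append a constant column so the intercept is part of the parameters."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_linear(X, y):
    """Least-squares linear model for one observed target; returns [w, b]."""
    theta, *_ = np.linalg.lstsq(add_bias(X), y, rcond=None)
    return theta

def similarity_params(S_obs, Thetas, s_new, eps=1e-12):
    """Similarity-based idea: average the observed models, weighting each
    by the inverse distance between its side information and s_new."""
    d = np.linalg.norm(S_obs - s_new, axis=1)
    w = 1.0 / (d + eps)
    return (w / w.sum()) @ Thetas

def correspondence_params(S_obs, Thetas, s_new):
    """Correspondence idea: regress model parameters on side information,
    then predict the parameters of the unseen target from s_new."""
    M, *_ = np.linalg.lstsq(add_bias(S_obs), Thetas, rcond=None)
    return np.append(s_new, 1.0) @ M

# Toy data (assumption): true parameters depend linearly on side information.
rng = np.random.default_rng(0)
A, c = rng.normal(size=(2, 4)), rng.normal(size=4)
S_obs = rng.normal(size=(8, 2))          # side information of 8 observed targets

Thetas = []
for s in S_obs:
    X = rng.normal(size=(50, 3))         # features observed at this target
    y = add_bias(X) @ (s @ A + c)        # noise-free responses
    Thetas.append(fit_linear(X, y))      # one feature-induced model per target
Thetas = np.array(Thetas)

# Unseen target: only its side information is available at training time.
s_new = rng.normal(size=2)
theta_true = s_new @ A + c
X_new = rng.normal(size=(20, 3))
y_new = add_bias(X_new) @ theta_true

for name, theta in [
    ("similarity", similarity_params(S_obs, Thetas, s_new)),
    ("correspondence", correspondence_params(S_obs, Thetas, s_new)),
]:
    mse = np.mean((add_bias(X_new) @ theta - y_new) ** 2)
    print(f"{name}: MSE on unseen target = {mse:.6f}")
```

Because the toy parameters are an exact linear function of the side information, the correspondence variant recovers them here, while the inverse-distance average can only interpolate between observed models. Real data need not behave this way.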
