Generalized Regression with Conditional GANs
Deddy Jobson, Eddy Hudson
TL;DR
This paper reframes regression on tabular data as conditional distribution learning with conditional GANs (CGANs). It introduces RegresGAN, a CGAN-based framework that learns the conditional distribution $p(y|x)$ by training a generator conditioned on $x$ against a discriminator that distinguishes real $(x, y)$ pairs from generated ones, which effectively minimizes the Jensen-Shannon divergence between $p(y|x)$ and its estimate. The approach relaxes many of the distributional assumptions built into generalized linear models and outperforms standard regression on synthetic datasets and on heavy-tailed real-world datasets; the implementation is publicly released. Ablation studies show that training tricks borrowed from vision tasks are often unnecessary for tabular data, suggesting a simpler setup for practical deployment. The work positions CGAN-based regression as an extension of generalized linear models to neural networks, with potential impact in domains where the error structure is unknown or complex.
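For readers who want the Jensen-Shannon connection spelled out, the standard conditional GAN value function can be written as follows. The notation ($G$, $D$, noise $z \sim p_z$, generator distribution $p_G$) is assumed here for illustration and may differ from the paper's own.

$$\min_G \max_D V(D, G) = \mathbb{E}_{(x, y) \sim p_{\text{data}}}\left[\log D(x, y)\right] + \mathbb{E}_{x \sim p_{\text{data}},\, z \sim p_z}\left[\log\left(1 - D(x, G(x, z))\right)\right]$$

For the optimal discriminator, this value equals $2\,\mathrm{JSD}\left(p_{\text{data}}(x, y) \,\|\, p_G(x, y)\right) - \log 4$. Because the real and generated pairs share the same marginal over $x$, the joint divergence reduces to $\mathbb{E}_{x}\left[\mathrm{JSD}\left(p(y|x) \,\|\, p_G(y|x)\right)\right]$, so the generator is pushed toward matching $p(y|x)$ for each input.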
Abstract
Regression is typically treated as a curve-fitting process where the goal is to fit a prediction function to data. With the help of conditional generative adversarial networks, we propose to solve this age-old problem in a different way: we aim to learn a prediction function whose outputs, when paired with the corresponding inputs, are indistinguishable from feature-label pairs in the training dataset. We show that this approach to regression makes fewer assumptions about the distribution of the data being fitted and therefore has better representation capabilities. We draw parallels with generalized linear models in statistics and show how our proposal extends them to neural networks. We demonstrate the superiority of this new approach over standard regression with experiments on multiple synthetic and publicly available real-world datasets, finding encouraging results, especially on heavy-tailed real-world regression datasets. To make our work more reproducible, we release our source code. Link to repository: https://anonymous.4open.science/r/regressGAN-7B71/
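The idea of outputs that are indistinguishable from real feature-label pairs can be sketched in a few lines of NumPy. This is a minimal, untrained illustration and not the released implementation: the network shapes, the toy heavy-tailed data, and the helper names (`generator`, `discriminator`, `value_function`, `sample_predictions`) are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(x, z, params):
    """Map features x (n, d) and noise z (n, k) to sampled labels of shape (n, 1)."""
    w1, b1, w2, b2 = params
    h = np.tanh(np.concatenate([x, z], axis=1) @ w1 + b1)
    return h @ w2 + b2

def discriminator(x, y, params):
    """Return the probability that each (x, y) pair is real, shape (n, 1)."""
    w, b = params
    logits = np.concatenate([x, y], axis=1) @ w + b
    return 1.0 / (1.0 + np.exp(-logits))

def value_function(x, y, z, g_params, d_params, eps=1e-12):
    """Monte Carlo estimate of E[log D(x, y)] + E[log(1 - D(x, G(x, z)))]."""
    d_real = discriminator(x, y, d_params)
    d_fake = discriminator(x, generator(x, z, g_params), d_params)
    return float(np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps)))

def sample_predictions(x_row, g_params, noise_dim, n_samples=1000):
    """Draw n_samples labels for a single input by resampling the noise z."""
    x_rep = np.repeat(x_row[None, :], n_samples, axis=0)
    z = rng.standard_normal((n_samples, noise_dim))
    return generator(x_rep, z, g_params).ravel()

# Toy data (hypothetical): a linear signal plus heavy-tailed Student-t noise.
n, d, k, hidden = 256, 3, 2, 8
x = rng.standard_normal((n, d))
y = x[:, :1] + rng.standard_t(df=2, size=(n, 1))

# Randomly initialised generator; discriminator initialised to output 0.5 everywhere.
g_params = (rng.standard_normal((d + k, hidden)) * 0.5, np.zeros(hidden),
            rng.standard_normal((hidden, 1)) * 0.5, np.zeros(1))
d_params = (np.zeros((d + 1, 1)), np.zeros(1))

z = rng.standard_normal((n, k))
v = value_function(x, y, z, g_params, d_params)           # quantity D ascends and G descends
samples = sample_predictions(x[0], g_params, noise_dim=k)  # predictive samples for one input
```

With the all-zero discriminator used here, every pair is scored 0.5, so `v` equals $-\log 4$. This is the same value the objective takes when the discriminator can no longer tell real pairs from generated ones. In an actual training loop the two parameter sets would be updated in alternation by gradient steps, which this sketch omits. Resampling `z` for a fixed input yields a set of predictive samples rather than a single point estimate; how those samples are reduced to a prediction is left open here.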
