Revisiting Score Function Estimators for $k$-Subset Sampling

Klas Wijk; Ricardo Vinuesa; Hossein Azizpour

TL;DR

The paper tackles learning with $k$-subset sampling, where differentiable parametrization is challenging. It revisits score function estimators (SFESS) to provide exact samples and unbiased gradient estimates even when downstream models are non-differentiable, contrasting with existing relaxed or pathwise approaches. A key contribution is an efficient, DFT-based computation of the Poisson-binomial normalization needed for the score function, paired with control variates to drastically reduce variance. Empirical results on feature selection tasks show SFESS-V is competitive with or surpasses strong baselines, validating the method's unbiasedness and practical applicability to discrete subset problems. Overall, SFESS offers a principled, complementary tool for discrete subset optimization that broadens the applicability of gradient-based learning to non-differentiable downstream components.
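As a rough illustration of the DFT-based normalization mentioned above, the sketch below computes the full Poisson-binomial PMF (the distribution of a sum of independent Bernoulli variables) by evaluating its generating function at the roots of unity and applying a discrete Fourier transform. The function names `poisson_binomial_pmf` and `poisson_binomial_pmf_conv` are hypothetical, and this is a minimal NumPy sketch of the standard identity, not the authors' implementation.

```python
import numpy as np


def poisson_binomial_pmf(p):
    """PMF of sum_i Bernoulli(p_i) via a DFT (hypothetical helper).

    Returns an array of length n + 1 whose entry k is P(sum = k).
    """
    p = np.asarray(p, dtype=float)
    n = p.size
    # Roots of unity exp(2*pi*i*l / (n + 1)) for l = 0, ..., n.
    roots = np.exp(2j * np.pi * np.arange(n + 1) / (n + 1))
    # Generating function prod_i (1 - p_i + p_i * z) evaluated at each root.
    chi = np.prod(1.0 - p[None, :] + p[None, :] * roots[:, None], axis=1)
    # The forward DFT of these evaluations recovers (n + 1) * pmf.
    return np.fft.fft(chi).real / (n + 1)


def poisson_binomial_pmf_conv(p):
    """Reference PMF by repeated convolution, used only as a cross-check."""
    pmf = np.array([1.0])
    for p_i in p:
        pmf = np.convolve(pmf, [1.0 - p_i, p_i])
    return pmf


if __name__ == "__main__":
    probs = [0.2, 0.5, 0.9]
    print(poisson_binomial_pmf(probs))
    print(poisson_binomial_pmf_conv(probs))
```

Under the assumption that the $k$-subset distribution is a product of Bernoullis conditioned on exactly $k$ successes, entry $k$ of this PMF is the normalizing constant that the score function needs.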

Abstract

Are score function estimators an underestimated approach to learning with $k$-subset sampling? Sampling $k$-subsets is a fundamental operation in many machine learning tasks that is not amenable to differentiable parametrization, impeding gradient-based optimization. Prior work has focused on relaxed sampling or pathwise gradient estimators. Inspired by the success of score function estimators in variational inference and reinforcement learning, we revisit them within the context of $k$-subset sampling. Specifically, we demonstrate how to efficiently compute the $k$-subset distribution's score function using a discrete Fourier transform, and reduce the estimator's variance with control variates. The resulting estimator provides both exact samples and unbiased gradient estimates while also applying to non-differentiable downstream models, unlike existing methods. Experiments in feature selection show results competitive with current methods, despite weaker assumptions.
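To make the estimator concrete, here is a minimal sketch of a score function gradient estimate for a $k$-subset distribution with a non-differentiable objective. It assumes the distribution is $p_\theta(b) \propto \exp(\theta^\top b)$ over binary masks with exactly $k$ ones, for which the score is $b - \mathbb{E}[b]$. For brevity the normalizer and marginals are obtained by enumerating all subsets, which only scales to small $n$; this is the step the paper's DFT-based computation addresses. The leave-one-out baseline is one common control variate and may differ from the paper's choice. The names `subset_distribution` and `sfe_gradient` are hypothetical.

```python
import itertools
import numpy as np


def subset_distribution(theta, k):
    """Enumerate every k-subset mask and its probability (small n only)."""
    n = theta.size
    combos = list(itertools.combinations(range(n), k))
    masks = np.zeros((len(combos), n))
    for row, combo in enumerate(combos):
        masks[row, list(combo)] = 1.0
    logits = masks @ theta
    probs = np.exp(logits - logits.max())  # shift for numerical stability
    probs /= probs.sum()
    return masks, probs


def sfe_gradient(theta, k, f, num_samples, rng, use_baseline=True):
    """Score function estimate of the gradient of E[f(b)] w.r.t. theta."""
    masks, probs = subset_distribution(theta, k)
    marginals = probs @ masks                     # E[b] under p_theta
    idx = rng.choice(len(masks), size=num_samples, p=probs)
    b = masks[idx]                                # exact k-subset samples
    score = b - marginals                         # grad_theta log p_theta(b)
    values = np.array([f(x) for x in b])          # f need not be differentiable
    if use_baseline:
        # Leave-one-out mean of the other samples' values (needs >= 2 samples).
        baseline = (values.sum() - values) / (num_samples - 1)
    else:
        baseline = np.zeros(num_samples)
    return ((values - baseline)[:, None] * score).mean(axis=0)


if __name__ == "__main__":
    theta = np.array([0.5, -1.0, 0.2, 1.5, -0.3])
    f = lambda mask: 3.0 + float(mask[0] == 1.0)  # step-like, non-differentiable
    rng = np.random.default_rng(0)
    print(sfe_gradient(theta, 2, f, 1000, rng))
```

Because each sample's baseline is computed only from the other samples, it is independent of that sample's score, so subtracting it leaves the estimate unbiased while removing the constant offset in `f` that would otherwise inflate the variance.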

Paper Structure (20 sections, 7 equations, 4 figures, 4 tables)

Introduction
Method
Efficiently computing the score function
Reducing variance with control variates
Related Work
Other methods
Experiments
Datasets
Baselines
Discussion
Conclusion
Details of Experiments
Network Architecture
Initialization
Optimization and Hyperparameters
...and 5 more sections

Figures (4)

Figure 1: Learning by sampling. Three prominent approaches to learning by sampling: (a) score function estimator, (b) pathwise gradient estimator, and (c) relaxed sampling. We propose a score function estimator for $k$-subset distributions and compare it against existing methods based on approximate pathwise derivatives and relaxed sampling. Because it does not use the pathwise gradient, it is applicable in cases when $f$ is non-differentiable.
Figure 2: Convergence plots. Convergence of training metrics with $k = 30$ selections. The top row shows reconstruction PSNR and the bottom row classification accuracy. The confidence intervals show two standard deviations computed for 5 repeated runs with different random seeds. See Figure 4 for the corresponding plot with validation data.
Figure 3: Reconstruction plots. For each dataset, the first row shows the learned selection mask and the following rows show different samples from the test data. The leftmost column (a) shows the ground truth images and the following columns (b–e) show reconstructions from the jointly trained decoder. From top to bottom, left to right, the datasets shown are MNIST, Fashion MNIST, and KMNIST.
Figure 4: Convergence plots (validation). Convergence of validation metrics with $k = 30$ selections. The top row shows reconstruction PSNR and the bottom row classification accuracy. The confidence intervals show two standard deviations computed for 5 repeated runs with different random seeds.
