Revisiting Score Function Estimators for $k$-Subset Sampling
Klas Wijk, Ricardo Vinuesa, Hossein Azizpour
TL;DR
The paper tackles learning with $k$-subset sampling, an operation that is not amenable to differentiable parametrization. It revisits score function estimators for $k$-subset sampling (SFESS), which provide exact samples and unbiased gradient estimates even when downstream models are non-differentiable, in contrast to existing relaxed or pathwise approaches. A key contribution is an efficient DFT-based computation of the Poisson-binomial normalization needed for the score function, paired with control variates to reduce the estimator's variance. Empirical results on feature selection tasks show SFESS-V is competitive with or surpasses strong baselines, validating the method's unbiasedness and practical applicability to discrete subset problems. Overall, SFESS offers a principled, complementary tool for discrete subset optimization that extends gradient-based learning to non-differentiable downstream components.
Abstract
Are score function estimators an underestimated approach to learning with $k$-subset sampling? Sampling $k$-subsets is a fundamental operation in many machine learning tasks that is not amenable to differentiable parametrization, impeding gradient-based optimization. Prior work has focused on relaxed sampling or pathwise gradient estimators. Inspired by the success of score function estimators in variational inference and reinforcement learning, we revisit them within the context of $k$-subset sampling. Specifically, we demonstrate how to efficiently compute the $k$-subset distribution's score function using a discrete Fourier transform, and reduce the estimator's variance with control variates. The resulting estimator provides both exact samples and unbiased gradient estimates while also applying to non-differentiable downstream models, unlike existing methods. Experiments in feature selection show results competitive with current methods, despite weaker assumptions.
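To make the DFT-based computation concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the $k$-subset distribution is parametrized as independent Bernoulli variables with probabilities $p_i = \sigma(\theta_i)$, conditioned on exactly $k$ of them being one, so that the normalizing constant is a Poisson-binomial probability. The function names `poisson_binomial_pmf`, `subset_log_prob`, and `subset_score` are hypothetical, and the leave-one-out loop for the inclusion probabilities is chosen for readability, not efficiency.

```python
import numpy as np


def poisson_binomial_pmf(p):
    """PMF of the number of successes among independent Bernoulli(p[i]) trials."""
    p = np.asarray(p, dtype=float)
    n = p.size
    # Characteristic function evaluated at the n + 1 DFT frequencies 2*pi*l/(n + 1).
    z = np.exp(2j * np.pi * np.arange(n + 1) / (n + 1))
    phi = np.prod(1.0 - p[None, :] + p[None, :] * z[:, None], axis=1)
    # A forward DFT recovers the PMF up to a factor of n + 1.
    pmf = np.fft.fft(phi).real / (n + 1)
    # Remove tiny negative values caused by floating-point round-off.
    return np.clip(pmf, 0.0, 1.0)


def subset_log_prob(b, p, k):
    """log P(b | sum(b) = k) under the assumed conditional Bernoulli model."""
    b = np.asarray(b, dtype=float)
    p = np.asarray(p, dtype=float)
    log_joint = np.sum(b * np.log(p) + (1.0 - b) * np.log1p(-p))
    return log_joint - np.log(poisson_binomial_pmf(p)[k])


def subset_score(b, p, k):
    """Gradient of subset_log_prob with respect to the logits theta."""
    b = np.asarray(b, dtype=float)
    p = np.asarray(p, dtype=float)
    if not 1 <= k <= p.size:
        raise ValueError("this sketch assumes 1 <= k <= n")
    z_k = poisson_binomial_pmf(p)[k]
    # Conditional inclusion probability of item i:
    # p_i * P(k - 1 successes among the other items) / P(k successes overall).
    marginals = np.array([
        p[i] * poisson_binomial_pmf(np.delete(p, i))[k - 1] / z_k
        for i in range(p.size)
    ])
    # For logits, the unconditional score b - p and the gradient of the
    # log-normalizer (marginals - p) combine to b - marginals.
    return b - marginals


if __name__ == "__main__":
    p = np.array([0.2, 0.5, 0.7, 0.9])
    b = np.array([1, 0, 1, 0])  # a subset of size k = 2
    print(poisson_binomial_pmf(p))
    print(subset_log_prob(b, p, k=2))
    print(subset_score(b, p, k=2))
```

In a score function estimator, the vector returned by `subset_score` would be multiplied by the downstream loss evaluated at the sampled subset, with a control variate subtracted from the loss to reduce variance. Because only loss values are needed, the downstream model does not have to be differentiable.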
