
Efficient Data Subset Selection to Generalize Training Across Models: Transductive and Inductive Networks

Eeshaan Jain, Tushar Nandy, Gaurav Aggarwal, Ashish Tendulkar, Rishabh Iyer, Abir De

TL;DR

This work proposes a trainable subset selection framework that generalizes across architectures, and introduces an attention-based neural gadget that leverages the graph structure of architectures and acts as a surrogate for trained deep neural networks, enabling quick model prediction.

Abstract

Existing subset selection methods for efficient learning predominantly employ discrete combinatorial and model-specific approaches, which lack generalizability: for an unseen architecture, one cannot reuse the subset chosen for a different model. To tackle this problem, we propose $\texttt{SubSelNet}$, a trainable subset selection framework that generalizes across architectures. We first introduce an attention-based neural gadget that leverages the graph structure of architectures and acts as a surrogate for trained deep neural networks, enabling quick model prediction. We then use these predictions to build subset samplers. This naturally yields two variants of $\texttt{SubSelNet}$. The first variant is transductive (Transductive-$\texttt{SubSelNet}$): it computes the subset separately for each model by solving a small optimization problem. This optimization remains fast because the model approximator replaces explicit model training. The second variant is inductive (Inductive-$\texttt{SubSelNet}$): it computes the subset using a trained subset selector, without any per-model optimization. Our experiments show that our model outperforms several existing subset selection methods on several real datasets.
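The abstract separates two kinds of sampler. A transductive sampler is optimized anew for each model. An inductive sampler is trained once and then applied directly. The following Python sketch illustrates only that contrast, under stated assumptions:

- The model approximator's per-example predicted losses are taken as given.
- The transductive objective is a stand-in, not the paper's formulation. It is expected predicted loss plus an entropy bonus weighted by a hypothetical `lam`, maximized by gradient ascent on softmax logits.
- `scorer` is a hypothetical placeholder for the trained selector $\pi_\psi$.
- `top_b` replaces random sampling with a deterministic top-$b$ choice.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def transductive_pi(pred_losses, lam=1.0, steps=1000, lr=0.5):
    """Transductive variant (toy sketch): optimize pi separately for ONE model.

    ASSUMPTIONS: pred_losses[i] is the model approximator's predicted loss of
    the target model on example i; the objective E_pi[loss] + lam * H(pi) is a
    hypothetical stand-in, not the paper's exact formulation.
    """
    theta = [0.0] * len(pred_losses)  # softmax logits of pi; start from uniform
    for _ in range(steps):
        pi = softmax(theta)
        # Partial derivative w.r.t. pi_i, dropping the constant -lam that
        # cancels in the softmax chain rule below.
        c = [l - lam * math.log(p) for l, p in zip(pred_losses, pi)]
        mean_c = sum(p * ci for p, ci in zip(pi, c))
        # Softmax chain rule: dJ/dtheta_j = pi_j * (c_j - E_pi[c]); ascend it.
        theta = [t + lr * p * (ci - mean_c) for t, p, ci in zip(theta, pi, c)]
    return softmax(theta)


def inductive_pi(scorer, model_embedding, examples):
    """Inductive variant (toy sketch): one forward pass of an already trained
    selector, with no per-model optimization. `scorer` is a hypothetical
    callable standing in for the trained network pi_psi."""
    return softmax([scorer(model_embedding, x) for x in examples])


def top_b(pi, b):
    """Deterministic stand-in for drawing a size-b subset S from pi:
    keep the indices of the b most probable examples."""
    return sorted(range(len(pi)), key=lambda i: -pi[i])[:b]


if __name__ == "__main__":
    losses = [0.1, 0.5, 2.0, 0.3]  # illustrative surrogate predictions
    print(top_b(transductive_pi(losses), 2))  # [2, 1]

    # Hypothetical "trained" scorer: dot product of model embedding and features.
    def dot_scorer(emb, x):
        return sum(e * xi for e, xi in zip(emb, x))

    feats = [[0.1, 0.0], [0.5, 0.0], [2.0, 0.0], [0.3, 0.0]]
    print(top_b(inductive_pi(dot_scorer, [1.0, 0.5], feats), 2))  # [2, 1]
```

With this particular objective the optimum has the closed form $\pi_i \propto \exp(\ell_i/\lambda)$. The gradient loop is there only to mirror the small per-model optimization of the transductive variant. The inductive call costs a single pass over the data.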

Paper Structure

This paper contains 35 sections, 12 equations, 10 figures, 11 tables, and 3 algorithms.

Figures (10)

  • Figure 1: Illustration of SubSelNet. (a) Overview: Given a model architecture $m\in \mathcal{M}$, SubSelNet feeds its graph $G_m$ to the architecture encoder $\mathrm{GNN}_\alpha$ to compute the architecture embedding. This embedding, together with the input $\bm{x}$, is fed into the model approximator $g_{\beta}$, which predicts the output of the trained model, $m_{\theta^*}(\bm{x})$. This prediction is then passed to the subset sampler $\pi$ to obtain the training subset $S$. (b) Neural architecture of the different components: $\mathrm{GNN}_{\alpha}$ consists of recursive message-passing layers. The model approximator $g_{\beta}$ performs a BFS ordering on the embeddings ${\bm{H}}_m = \left\{\bm{h}_u\right\}$ and feeds them into a transformer. The subset sampler obtains $\pi$ either by direct optimization (Transductive) or via a neural network $\pi_{\psi}$ (Inductive).
  • Figure 2: Trade-off between RAR (lower is better) and speedup (top row), and between RAR and memory consumption in GB-min (bottom row), for the non-adaptive methods: Facility Location [fujishige2005submodular; iyer2015submodular], Pruning [pruning], and Selection-via-Proxy [coleman2019selection]. Results cover all five datasets: FMNIST, CIFAR10, CIFAR100, Tiny-ImageNet and Caltech-256. In all cases, we vary $|S| = b \in (0.005|D|, 0.9|D|)$.
  • Figure 3: Trade-off between RAR (lower is better) and speedup (top row), and between RAR and memory consumption in GB-min (bottom row), for the adaptive methods: Glister [killamsetty2020glister], Grad-Match [killamsetty2021grad], EL2N [el2n] and GraNd [el2n]. Results cover all five datasets: FMNIST, CIFAR10, CIFAR100, Tiny-ImageNet and Caltech-256. In all cases, we vary $|S| = b \in (0.005|D|, 0.9|D|)$.
  • Figure 4: Hybrid-SubSelNet
  • Figure 5: Distribution of parameters of architectures in $\mathcal{M}_{\text{tr}}$ when $|\mathcal{M}_{\text{tr}}| = 423k$ (blue), and $\mathcal{M}_{\text{tr}}$
  • ...and 5 more figures
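Figure 1(b) describes two steps of the architecture encoder. First, message passing runs over the architecture graph $G_m$. Second, the node embeddings $\bm{h}_u$ are put in BFS order before they enter the transformer in $g_\beta$.

A minimal Python sketch of that data flow follows. It assumes a parameter-free update in place of the learned $\mathrm{GNN}_\alpha$ layers: each node adds the mean of its neighbors' embeddings. The residual-block graph and its one-hot operation features are hypothetical, and the transformer is omitted. The sketch shows only how a graph becomes an ordered sequence of embeddings.

```python
from collections import deque


def undirected_neighbors(graph):
    """graph maps every node to its list of successors (a directed DAG);
    return each node's set of neighbors, ignoring edge direction."""
    nbrs = {u: set() for u in graph}
    for u, succs in graph.items():
        for v in succs:
            nbrs[u].add(v)
            nbrs[v].add(u)
    return nbrs


def message_passing(graph, feats, rounds=2):
    """Toy, parameter-free stand-in for GNN_alpha (ASSUMPTION): in each round,
    every node adds the mean embedding of its neighbors to its own."""
    nbrs = undirected_neighbors(graph)
    h = {u: list(f) for u, f in feats.items()}
    for _ in range(rounds):
        new_h = {}
        for u, hu in h.items():
            msgs = [h[v] for v in nbrs[u]]
            if msgs:
                agg = [sum(col) / len(msgs) for col in zip(*msgs)]
            else:
                agg = [0.0] * len(hu)
            new_h[u] = [a + b for a, b in zip(hu, agg)]
        h = new_h
    return h


def bfs_order(graph, source):
    """Breadth-first ordering of the nodes, starting at the input node."""
    order, seen, queue = [], {source}, deque([source])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in graph[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return order


# Hypothetical residual block: in -> conv1 -> relu -> add -> out, plus skip in -> add.
GRAPH = {"in": ["conv1", "add"], "conv1": ["relu"], "relu": ["add"],
         "add": ["out"], "out": []}
# Hypothetical one-hot operation-type features: [is_compute_op, is_merge_or_io].
FEATS = {"in": [0.0, 1.0], "conv1": [1.0, 0.0], "relu": [1.0, 0.0],
         "add": [0.0, 1.0], "out": [0.0, 1.0]}

if __name__ == "__main__":
    H = message_passing(GRAPH, FEATS)
    order = bfs_order(GRAPH, "in")
    sequence = [H[u] for u in order]  # would be the transformer's input sequence
    print(order)  # ['in', 'conv1', 'add', 'relu', 'out']
    print(len(sequence))  # 5
```

BFS from the input node places the skip-connection target `add` right after `conv1`. Nodes at the same depth therefore sit next to each other in the resulting sequence.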
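Figures 2 and 3 plot RAR against speedup and memory for subset budgets between $0.005|D|$ and $0.9|D|$. The helpers below sketch that bookkeeping. The captions do not define the metrics, so the sketch assumes two definitions:

- RAR is the relative accuracy reduction `1 - acc_subset / acc_full`, averaged over test architectures.
- Speedup is full-data training time divided by subset-selection time plus subset-training time.

The intermediate budget fractions and all numbers are hypothetical illustrations, not results from the paper.

```python
def rar(acc_subset, acc_full):
    """Relative accuracy reduction (lower is better).
    ASSUMPTION: RAR = 1 - acc_subset / acc_full."""
    return 1.0 - acc_subset / acc_full


def mean_rar(pairs):
    """Average RAR over several test architectures, given
    (acc_subset, acc_full) pairs."""
    return sum(rar(s, f) for s, f in pairs) / len(pairs)


def speedup(t_full, t_select, t_train_subset):
    """ASSUMPTION: speedup = full-data training time divided by
    (subset selection time + training time on the subset)."""
    return t_full / (t_select + t_train_subset)


def budget_grid(n, fractions=(0.005, 0.05, 0.1, 0.3, 0.5, 0.9)):
    """Subset sizes b = fraction * |D| spanning 0.005|D| to 0.9|D|;
    the intermediate fractions are hypothetical."""
    return [max(1, round(f * n)) for f in fractions]


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    print(rar(0.855, 0.9))
    print(speedup(100.0, 5.0, 15.0))  # 5.0
    print(budget_grid(50_000))
```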