Indirectly Parameterized Concrete Autoencoders

Alfred Nilsson, Klas Wijk, Sai bharath chandra Gutha, Erik Englesson, Alexandra Hotti, Carlo Saccardi, Oskar Kviman, Jens Lagergren, Ricardo Vinuesa, Hossein Azizpour

TL;DR

The paper tackles instability and redundant feature selections in embedded feature selection with Concrete Autoencoders (CAEs). It introduces Indirectly Parameterized CAEs (IP-CAEs), which replace direct logits with a learnable embedding $\boldsymbol{\psi}$ and a linear transform $\boldsymbol{W}$ to generate $\log\boldsymbol{\alpha}$ for the Gumbel-Softmax selectors, stabilizing training and speeding convergence. IP-CAE yields state-of-the-art reconstruction and classification performance across diverse datasets and decoder architectures, aided by a gradient-transform mechanism $\boldsymbol{W}\boldsymbol{W}^T$ and optional diversity regularization via Generalized Jensen-Shannon Divergence. The approach is generalizable to other Gumbel-Softmax distributions and requires no retraining of the downstream decoder, making it a practical and scalable solution for embedded feature selection. Regularization with $D_{GJS}$ provides an explicit diversity baseline, which improves CAE but does not match IP-CAE’s overall gains, highlighting the effectiveness of indirect parametrization.
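The indirect parametrization described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the sizes `D`, `K`, `P`, the temperature, the learning rate, and the random stand-in gradient `g` are all hypothetical, and any bias term or non-linear variant of the transform is omitted. Each selector's logits are computed as $\log\boldsymbol{\alpha}_k = \boldsymbol{W}\boldsymbol{\psi}_k$, and the last lines check numerically that, with $\boldsymbol{W}$ held fixed, a gradient step on $\boldsymbol{\psi}$ moves the logits by the original gradient multiplied by $\boldsymbol{W}\boldsymbol{W}^T$.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K, P = 8, 3, 8  # input features, selectors, embedding size (illustrative values)

# Direct parametrization (CAE): the logits themselves are the learnable parameters.
log_alpha_direct = rng.normal(size=(K, D))

# Indirect parametrization (IP-CAE): learn psi and W, derive the logits from them.
psi = rng.normal(size=(K, P))  # one embedding per selector
W = rng.normal(size=(D, P))    # linear transform shared by all selectors
log_alpha = psi @ W.T          # row k equals W @ psi[k]


def gumbel_softmax(log_alpha, temperature, rng):
    """Draw one relaxed one-hot sample per row of log_alpha."""
    u = rng.uniform(low=1e-12, high=1.0, size=log_alpha.shape)
    gumbel = -np.log(-np.log(u))
    z = (log_alpha + gumbel) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


selectors = gumbel_softmax(log_alpha, temperature=0.5, rng=rng)  # shape (K, D)
x = rng.normal(size=(D,))  # one input sample
selected = selectors @ x   # K relaxed feature selections

# Gradient transform: take one SGD step on psi while keeping W fixed.
g = rng.normal(size=(K, D))  # stand-in for dL/dlog_alpha (assumed, not a real loss)
lr = 0.1
grad_psi = g @ W             # chain rule: dL/dpsi[k] = W^T g[k]
delta = (psi - lr * grad_psi) @ W.T - log_alpha  # resulting change in the logits
expected = -lr * g @ W @ W.T                     # row k equals -lr * W W^T g[k]
print(np.allclose(delta, expected))
```

In the direct case the same step would change the logits by `-lr * g`, so the only difference introduced by the indirection in this sketch is the factor $\boldsymbol{W}\boldsymbol{W}^T$ (plus whatever change comes from updating $\boldsymbol{W}$ itself, which is ignored here).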

Abstract

Feature selection is a crucial task in settings where data is high-dimensional or acquiring the full set of features is costly. Recent developments in neural network-based embedded feature selection show promising results across a wide range of applications. Concrete Autoencoders (CAEs), considered state-of-the-art in embedded feature selection, may struggle to achieve stable joint optimization, hurting their training time and generalization. In this work, we identify that this instability is correlated with the CAE learning duplicate selections. To remedy this, we propose a simple and effective improvement: Indirectly Parameterized CAEs (IP-CAEs). IP-CAEs learn an embedding and a mapping from it to the Gumbel-Softmax distributions' parameters. Despite being simple to implement, IP-CAE exhibits significant and consistent improvements over CAE in both generalization and training time across several datasets for reconstruction and classification. Unlike CAE, IP-CAE effectively leverages non-linear relationships and does not require retraining the jointly optimized decoder. Furthermore, our approach is, in principle, generalizable to Gumbel-Softmax distributions beyond feature selection.

Paper Structure (32 sections, 19 equations, 10 figures, 7 tables)

Figures (10)

  • Figure 1: Training Instability. For most datasets, the architecture exhibits a large spike in reconstruction error that consistently correlates with the unique percentage (Definition 2.1).
  • Figure 2: Architecture. An overview of the architecture. Instead of directly learning $\alpha$, we propose to learn an embedding $\psi$ and a transformation $g_\phi$ that output $\alpha$.
  • Figure 3: IP parametrizations. Validation results for CAE compared against three parametrizations of the linear IP weights on the ISOLET dataset.
  • Figure 4: Varying P. Test set performance on ISOLET for varying size of $P$. The mean reconstruction error with CAE is included as a horizontal line.
  • Figure 5: Training Comparison. Comparison of CAE and IP-CAE for (a) reconstruction error and (b) accuracy on the validation data throughout training. For IP-CAE, we let $P = D$. The mean unique percentages (Definition 2.1) are shown by the dotted lines.
  • ...and 5 more figures

Theorems & Definitions (2)

  • Definition 2.1: Unique Percentage
  • Definition 2.2: Generalized Jensen-Shannon Divergence
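
The two definitions are listed here by name only, so the following is a hedged sketch of what they plausibly compute rather than the paper's exact formulation. It assumes that the unique percentage is the number of distinct features picked by the $K$ selectors (taking each selector's argmax) divided by $K$, and it uses the standard generalized Jensen-Shannon divergence with uniform weights, that is, the entropy of the mixture minus the mean entropy of the individual distributions. The function names and example matrices are hypothetical.

```python
import numpy as np


def unique_percentage(alpha):
    """Percentage of distinct argmax features among the K selectors (assumed definition)."""
    chosen = np.argmax(alpha, axis=-1)
    return 100.0 * np.unique(chosen).size / chosen.size


def entropy(p, eps=1e-12):
    """Shannon entropy along the last axis, in nats."""
    return -np.sum(p * np.log(p + eps), axis=-1)


def generalized_jsd(alpha, weights=None):
    """Standard generalized JS divergence: H(sum_k w_k p_k) - sum_k w_k H(p_k)."""
    K = alpha.shape[0]
    if weights is None:
        weights = np.full(K, 1.0 / K)  # uniform weights (assumption)
    mixture = weights @ alpha
    return entropy(mixture) - weights @ entropy(alpha)


# Three selectors over four features; the first two favor the same feature.
duplicated = np.array([[0.9, 0.1, 0.0, 0.0],
                       [0.8, 0.2, 0.0, 0.0],
                       [0.0, 0.0, 0.1, 0.9]])
# Three selectors that each put all mass on a different feature.
diverse = np.eye(4)[:3]

print(unique_percentage(duplicated), generalized_jsd(duplicated))
print(unique_percentage(diverse), generalized_jsd(diverse))
```

With uniform weights the divergence is bounded above by $\log K$ and reaches that bound only when the selectors have disjoint supports, which is why a regularizer built on it would reward diverse selections.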