Implicit Regularization for Multi-label Feature Selection

Dou El Kefel Mansouri, Khalid Benabdeslem, Seif-Eddine Benkabou

TL;DR

This paper addresses the problem of feature selection in the context of multi-label learning, by using a new estimator based on implicit regularization and label embedding via Hadamard product parameterization, which may lead to benign overfitting.

Abstract

In this paper, we address the problem of feature selection in the context of multi-label learning by using a new estimator based on implicit regularization and label embedding. Unlike sparse feature selection methods that use a penalized estimator with explicit regularization terms such as the $\ell_{2,1}$-norm, MCP or SCAD, we propose a simple alternative method via Hadamard product parameterization. To guide the feature selection process, a method based on the latent semantics of multi-label information is adopted as the label embedding. Experimental results on several well-known benchmark datasets suggest that the proposed estimator suffers much less from extra bias, and may lead to benign overfitting.

Paper Structure

This paper contains 25 sections, 4 theorems, 16 equations, 7 figures, 4 tables, 1 algorithm.

Key Result

Lemma 3.1

Based on hoff2017lasso, a change of variables via the Hadamard product parametrization $\mathbf{W} = \mathbf{G} \odot \mathbf{H}$ turns the non-smooth convex optimization problem for $\Xi$ in Eq. (eqimplicit) into a smooth optimization problem for $\widehat{\Xi}$ in Eq. (eqimplicit).
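The idea behind the lemma can be illustrated on the simplest case, the vector lasso, rather than on the paper's multi-label objective, which is not reproduced here. For a scalar $w$, $|w| = \min_{gh = w} (g^2 + h^2)/2$, so an $\ell_1$ penalty on $\mathbf{w} = \mathbf{g} \odot \mathbf{h}$ can be replaced by smooth ridge penalties on $\mathbf{g}$ and $\mathbf{h}$. The sketch below minimizes that smooth surrogate by plain gradient descent and compares the result with the closed-form lasso solution. The function names, the design with orthonormal columns, the random initialization, the step size and the value of $\lambda$ are all assumptions chosen for illustration; none of them comes from the paper.

```python
import numpy as np

def soft_threshold(z, lam):
    """Closed-form lasso solution when X has orthonormal columns."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def hpp_lasso(X, y, lam, lr=0.01, steps=20000, seed=0):
    """Gradient descent on the smooth surrogate
    0.5 * ||y - X (g * h)||^2 + (lam / 2) * (||g||^2 + ||h||^2).
    Hypothetical helper for illustration, not the paper's algorithm."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Small random initialization (an assumption of this sketch).
    g = 0.5 * rng.standard_normal(d)
    h = 0.5 * rng.standard_normal(d)
    for _ in range(steps):
        # Gradient of the squared loss with respect to w = g * h.
        grad_w = X.T @ (X @ (g * h) - y)
        # Chain rule through the Hadamard product, plus ridge terms.
        grad_g = grad_w * h + lam * g
        grad_h = grad_w * g + lam * h
        g, h = g - lr * grad_g, h - lr * grad_h
    return g * h

# Toy data: orthonormal columns so the lasso solution is soft-thresholding.
rng = np.random.default_rng(42)
X, _ = np.linalg.qr(rng.standard_normal((50, 6)))
w_true = np.array([3.0, -2.0, 0.0, 0.0, 1.5, 0.0])
y = X @ w_true + 0.01 * rng.standard_normal(50)
lam = 0.5

w_hpp = hpp_lasso(X, y, lam)
w_lasso = soft_threshold(X.T @ y, lam)
print(np.round(w_hpp, 3))
print(np.round(w_lasso, 3))
```

In this toy setting the two printed vectors should agree, including the coordinates driven to zero, even though the surrogate contains no non-smooth term. The orthonormal design is used only so that the reference solution has a closed form to compare against.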

Figures (7)

  • Figure 1: Comparison of mFSIR against other methods with the Nemenyi test.
  • Figure 2: Influence of the number of selected features on four datasets: emotions, language log, tmc2007 and Yeast.
  • Figure 3: Convergence curves of mFSIR and MIFS on four datasets: emotions, language log, tmc2007 and Yeast.
  • Figure 4: Different sparsity behaviors on the bibtex dataset. (a) shows the matrix Ĝ obtained when $\mathbf{G}$ is initialized with values greater than or equal to zero. This matrix is clearly sparse, with columns equal to zero (blue). (b) shows the matrix Ĝ obtained when $\mathbf{G}$ is initialized with random values. This matrix is clearly not sparse, since its columns contain a mix of colors.
  • Figure 5: Performance of mFSIR under varying hyper-parameter configurations, with $\alpha$ and $\beta$ taken from $\{10^{-3}, 10^{-2}, 10^{-1}, 1, 10, 10^{2}\}$. Dataset: emotions. First and second row: Hamming loss, ranking loss and macro-averaging AUC vs. regularization parameter $\alpha$ and percentage of selected features. Second and third row: Hamming loss, ranking loss and macro-averaging AUC vs. regularization parameter $\beta$ and percentage of selected features.
  • ...and 2 more figures

Theorems & Definitions (4)

  • Lemma 3.1
  • Lemma 3.2
  • Lemma 3.3
  • Lemma 3.4