Word Embedding Dimension Reduction via Weakly-Supervised Feature Selection

Jintang Xue, Yun-Cheng Wang, Chengwei Wei, C.-C. Jay Kuo

TL;DR

WordFS is an efficient and effective weakly-supervised feature selection method for word embedding dimension reduction. It has two variants, each utilizing novel criteria for feature selection, and it outperforms other dimension reduction methods at lower computational cost.

Abstract

As a fundamental task in natural language processing, word embedding converts each word into a representation in a vector space. A challenge with word embedding is that as the vocabulary grows, the vector space's dimension increases, which can lead to a vast model size. Storing and processing word vectors is resource-demanding, especially for applications on mobile edge devices. This paper explores word embedding dimension reduction. To balance computational cost and performance, we propose an efficient and effective weakly-supervised feature selection method named WordFS. It has two variants, each utilizing novel criteria for feature selection. Experiments on various tasks (e.g., word and sentence similarity, and binary and multi-class classification) indicate that WordFS outperforms other dimension reduction methods at lower computational cost. We have released the code for reproducibility along with the paper.
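
To make the idea of dimension reduction by feature selection concrete, the following is a minimal sketch and not the WordFS method itself. It keeps a subset of the original embedding dimensions rather than projecting onto new axes. The scoring rule used here (per-dimension variance) is only a stand-in assumption; the abstract does not specify the WordFS criteria, and the helper `select_dimensions` and the toy data are hypothetical.

```python
import random
from statistics import pvariance


def select_dimensions(vectors, scores, k):
    """Restrict each vector to the k highest-scoring dimensions.

    Returns the reduced vectors and the sorted indices that were kept.
    """
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])  # preserve the original dimension order
    reduced = [[vec[j] for j in keep] for vec in vectors]
    return reduced, keep


random.seed(0)
num_words, dim, k = 200, 20, 5

# Toy "embeddings": each dimension has its own scale so the scores differ.
scales = [random.uniform(0.1, 2.0) for _ in range(dim)]
vectors = [[random.gauss(0.0, s) for s in scales] for _ in range(num_words)]

# Stand-in criterion (an assumption, NOT from the paper): per-dimension variance.
scores = [pvariance([vec[j] for vec in vectors]) for j in range(dim)]

reduced, keep = select_dimensions(vectors, scores, k)
print(len(reduced), len(reduced[0]))  # 200 5
```

Because selection only indexes into existing columns, applying it to a new vector needs no matrix multiplication, which is one reason feature selection can be cheaper than projection-based reduction. A weakly-supervised variant would replace `scores` with a criterion computed from a small amount of labeled data.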

Paper Structure

This paper contains 17 sections, 6 equations, 5 figures, 5 tables, 1 algorithm.

Figures (5)

  • Figure 1: An overview of the proposed WordFS method.
  • Figure 2: Average accuracy comparison for prediction tasks.
  • Figure 3: Accuracy comparison for prediction tasks using our WordFS method.
  • Figure 4: Comparison of the average Spearman's rank correlation coefficients of sentence similarity tasks.
  • Figure 5: Comparison of Spearman's rank correlation coefficients for similarity tasks using our WordFS method.