Using Images to Find Context-Independent Word Representations in Vector Space

Harsh Kumar

TL;DR

The paper addresses the dependence of traditional word embeddings on textual context by proposing a definition- and image-based pipeline that derives context-independent word vectors. Each word is represented by a $3200$-dimensional vector formed by concatenating the $32$-dimensional auto-encoder latent representations of $100$ images: five images each for the word itself and up to $19$ definition terms. The authors construct a custom dataset of $115{,}458$ terms and $577{,}290$ images, sourcing definitions from dictionaries and images from the web, and evaluate on word similarity, outlier detection, and concept categorization benchmarks. They report performance comparable to context-based models with significantly less training time, and name machine translation and cross-language image use as future work.

Abstract

Many methods have been proposed to find vector representations for words, but most rely on capturing context from text to find semantic relationships between these vectors. We propose a novel method that uses dictionary meanings and image depictions to find word vectors independent of any context. We use an auto-encoder on the word images to find meaningful representations and use them to calculate the word vectors. Finally, we evaluate our method on word similarity, concept categorization, and outlier detection tasks. Our method performs comparably to context-based methods while taking much less training time.

Paper Structure

This paper contains 13 sections, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Creation of custom dataset. The images of the original word and the definition terms form the image-set of the given word. The subscripts indicate the sequence.
  • Figure 2: The auto-encoder architecture used in our method to get the latent representation for images. The top subscripts show the number of input channels, whereas the bottom subscripts show the number of output channels. Each conv and convT layer has a 3×3 kernel with stride = 1 and padding = 1.
  • Figure 3: The auto-encoder is used to get the latent representation of images. These latent representations for definition terms are appended in sequence to get the final representation of the original word.
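
The concatenation step described in Figures 1 and 3 can be illustrated with a minimal sketch. Only the sizes come from the summary above (five images per term, a 32-dimensional latent per image, 3200 dimensions in total). Everything else is an assumption made for illustration: `toy_encoder` is a hypothetical stand-in for the trained auto-encoder's encoder (it simply averages pixel chunks), `word_vector` is an invented helper name, and zero-filling the slots of words with fewer than 19 definition terms is a guess, since this page does not say how such words are handled.

```python
import numpy as np

LATENT_DIM = 32       # latent size per image (from the summary)
IMAGES_PER_TERM = 5   # images collected per term (from the summary)
MAX_TERMS = 20        # the word itself plus up to 19 definition terms


def toy_encoder(image: np.ndarray) -> np.ndarray:
    """Hypothetical placeholder for the trained encoder.

    Splits the flattened image into LATENT_DIM chunks and averages each,
    so the output has the right shape but no learned meaning.
    """
    flat = image.astype(np.float64).ravel()
    return np.array([chunk.mean() for chunk in np.array_split(flat, LATENT_DIM)])


def word_vector(image_sets, encode=toy_encoder) -> np.ndarray:
    """Concatenate per-image latents, in sequence, into one word vector.

    image_sets[0] holds the images of the word itself; the following
    entries hold the images of its definition terms, in order.
    Missing terms or images are left as zeros (an assumption).
    """
    slots = np.zeros((MAX_TERMS, IMAGES_PER_TERM, LATENT_DIM))
    for t, term_images in enumerate(image_sets[:MAX_TERMS]):
        for i, image in enumerate(term_images[:IMAGES_PER_TERM]):
            slots[t, i] = encode(image)
    return slots.reshape(-1)  # row-major flattening keeps the term order


# Usage: a word with three definition terms, using random arrays as images.
rng = np.random.default_rng(0)
image_sets = [[rng.random((64, 64, 3)) for _ in range(IMAGES_PER_TERM)]
              for _ in range(4)]
vec = word_vector(image_sets)
print(vec.shape)  # (3200,)
```

Fixed slots per term keep each definition term at the same offset in every word vector, which is one plausible reading of "appended in sequence"; the paper itself may order or pad the latents differently.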