Using Images to Find Context-Independent Word Representations in Vector Space
Harsh Kumar
TL;DR
The paper addresses the dependence of traditional word embeddings on textual context by proposing a definition- and image-based pipeline that derives context-independent word vectors. Each word is represented by a $3200$-dimensional vector formed by concatenating the latent representations of $100$ images (five images per definition term across up to $19$ terms), each encoded by an auto-encoder with a $32$-dimensional latent space. The authors construct a custom dataset of $115{,}458$ terms and $577{,}290$ images, sourcing definitions from dictionaries and images from the web, and evaluate on word similarity, outlier detection, and concept categorization benchmarks. They report performance comparable to context-based models with significantly less training time, and name machine translation and cross-language image use as future work.
Abstract
Many methods have been proposed to find vector representations for words, but most rely on capturing context from text to find semantic relationships between these vectors. We propose a novel method that uses dictionary meanings and image depictions to find word vectors independent of any context. We use an auto-encoder on the word images to find meaningful representations and use them to calculate the word vectors. Finally, we evaluate our method on word similarity, concept categorization, and outlier detection tasks. Our method performs comparably to context-based methods while taking much less training time.
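To make the dimensions stated above concrete, the following is a minimal sketch of how per-image latent codes could be concatenated into a fixed-length word vector. Only the sizes (a $32$-dimensional latent code, five images per term, $100$ images and $3200$ dimensions per word) come from the text. The function name `build_word_vector`, the nested-list input layout, and the zero-padding of unused image slots are assumptions made for illustration and are not confirmed as the authors' procedure.

```python
from typing import List, Sequence

LATENT_DIM = 32         # auto-encoder latent size (from the text)
IMAGES_PER_TERM = 5     # images per term (from the text)
IMAGES_PER_WORD = 100   # image slots per word (from the text)
WORD_DIM = LATENT_DIM * IMAGES_PER_WORD  # 32 * 100 = 3200


def build_word_vector(
    term_latents: Sequence[Sequence[Sequence[float]]],
) -> List[float]:
    """Concatenate per-image latent codes into one fixed-length vector.

    term_latents[t][i] is the latent code of image i for term t.
    ASSUMPTION: unused image slots are filled with zeros; the source
    does not say how words with fewer terms are handled.
    """
    vector: List[float] = []
    for images in term_latents:
        if len(images) != IMAGES_PER_TERM:
            raise ValueError("each term needs exactly 5 image codes")
        for code in images:
            if len(code) != LATENT_DIM:
                raise ValueError("each latent code must have 32 values")
            vector.extend(code)
    if len(vector) > WORD_DIM:
        raise ValueError("more than 100 images supplied for one word")
    # Zero-pad the remaining slots (hypothetical choice, see docstring).
    vector.extend([0.0] * (WORD_DIM - len(vector)))
    return vector
```

For example, a word with three terms contributes $3 \times 5 \times 32 = 480$ latent values, and under the padding assumption the remaining $2720$ entries are zero, so every word vector has the same length regardless of how many terms its definition contains.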
