ConceptNet 5.5: An Open Multilingual Graph of General Knowledge

Robyn Speer, Joshua Chin, Catherine Havasi

TL;DR

ConceptNet 5.5 is a large multilingual knowledge graph designed to complement distributional word embeddings. The paper introduces two embedding spaces: ConceptNet-PPMI, computed from the graph itself, and ConceptNet Numberbatch, a hybrid space produced by combining word2vec and GloVe with ConceptNet through retrofitting and cross-lingual alignment. The evaluations show state-of-the-art word relatedness results and competitive performance on SAT-style analogies, while results on the Story Cloze Test are modest, illustrating both the value and the limits of integrating relational knowledge into embeddings. The work provides open code and data for building, evaluating, and extending multilingual, knowledge-grounded embeddings.
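Retrofitting, as named above, pulls each word's vector toward the vectors of its graph neighbours while keeping it close to its original distributional position. The following is a minimal sketch of the standard retrofitting update of Faruqui et al. (2015), under simplifying assumptions: edge labels and weights are ignored, every neighbour gets equal weight, and words missing from the input embedding are not added. It is not the paper's exact Numberbatch procedure, and the function `retrofit` and its parameters are hypothetical.

```python
import numpy as np


def retrofit(vectors, graph, iterations=10, alpha=1.0):
    """Pull word vectors toward their graph neighbours (Faruqui et al. 2015 style).

    Hypothetical helper for illustration, not the paper's code.
    vectors: dict word -> 1-D numpy array from a distributional model
    graph:   dict word -> list of neighbouring words; edge labels are ignored
    alpha:   weight that keeps each vector near its original position
    """
    original = {w: np.array(v, dtype=float) for w, v in vectors.items()}
    new = {w: v.copy() for w, v in original.items()}
    for _ in range(iterations):
        for word, neighbours in graph.items():
            # Assumption: words without an input vector are skipped, not added.
            if word not in new:
                continue
            nbrs = [n for n in neighbours if n in new]
            if not nbrs:
                continue
            beta = 1.0 / len(nbrs)  # equal weight per neighbour
            # Weighted average of the original vector and current neighbour vectors.
            pulled = alpha * original[word] + beta * sum(new[n] for n in nbrs)
            new[word] = pulled / (alpha + beta * len(nbrs))
    return new


vecs = {
    "bicycle": np.array([1.0, 0.0]),
    "bike": np.array([0.0, 1.0]),
    "cat": np.array([-1.0, 0.0]),
}
graph = {"bicycle": ["bike"], "bike": ["bicycle"]}
retrofitted = retrofit(vecs, graph)
```

Here `bicycle` and `bike` start out orthogonal and end up pointing in similar directions, while `cat`, which has no edges in this toy graph, keeps its original vector. The input dictionary is left unmodified.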

Abstract

Machine learning about language can be improved by supplying it with specific knowledge and sources of external information. We present here a new version of the linked open data resource ConceptNet that is particularly well suited for use with modern NLP techniques such as word embeddings. ConceptNet is a knowledge graph that connects words and phrases of natural language with labeled edges. Its knowledge is collected from many sources, including expert-created resources, crowdsourcing, and games with a purpose. It is designed to represent the general knowledge involved in understanding language, improving natural language applications by allowing them to better understand the meanings behind the words people use. When ConceptNet is combined with word embeddings acquired from distributional semantics (such as word2vec), it provides applications with understanding that they would not acquire from distributional semantics alone, nor from narrower resources such as WordNet or DBpedia. We demonstrate this with state-of-the-art results on intrinsic evaluations of word relatedness that translate into improvements on applications of word vectors, including solving SAT-style analogies.
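Intrinsic word relatedness evaluations like those mentioned above are conventionally scored by comparing a model's cosine similarities against human relatedness ratings using Spearman rank correlation. The sketch below assumes a dataset of `(word1, word2, human_score)` triples and a plain dict of vectors; skipping pairs with out-of-vocabulary words is an assumption of this sketch, the helper names are hypothetical, and the toy ratings are made-up values for illustration only.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        # Extend j over a run of tied values in sorted order.
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def evaluate_relatedness(vectors, dataset):
    """Hypothetical evaluation helper.

    dataset: list of (word1, word2, human_score) triples.
    Assumption: pairs with an out-of-vocabulary word are skipped.
    """
    model_scores, human_scores = [], []
    for w1, w2, score in dataset:
        if w1 in vectors and w2 in vectors:
            model_scores.append(cosine(vectors[w1], vectors[w2]))
            human_scores.append(score)
    return spearman(model_scores, human_scores)


toy_vectors = {
    "car": np.array([1.0, 0.0]),
    "automobile": np.array([0.9, 0.1]),
    "banana": np.array([0.0, 1.0]),
    "fruit": np.array([0.2, 0.8]),
}
toy_pairs = [
    ("car", "automobile", 9.0),
    ("car", "banana", 1.0),
    ("banana", "fruit", 8.5),
    ("car", "unicorn", 2.0),  # skipped: "unicorn" has no vector
]
rho = evaluate_relatedness(toy_vectors, toy_pairs)
```

On this toy data the model orders the three scored pairs exactly as the made-up ratings do, so `rho` comes out at 1.0. Ranking rather than comparing raw values is the usual choice because cosine similarities and human rating scales are not on comparable scales.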

Paper Structure

This paper contains 24 sections, 2 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: ConceptNet's browsable interface (conceptnet.io) shows facts about the English word "bicycle".
  • Figure 2: Performance of word embeddings across multiple evaluations. Error bars show 95% confidence intervals.