Question Answering with Subgraph Embeddings

Antoine Bordes, Sumit Chopra, Jason Weston

TL;DR

The paper addresses open-domain question answering over large knowledge bases (KBs) by learning a joint embedding space for words and KB symbols. A question q is scored against a candidate answer a with S(q,a) = f(q)^{T} g(a), where f(q) = W φ(q) and g(a) = W ψ(a); W is the shared embedding matrix, and φ(q) and ψ(a) are sparse vectors encoding the words of the question and the KB symbols of the answer representation. The model considers three answer representations (single entity, path, and subgraph) and uses an inference procedure efficient enough to handle longer KB paths. It is trained with a margin-based ranking loss, plus multitask signals from paraphrase data and entity-name mappings. The main contributions are this inference procedure, a subgraph-based answer representation that improves matching, and competitive results on WebQuestions without hand-crafted lexicons or parsing. Training draws on several data sources (WebQuestions, Freebase, ClueWeb extractions) together with paraphrase supervision, with the aim of scalability and robustness across KB schemas.
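The scoring function and ranking loss described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the toy vocabulary, the hand-picked 2 x 5 matrix `W`, the helper names (`bag_of_symbols`, `embed`, `score`, `margin_ranking_loss`) and the margin value of 0.1 are all chosen here for the example.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[Vector]  # k rows (embedding dimension) x N columns (vocabulary)


def bag_of_symbols(symbols: List[str], index: Dict[str, int], size: int) -> Vector:
    """Sparse count vector (phi(q) or psi(a)), stored densely for clarity."""
    vec = [0.0] * size
    for s in symbols:
        if s in index:  # unknown symbols are ignored in this sketch
            vec[index[s]] += 1.0
    return vec


def embed(W: Matrix, sparse: Vector) -> Vector:
    """Compute W @ sparse, i.e. the sum of the embedding columns that are active."""
    return [sum(w * x for w, x in zip(row, sparse)) for row in W]


def score(W: Matrix, phi_q: Vector, psi_a: Vector) -> float:
    """S(q, a) = f(q)^T g(a) with f(q) = W phi(q) and g(a) = W psi(a)."""
    f_q = embed(W, phi_q)
    g_a = embed(W, psi_a)
    return sum(x * y for x, y in zip(f_q, g_a))


def margin_ranking_loss(W: Matrix, phi_q: Vector, psi_pos: Vector,
                        psi_neg: Vector, margin: float = 0.1) -> float:
    """Hinge loss: zero once the correct answer beats the incorrect one by `margin`.

    The default margin is an assumption made for this example.
    """
    return max(0.0, margin - score(W, phi_q, psi_pos) + score(W, phi_q, psi_neg))


# Toy vocabulary mixing words and KB symbols (hypothetical data).
index = {"who": 0, "directed": 1, "avatar": 2, "james_cameron": 3, "tom_cruise": 4}
# Hand-picked 2-dimensional embeddings; a real model would learn W.
W = [
    [1.0, 0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 1.0],
]

phi_q = bag_of_symbols(["who", "directed", "avatar"], index, len(index))
psi_good = bag_of_symbols(["james_cameron"], index, len(index))
psi_bad = bag_of_symbols(["tom_cruise"], index, len(index))

print(score(W, phi_q, psi_good))                       # 2.0
print(score(W, phi_q, psi_bad))                        # 1.0
print(margin_ranking_loss(W, phi_q, psi_good, psi_bad))  # 0.0
```

Because both the question and the answer are embedded with the same matrix `W`, words and KB symbols live in one space, and ranking candidates reduces to comparing dot products.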

Abstract

This paper presents a system which learns to answer questions on a broad range of topics from a knowledge base using few hand-crafted features. Our model learns low-dimensional embeddings of words and knowledge base constituents; these representations are used to score natural language questions against candidate answers. Training our system using pairs of questions and structured representations of their answers, and pairs of question paraphrases, yields competitive results on a competitive benchmark of the literature.

Paper Structure

This paper contains 14 sections, 3 equations, 1 figure, 3 tables.

Figures (1)

  • Figure 1: Illustration of the subgraph embedding model scoring a candidate answer: (i) locate entity in the question; (ii) compute path from entity to answer; (iii) represent answer as path plus all connected entities to the answer (the subgraph); (iv) embed both the question and the answer subgraph separately using the learnt embedding vectors, and score the match via their dot product.
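Steps (ii) and (iii) of the caption can be illustrated with a small sketch that collects the symbols of an answer's subgraph and turns them into a multi-hot vector ψ(a). It rests on several assumptions made only for this example: the KB is a short list of (subject, relation, object) triples with toy data, the path is given rather than searched for, the function names are hypothetical, and only the neighbouring entities are added (as the caption words it), without any further distinction between path and subgraph symbols.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def answer_subgraph_symbols(question_entity: str, path_relations: List[str],
                            answer: str, kb: List[Triple]) -> List[str]:
    """Symbols for a candidate answer: the path plus entities connected to it."""
    # Path from the question entity to the answer (given, not searched, here).
    symbols = [question_entity, *path_relations, answer]
    # Add every other entity that shares a triple with the answer.
    for subj, _rel, obj in kb:
        if subj == answer and obj != question_entity:
            symbols.append(obj)
        elif obj == answer and subj != question_entity:
            symbols.append(subj)
    return symbols


def multi_hot(symbols: List[str], index: Dict[str, int]) -> List[float]:
    """psi(a): 1.0 for each symbol present in the answer representation."""
    vec = [0.0] * len(index)
    for s in symbols:
        vec[index[s]] = 1.0
    return vec


# Toy knowledge base (hypothetical data for illustration).
kb = [
    ("avatar", "directed_by", "james_cameron"),
    ("james_cameron", "born_in", "kapuskasing"),
    ("titanic", "directed_by", "james_cameron"),
]

symbols = answer_subgraph_symbols("avatar", ["directed_by"], "james_cameron", kb)
print(symbols)
# ['avatar', 'directed_by', 'james_cameron', 'kapuskasing', 'titanic']

index = {s: i for i, s in enumerate(sorted(set(symbols)))}
psi_a = multi_hot(symbols, index)
print(psi_a)  # [1.0, 1.0, 1.0, 1.0, 1.0]
```

The resulting vector is what would be multiplied by the embedding matrix in step (iv) before taking the dot product with the embedded question.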