Question Answering with Subgraph Embeddings
Antoine Bordes, Sumit Chopra, Jason Weston
TL;DR
The paper addresses open-domain question answering over large knowledge bases (KBs). It learns a joint embedding space for words and KB symbols and scores a question q against a candidate answer a with S(q, a) = f(q)^{T} g(a), where f(q) = W φ(q) and g(a) = W ψ(a). Candidate answers can be represented as single entities, paths, or subgraphs, and an efficient inference procedure lets the model handle longer KB paths. The model is trained with a margin-based ranking loss and multitask signals from paraphrase data and entity-name mappings. The main contributions are this inference procedure, the subgraph-based answer representation, which improves matching, and competitive results on WebQuestions without hand-crafted lexicons or parsing. Training draws on several data sources (WebQuestions, Freebase, ClueWeb extractions) and on paraphrase supervision, which supports scalability and robustness across KB schemas and makes the approach applicable to open QA in diverse domains.
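The scoring function and ranking loss summarized above can be illustrated with a minimal NumPy sketch. It assumes φ(q) and ψ(a) are sparse indicator vectors over one shared dictionary of words and KB symbols. The toy dimensions, the random initialization, the margin value of 0.1, and the helper names (`embed`, `score`, `margin_ranking_loss`) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def embed(W, sparse_vec):
    """Project a sparse bag-of-symbols vector into the k-dim embedding space."""
    return W @ sparse_vec

def score(W, phi_q, psi_a):
    """S(q, a) = f(q)^T g(a), with f(q) = W phi(q) and g(a) = W psi(a)."""
    return float(embed(W, phi_q) @ embed(W, psi_a))

def margin_ranking_loss(W, phi_q, psi_pos, psi_neg, margin=0.1):
    """Hinge loss: the correct answer should outscore a wrong one by `margin`."""
    return max(0.0, margin - score(W, phi_q, psi_pos) + score(W, phi_q, psi_neg))

# Toy setup (assumed values): N = 6 dictionary symbols, embedding size k = 3.
rng = np.random.default_rng(0)
k, N = 3, 6
W = rng.normal(scale=0.1, size=(k, N))

phi_q = np.array([1, 1, 0, 0, 0, 0], dtype=float)    # question contains words 0 and 1
psi_pos = np.array([0, 0, 1, 1, 0, 0], dtype=float)  # correct answer: entity plus a relation
psi_neg = np.array([0, 0, 0, 0, 1, 1], dtype=float)  # corrupted candidate answer

loss = margin_ranking_loss(W, phi_q, psi_pos, psi_neg)
```

Because both φ(q) and ψ(a) pass through the same matrix W, richer answer representations (a path or a subgraph) only change which entries of ψ(a) are non-zero; the scoring code stays the same.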
Abstract
This paper presents a system which learns to answer questions on a broad range of topics from a knowledge base using few hand-crafted features. Our model learns low-dimensional embeddings of words and knowledge base constituents; these representations are used to score natural language questions against candidate answers. Training our system using pairs of questions and structured representations of their answers, and pairs of question paraphrases, yields competitive results on a benchmark from the literature.
