EEE-QA: Exploring Effective and Efficient Question-Answer Representations

Zhanghao Hu, Yijun Yang, Junjie Xu, Yifu Qiu, Pinzhen Chen

TL;DR

This work challenges the existing question-answer encoding convention and explores finer representations. It also explores embedding all answer candidates together with the question, which enables cross-reference between answer choices and improves inference throughput via reduced memory usage.

Abstract

Current approaches to question answering rely on pre-trained language models (PLMs) like RoBERTa. This work challenges the existing question-answer encoding convention and explores finer representations. We begin by testing various pooling methods as alternatives to using the begin-of-sentence token as the question representation, aiming for better quality. Next, we explore opportunities to simultaneously embed all answer candidates with the question. This enables cross-reference between answer choices and improves inference throughput via reduced memory usage. Despite their simplicity and effectiveness, these methods have yet to be widely studied in current frameworks. We experiment with different PLMs, both with and without the integration of knowledge graphs. Results show the memory efficiency of the proposed techniques, with little sacrifice in performance. In practice, our work increases throughput by 38-100%, with 26-65% speedups, on consumer-grade GPUs by allowing for considerably larger batch sizes. Our work points the community to promising directions in both representation quality and efficiency for the question-answering task in natural language processing.
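
The two ideas in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the `</s>` separator string, and the toy vectors are all assumptions made for illustration, and the abstract does not specify the exact input template or which pooling variants were tested. The first pair of functions contrasts taking the begin-of-sentence token vector with mean pooling over non-padding tokens; the second pair contrasts one sequence per (question, candidate) pair with a single sequence holding the question and all candidates.

```python
from typing import List, Sequence


def bos_pool(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Use the first (begin-of-sentence) token vector as the representation."""
    return list(token_vectors[0])


def mean_pool(token_vectors: Sequence[Sequence[float]],
              mask: Sequence[int]) -> List[float]:
    """Average the vectors of tokens whose mask entry is 1 (padding excluded)."""
    kept = [vec for vec, m in zip(token_vectors, mask) if m]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


# Hypothetical separator string; a real tokenizer defines its own special tokens.
SEP = " </s> "


def separate_inputs(question: str, candidates: Sequence[str]) -> List[str]:
    """Conventional encoding: one sequence per (question, candidate) pair."""
    return [question + SEP + c for c in candidates]


def joint_input(question: str, candidates: Sequence[str]) -> str:
    """Joint encoding: one sequence with the question and every candidate."""
    return SEP.join([question, *candidates])


if __name__ == "__main__":
    # Toy 2-dimensional token vectors; the last position is padding.
    vectors = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [9.0, 9.0]]
    mask = [1, 1, 1, 0]
    print(bos_pool(vectors))
    print(mean_pool(vectors, mask))

    question = "Where would you store a book?"
    candidates = ["shelf", "oven", "river", "cloud"]
    print(len(separate_inputs(question, candidates)))  # sequences to encode
    print(joint_input(question, candidates))           # a single sequence
```

With N candidates, the conventional form produces N sequences per question while the joint form produces one, which is the kind of reduction that leaves room for larger batch sizes. The joint sequence is longer, so the actual memory and speed trade-off depends on the model and sequence lengths.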

Paper Structure (21 sections, 1 equation, 1 figure, 4 tables)

Figures (1)

  • Figure 1: A pilot experiment of appending random answers to the question before performing QA. Tested on CommonsenseQA (5 candidates) and OpenBookQA (4 candidates) with GreaseLM.