TWEAC: Transformer with Extendable QA Agent Classifiers
Gregor Geigle, Nils Reimers, Andreas Rücklé, Iryna Gurevych
TL;DR
The paper tackles broad-coverage question answering by routing each question to a suitable agent within a curated suite of specialized QA agents, organized as a meta-QA framework. It compares similarity-based retrieval against TWEAC (Transformer with Extendable QA Agent Classifiers) for agent selection, and shows that TWEAC remains accurate and sample-efficient with more than 100 agents. TWEAC uses one classification head per agent, so new agents can be added without changing the existing heads, and the authors propose half-and-half sampling to integrate new agents quickly without full retraining. The study provides two evaluation setups (QA-Tasks and Many-Agents), analyzes scalability and extension strategies, and examines error patterns caused by overlapping agent domains. Code and data are available for replication.
Abstract
Question answering systems should help users access knowledge on a broad range of topics and answer a wide array of different questions. Most systems fall short of this expectation because they are specialized in only one particular setting, e.g., answering factual questions with Wikipedia data. To overcome this limitation, we propose composing multiple QA agents within a meta-QA system. We argue that a wide range of specialized QA agents already exists in the literature. Thus, we address the central research question of how to effectively and efficiently identify suitable QA agents for any given question. We study both supervised and unsupervised approaches to address this challenge, showing that TWEAC (Transformer with Extendable Agent Classifiers) achieves the best performance overall with 94% accuracy. We provide extensive insights on the scalability of TWEAC, demonstrating that it scales robustly to over 100 QA agents, with each agent providing just 1,000 examples of questions it can answer. Our code and data are available at https://github.com/UKPLab/TWEAC-qa-agent-selection
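
To make the idea of extendable per-agent classification heads more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a question has already been encoded into a fixed-size embedding (in TWEAC this would come from a transformer) and that each agent's head is an independent sigmoid scorer over that embedding. The class name `ExtendableAgentClassifier`, the agent names, and the toy embedding are all hypothetical, and no training is shown.

```python
import math
import random
from typing import Dict, List


class ExtendableAgentClassifier:
    """Toy model: one independent sigmoid head per QA agent.

    All heads score the same shared question embedding. The heads are
    randomly initialized here; a real system would train them.
    """

    def __init__(self, embedding_dim: int, seed: int = 0) -> None:
        self.embedding_dim = embedding_dim
        self._rng = random.Random(seed)
        # agent name -> [w_1, ..., w_d, bias]
        self.heads: Dict[str, List[float]] = {}

    def add_agent(self, name: str) -> None:
        """Register a new agent by creating a new head.

        Existing heads are left untouched, which is what makes the
        classifier extendable in this sketch.
        """
        if name in self.heads:
            raise ValueError(f"agent {name!r} already registered")
        self.heads[name] = [
            self._rng.uniform(-0.1, 0.1) for _ in range(self.embedding_dim + 1)
        ]

    def scores(self, embedding: List[float]) -> Dict[str, float]:
        """Return an independent suitability score in (0, 1) per agent."""
        if len(embedding) != self.embedding_dim:
            raise ValueError("embedding has the wrong dimension")
        result = {}
        for name, params in self.heads.items():
            weights, bias = params[:-1], params[-1]
            logit = bias + sum(w * x for w, x in zip(weights, embedding))
            result[name] = 1.0 / (1.0 + math.exp(-logit))
        return result

    def rank(self, embedding: List[float]) -> List[str]:
        """Return agent names ordered from highest to lowest score."""
        agent_scores = self.scores(embedding)
        return sorted(agent_scores, key=agent_scores.get, reverse=True)


# Hypothetical usage: two initial agents, then a third added later.
clf = ExtendableAgentClassifier(embedding_dim=4)
clf.add_agent("factoid_qa")
clf.add_agent("community_qa")

question_embedding = [0.2, -0.1, 0.4, 0.3]  # stand-in for an encoder output
before = clf.scores(question_embedding)

clf.add_agent("knowledge_graph_qa")
after = clf.scores(question_embedding)
```

Because each head is scored independently, adding `knowledge_graph_qa` leaves the scores of the two existing agents unchanged in this sketch. In a trained system the shared encoder would typically also be updated when agents are added, which is the situation the half-and-half sampling strategy is meant to handle.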
