A Survey on Table-and-Text HybridQA: Concepts, Methods, Challenges and Future Directions
Dingzirui Wang, Longxu Dou, Wanxiang Che
TL;DR
This survey addresses table-and-text HybridQA by cataloging benchmarks, methods, and challenges. It frames HybridQA as retrieval-augmented reasoning over heterogeneous evidence, organized into retriever-reader pipelines, with a method taxonomy spanning retrieval granularity, encoder/decoder improvements, and data manipulation. Key contributions include a comprehensive benchmark taxonomy, a structured method taxonomy, and four forward-looking directions: richer relation modeling, domain knowledge integration, data augmentation, and richer context modeling for realistic settings. The work aims to guide the development of robust, scalable HybridQA systems applicable to finance, science, and beyond.
Abstract
Table-and-text hybrid question answering (HybridQA) is a widely used and challenging NLP task commonly applied in the financial and scientific domains. Early research focused on adapting methods from other QA tasks to HybridQA, while later work has proposed a growing number of HybridQA-specific methods. Despite the rapid development of HybridQA, a systematic survey that summarizes the main techniques and advances further research is still lacking. We therefore present this work, which summarizes the current HybridQA benchmarks and methods and then analyzes the challenges and future directions of the task. The contributions of this paper are threefold: (1) to the best of our knowledge, the first survey covering benchmarks, methods, and challenges for HybridQA; (2) a systematic investigation with a reasonable comparison of existing systems that articulates their advantages and shortcomings; (3) a detailed analysis of challenges along four important dimensions to shed light on future directions.
