A Hybrid Search for Complex Table Question Answering in Securities Report

Daiki Shirafuji, Koji Tanaka, Tatsuhiko Saito

TL;DR

This work tackles the challenge of extracting precise values from complex tables in securities reports when the table headers are multi-level or merged. It introduces a header-agnostic cell-extraction pipeline that cleans tables, scores cells against the question with a hybrid of TF-IDF and language-model (LM) retrieval, and returns the cell at the intersection of the most relevant row and column; the encoder is additionally trained with domain-specific contrastive learning. On the NTCIR-18 U4 TQA dataset the approach reaches 74.6% value-extraction accuracy and 78.8% cell-ID accuracy, surpassing the GPT-4o mini baselines while remaining below the performance of non-expert humans. The results indicate that combining lexical and semantic signals with targeted contrastive training can handle structurally complex financial tables, and they motivate future integration with more efficient search models for practical question answering over financial documents.

Abstract

Large Language Models (LLMs) have recently gained attention in Table Question Answering (TQA), particularly for extracting information from tables in documents. However, feeding an entire table to an LLM as one long text often yields incorrect answers, because most LLMs cannot inherently capture complex table structures. In this paper, we propose a cell extraction method for TQA that requires no manual header identification, even for tables with complex headers. Our approach estimates table headers by computing similarities between a given question and individual cells via a hybrid retrieval mechanism that integrates a language model and TF-IDF. We then select as the answer the cells at the intersection of the most relevant row and column. Furthermore, the language model is trained with contrastive learning on a small dataset of question-header pairs to enhance performance. We evaluated our approach on the TQA dataset from the U4 shared task at NTCIR-18. The experimental results show that our pipeline achieves an accuracy of 74.6%, outperforming existing LLMs such as GPT-4o mini (63.9%). This study used traditional encoder models for retrieval; in future work, we plan to incorporate more efficient text-search models to improve performance and narrow the gap with human evaluation results.
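
The hybrid scoring and row/column intersection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the weight `alpha`, and the toy table are hypothetical, and a character-trigram Jaccard similarity stands in for the language-model encoder so the example runs without external dependencies. It also assumes that headers sit above and to the left of the value cells, so the answer is taken at the lower-right intersection of the two best-matching cells.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokenizer (an assumption; the paper targets Japanese text)."""
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per tokenized document."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log((1 + n) / (1 + cnt)) + 1.0 for tok, cnt in df.items()}
    return [{tok: tf * idf[tok] for tok, tf in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def char_trigram_similarity(a, b):
    """Stand-in for LM embedding similarity: Jaccard overlap of character trigrams."""
    def grams(s):
        s = s.lower()
        return {s[i:i + 3] for i in range(max(len(s) - 2, 1))}
    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga | gb)


def hybrid_scores(question, table, alpha=0.5, semantic=char_trigram_similarity):
    """Score every non-empty cell as alpha * lexical + (1 - alpha) * semantic."""
    cells = [(r, c, text) for r, row in enumerate(table)
             for c, text in enumerate(row) if text.strip()]
    vectors = tfidf_vectors([tokenize(question)] + [tokenize(t) for _, _, t in cells])
    q_vec = vectors[0]
    return {
        (r, c): alpha * cosine(q_vec, vec) + (1 - alpha) * semantic(question, text)
        for (r, c, text), vec in zip(cells, vectors[1:])
    }


def select_answer(question, table, alpha=0.5):
    """Pick the two best header candidates and return the cell where they intersect."""
    scores = hybrid_scores(question, table, alpha)
    ranked = sorted(scores, key=scores.get, reverse=True)
    first = ranked[0]
    # The second candidate must lie in a different row and a different column.
    second = next(p for p in ranked[1:] if p[0] != first[0] and p[1] != first[1])
    # Assumption: headers are at the top/left, so the value is at the lower-right.
    row, col = max(first[0], second[0]), max(first[1], second[1])
    return (row, col), table[row][col]


table = [
    ["Item", "FY2022", "FY2023"],
    ["Net sales", "100", "120"],
    ["Operating income", "10", "15"],
]
position, value = select_answer("What was the operating income in FY2023?", table)
print(position, value)
```

In this toy example the cells "Operating income" and "FY2023" receive the two highest hybrid scores, and the value is read from the cell where their row and column meet. In a real setting the `semantic` argument would be replaced by cosine similarity between encoder embeddings, which is where the contrastively trained language model would plug in.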

Paper Structure

This paper contains 22 sections, 1 equation, 3 figures, 1 table.

Figures (3)

  • Figure 1: Overview of the proposed pipeline with examples of tabular data from the Japanese financial statements.
  • Figure 2: Overview of our method for constructing, from the TQA dataset, the dataset used to train the language model.
  • Figure 3: An example of our GUI for the human evaluation process.