Tailoring Table Retrieval from a Field-aware Hybrid Matching Perspective

Da Li, Keping Bi, Jiafeng Guo, Xueqi Cheng

TL;DR

THYME tackles table retrieval by recognizing per-field matching preferences and integrating dense and sparse representations through a shared encoder. It introduces table serialization with field markers, a table-specific pooling strategy, and a Mixture of Field Experts to achieve field-aware lexical matching, all trained with a hybrid objective that combines $s_{sem}$ and $s_{lex}$ into $s(q,t)=s_{sem}(q,t)+s_{lex}(q,t)$. On NQ-TABLES and OTT-QA, THYME outperforms state-of-the-art baselines across sparse, dense, and hybrid categories, with analyses showing that titles favor semantic matching while headers/cells rely on lexical matching, validating the design. In TableQA, THYME enhances retrieval-augmented generation across multiple LLMs, illustrating practical impact for real-world table-based QA. Limitations include the absence of LLM-based retriever experiments and multimodal table data, pointing to future work to broaden applicability.
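The additive score $s(q,t)=s_{sem}(q,t)+s_{lex}(q,t)$ can be illustrated with a minimal sketch. It assumes, for illustration only, that $s_{sem}$ is an inner product of dense vectors and $s_{lex}$ is an inner product of sparse term-weight maps; the function names and toy weights below are hypothetical and are not taken from the paper's implementation.

```python
from typing import Dict, List


def dense_score(q_vec: List[float], t_vec: List[float]) -> float:
    """Assumed form of s_sem: inner product of two dense vectors."""
    return sum(x * y for x, y in zip(q_vec, t_vec))


def sparse_score(q_terms: Dict[str, float], t_terms: Dict[str, float]) -> float:
    """Assumed form of s_lex: inner product over terms shared by query and table."""
    return sum(w * t_terms[term] for term, w in q_terms.items() if term in t_terms)


def hybrid_score(
    q_vec: List[float],
    t_vec: List[float],
    q_terms: Dict[str, float],
    t_terms: Dict[str, float],
) -> float:
    """s(q, t) = s_sem(q, t) + s_lex(q, t), as stated in the summary."""
    return dense_score(q_vec, t_vec) + sparse_score(q_terms, t_terms)


# Toy inputs (made-up numbers, chosen only to keep the arithmetic simple).
score = hybrid_score(
    [1.0, 2.0],
    [0.5, 0.25],
    {"paris": 2.0, "capital": 1.0},
    {"paris": 1.5, "city": 0.3},
)
```

In this toy case only the term "paris" overlaps, so the lexical part contributes a single product while the semantic part sums over all dense dimensions.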

Abstract

Table retrieval, essential for accessing information through tabular data, is less explored compared to text retrieval. The row/column structure and distinct fields of tables (including titles, headers, and cells) present unique challenges. For example, different table fields have varying matching preferences: cells may favor finer-grained (word/phrase level) matching over broader (sentence/passage level) matching due to their fragmented and detailed nature, unlike titles. This necessitates a table-specific retriever to accommodate the various matching needs of each table field. Therefore, we introduce a Table-tailored HYbrid Matching rEtriever (THYME), which approaches table retrieval from a field-aware hybrid matching perspective. Empirical results on two table retrieval benchmarks, NQ-TABLES and OTT-QA, show that THYME significantly outperforms state-of-the-art baselines. Comprehensive analyses confirm the differing matching preferences across table fields and validate the design of THYME.

Paper Structure

This paper contains 25 sections, 14 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: A table retrieval case showing that fine-grained matching in cells is important.
  • Figure 2: Illustration of Table-Tailored Hybrid Matching. Serialized tables are encoded through an encoder shared with queries.
  • Figure 3: Top-1 retrieved table from different retrievers.