HeGTa: Leveraging Heterogeneous Graph-enhanced Large Language Models for Few-shot Complex Table Understanding

Rihui Jin, Yu Li, Guilin Qi, Nan Hu, Yuan-Fang Li, Jiaoyan Chen, Jianan Wang, Yongrui Chen, Dehai Min, Sheng Bi

TL;DR

HeGTa tackles the problem of few-shot complex table understanding by fusing a tabular heterogeneous graph encoder with a large language model. It constructs a Tabular HG to preserve topological semantics, aligns the HG encoder with the LLM through soft prompts, and pre-trains with three multi-granularity self-supervised tasks (TRC, TCM, TCG) before task-specific fine-tuning. Empirical results across nine TU datasets for CTC, TTC, and TQA show HeGTa achieving state-of-the-art performance in few-shot settings, with ablations confirming the value of each component and the benefits of heterogeneous graph representations over homogeneous or linearized approaches. The approach demonstrates strong generalization and practical potential for real-world TU tasks, particularly those with complex table structures and limited annotations.
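
The soft-prompt alignment mentioned above can be pictured as a splice in embedding space: wherever the instruction contains a placeholder token, its embedding is swapped for a vector produced by the graph encoder. The following is a minimal sketch of that idea in plain Python, not the authors' implementation. The function name `splice_soft_prompts`, the `placeholder_id` convention, and the list-of-floats representation are all assumptions made for illustration.

```python
from typing import List, Sequence


def splice_soft_prompts(
    token_ids: Sequence[int],
    token_embeddings: Sequence[Sequence[float]],
    node_vectors: Sequence[Sequence[float]],
    placeholder_id: int,
) -> List[List[float]]:
    """Replace each placeholder embedding with the next graph node vector.

    Hypothetical helper: `placeholder_id` marks the positions reserved for
    tabular nodes; node vectors are consumed in order of appearance.
    """
    if len(token_ids) != len(token_embeddings):
        raise ValueError("token_ids and token_embeddings must have the same length")

    n_slots = sum(1 for t in token_ids if t == placeholder_id)
    if n_slots != len(node_vectors):
        raise ValueError("number of placeholders must match number of node vectors")

    nodes = iter(node_vectors)
    spliced: List[List[float]] = []
    for tid, emb in zip(token_ids, token_embeddings):
        if tid == placeholder_id:
            vec = next(nodes)
            # The node vector must live in the same space as token embeddings.
            if len(vec) != len(emb):
                raise ValueError("node vector and token embedding sizes differ")
            spliced.append(list(vec))
        else:
            spliced.append(list(emb))
    return spliced


# Toy usage: -1 is the (assumed) placeholder id, embeddings are 2-dimensional.
ids = [5, -1, 7, -1]
embs = [[0.1, 0.2], [0.0, 0.0], [0.3, 0.4], [0.0, 0.0]]
nodes = [[1.0, 1.0], [2.0, 2.0]]
mixed = splice_soft_prompts(ids, embs, nodes, placeholder_id=-1)
```

In a real model the spliced sequence would then be passed through the remaining LLM layers; the sketch stops at the splice because that is the only step the summary describes.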

Abstract

Table understanding (TU) has achieved promising advancements, but it faces two challenges: the scarcity of manually labeled tables and the presence of complex table structures. To address these challenges, we propose HGT, a framework with a heterogeneous graph (HG)-enhanced large language model (LLM) to tackle few-shot TU tasks. It leverages the LLM by aligning the table semantics with the LLM's parametric knowledge through soft prompts and instruction tuning, and deals with complex tables through a multi-task pre-training scheme involving three novel multi-granularity self-supervised HG pre-training objectives. We empirically demonstrate the effectiveness of HGT, showing that it outperforms the SOTA for few-shot complex TU on several benchmarks.

Paper Structure (25 sections, 4 figures, 4 tables)

Figures (4)

  • Figure 1: Few-shot complex Table Understanding. Complex tables contain intricate cell-to-cell relationships, including dependency, hierarchical, and parallel ones.
  • Figure 2: An overview of the HeGTa framework. HeGTa takes a <table, instruction> pair as input. First, the table is converted into an HG and processed by a Tabular HG encoder to generate a vector for each tabular node, while the LLM transforms the instruction text into initial token embeddings. Subsequently, the HG encoder's outputs serve as soft prompts for the LLM: placeholder embeddings are replaced with the actual tabular node vectors. The modified embedding sequence is then processed by the remaining LLM layers. Throughout Stage 1 and Stage 2, only the weights of the red components are tuned.
  • Figure 3: Table-to-heterogeneous graph conversion. Node types are color-coded: Table (green), Row (red), Data Cell (blue), and Header Cell (yellow). Edge types are similarly color-coded, with bidirectional edges shown as undirected lines. Some edges are omitted for clarity.
  • Figure 4: Examples of three self-supervised instruction tuning datasets, each tailored for distinct tasks: Table Row Classification, Table Cell Matching, and Table Context Generation.
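
The table-to-heterogeneous-graph conversion described for Figure 3 can be sketched in a few lines. The four node types (Table, Row, Data Cell, Header Cell) come from the caption; everything else here is an assumption for illustration, including the function name `table_to_hetero_graph`, the edge-type names (`table_has_row`, `row_has_cell`, `header_of`), and the simplification that the first `num_header_rows` rows are headers. The paper's actual edge set is not listed in this summary and is likely richer.

```python
from typing import List, Tuple

Node = Tuple[str, str]        # (node_type, cell_text)
Edge = Tuple[int, str, int]   # (source_index, edge_type, target_index)


def table_to_hetero_graph(
    rows: List[List[str]], num_header_rows: int = 1
) -> Tuple[List[Node], List[Edge]]:
    """Build a small typed graph from a table given as a list of rows.

    Assumed schema: one table node, one node per row, one node per cell.
    Edge types are hypothetical names, not taken from the paper.
    """
    nodes: List[Node] = []
    edges: List[Edge] = []

    def add_node(node_type: str, text: str = "") -> int:
        nodes.append((node_type, text))
        return len(nodes) - 1

    table_id = add_node("table")
    headers_by_col = {}  # column index -> list of header-cell node ids

    for r, row in enumerate(rows):
        row_id = add_node("row")
        edges.append((table_id, "table_has_row", row_id))
        for c, text in enumerate(row):
            if r < num_header_rows:
                cell_id = add_node("header_cell", text)
                headers_by_col.setdefault(c, []).append(cell_id)
            else:
                cell_id = add_node("data_cell", text)
                # Link every header in the same column to this data cell.
                for header_id in headers_by_col.get(c, []):
                    edges.append((header_id, "header_of", cell_id))
            edges.append((row_id, "row_has_cell", cell_id))

    return nodes, edges


# Toy usage with one header row and one data row.
nodes, edges = table_to_hetero_graph([["City", "Pop"], ["Oslo", "0.7M"]])
```

Keeping node and edge types explicit is what distinguishes this representation from a linearized table: a downstream encoder can treat a header-to-cell link differently from a row-to-cell link instead of inferring both from token order.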