
LLaSA: Large Language and Structured Data Assistant

Yao Xu, Shizhu He, Jiabei Chen, Zeng Xiangrong, Bingning Wang, Guang Liu, Jun Zhao, Kang Liu

TL;DR

LLaSA tackles structured knowledge grounding (SKG) by unifying heterogeneous structured data into hypergraphs and pretraining a hypergraph encoder together with a G-Former. Together they produce fixed-length representations that augment LLM inputs as soft prompts. The hypergraph encoder (HyperTrans) and the cross-attention-based G-Former bridge the modality gap between structured data and text. Because they are pretrained independently of any specific LLM, they can be reused across LLMs. Empirical results across 10 SKG tasks show significant gains over baselines. Performance is especially strong with LoRA tuning, results under full fine-tuning are competitive, and generalization to held-out datasets improves. The approach offers a scalable path to enhancing LLMs with diverse structured data while maintaining efficiency and cross-model transferability.

Abstract

Structured data, such as tables, graphs, and databases, play a critical role in many NLP tasks, such as question answering and dialogue systems. Recently, inspired by Vision-Language Models, researchers have introduced Graph Neural Networks (GNNs) as an additional input modality for Large Language Models (LLMs) to improve their performance on Structured Knowledge Grounding (SKG) tasks. However, these GNN-enhanced LLMs have two limitations. (1) They use different GNNs for different types of structured data, so they cannot process the various forms of structured data uniformly. (2) The pretraining of GNNs is coupled with specific LLMs. This prevents GNNs from fully aligning with the textual space and limits their adaptability to other LLMs. To address these issues, we propose the Large Language and Structured Data Assistant (LLaSA), a general framework for enhancing LLMs' ability to handle structured data. Specifically, we represent various types of structured data in a unified hypergraph format. We then use self-supervised learning to pretrain both a hypergraph encoder and a G-Former, which compresses the encoded hypergraph representations with cross-attention. The compressed hypergraph representations are appended to the serialized inputs during both the training and inference stages of LLMs. Experimental results on multiple SKG tasks show that our pretrained hypergraph encoder can adapt to various LLMs and enhance their ability to process different types of structured data. In addition, LLaSA with LoRA fine-tuning outperforms the previous SOTA method, which uses full-parameter tuning.
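
The abstract does not spell out how each data type maps to a hypergraph. The Python sketch below shows one plausible mapping, built on our own assumptions rather than the paper's stated scheme. Each table cell becomes a node, and each row and each column becomes a hyperedge over its cells. Each knowledge-graph triple becomes a hyperedge labeled by its relation and connecting its head and tail entities. All names (`Hypergraph`, `table_to_hypergraph`, `triples_to_hypergraph`, `incidence_edges`) are hypothetical. `incidence_edges` turns each hyperedge into an extra node with bidirectional links to its members, which mirrors how Figure 3 draws hyperedges as yellow nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Hypergraph:
    """Nodes carry text; each hyperedge is a (label, member node ids) pair."""
    nodes: list[str] = field(default_factory=list)
    hyperedges: list[tuple[str, list[int]]] = field(default_factory=list)

    def add_node(self, text: str) -> int:
        self.nodes.append(text)
        return len(self.nodes) - 1


def table_to_hypergraph(header: list[str], rows: list[list[str]]) -> Hypergraph:
    """Hypothetical mapping: one node per cell, one hyperedge per row and per column."""
    hg = Hypergraph()
    cell_ids = [[hg.add_node(cell) for cell in row] for row in rows]
    for r, ids in enumerate(cell_ids):
        hg.hyperedges.append((f"row {r}", ids))
    for c, name in enumerate(header):
        hg.hyperedges.append((name, [ids[c] for ids in cell_ids]))
    return hg


def triples_to_hypergraph(triples: list[tuple[str, str, str]]) -> Hypergraph:
    """Hypothetical mapping: one node per entity, one relation-labeled hyperedge per triple."""
    hg = Hypergraph()
    index: dict[str, int] = {}
    for head, rel, tail in triples:
        ids = []
        for ent in (head, tail):
            if ent not in index:
                index[ent] = hg.add_node(ent)  # reuse the node if the entity was seen before
            ids.append(index[ent])
        hg.hyperedges.append((rel, ids))
    return hg


def incidence_edges(hg: Hypergraph) -> list[tuple[int, int]]:
    """Bipartite view: hyperedge k becomes node len(nodes) + k, linked to members both ways."""
    offset = len(hg.nodes)
    edges = []
    for k, (_, members) in enumerate(hg.hyperedges):
        for n in members:
            edges.append((n, offset + k))
            edges.append((offset + k, n))
    return edges


# Usage: a 2x2 table yields 4 cell nodes, 2 row hyperedges and 2 column hyperedges.
table = table_to_hypergraph(["city", "country"], [["Paris", "France"], ["Rome", "Italy"]])
print(len(table.nodes), len(table.hyperedges))  # 4 4
```

Because tables and triples both end up as the same `Hypergraph` type, a single encoder can consume either one. That shared input type is the property the unified format is meant to provide.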

Paper Structure

This paper contains 23 sections, 7 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Overview of LLaSA, which can handle various types of structured data by transforming them into a unified format and encoding them with a universal encoder. The serialized structured data and the graph representations are then used as input to the LLM.
  • Figure 2: Comparison between LLM-based and the proposed G-Former-based GNN pretraining strategies.
  • Figure 3: Examples of converting structured data to a unified hypergraph format, where yellow nodes represent hyperedges. In panel (a), arrows are omitted because the edges in the hypergraph are bidirectional.
  • Figure 4: (a) We employ two pretraining objectives to train the hypergraph encoder and G-Former; the upper-left corner shows the attention masks used for the different pretraining tasks. (b) In the LLM fine-tuning stage, we only use the graph transformer to extract a fixed number of hypergraph representations and treat them as soft prompts in the LLM's input.
  • Figure 5: The average performance of models using different pretrained hypergraph encoders (GNNs).
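
Figure 4(b) relies on the G-Former always producing a fixed number of hypergraph representations. The following minimal single-head cross-attention sketch, in plain Python, shows why that holds. It assumes M learnable query vectors attend over the N node embeddings produced by the hypergraph encoder, so the output has M rows whatever N is. The sketch omits several components a real model would need: learned key/value/output projections, multiple heads, stacked layers, and the projection into the LLM's embedding space. All function names are hypothetical. Placing the soft prompts after the serialized-input embeddings follows the abstract's wording ("appended") and is not a confirmed implementation detail.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: queries (M x d) attend over keys/values (N x d)."""
    d = len(queries[0])
    scores = matmul(queries, transpose(keys))  # M x N
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, values)  # M x d: each row is a convex mix of value rows


def compress(node_embeddings, learned_queries):
    """Hypothetical G-Former step: M queries summarize any number N of node embeddings.
    Keys and values are the raw node embeddings here (no learned projections)."""
    return cross_attention(learned_queries, node_embeddings, node_embeddings)


def append_soft_prompts(token_embeddings, soft_prompts):
    """Concatenate along the sequence axis, soft prompts after the serialized input."""
    return token_embeddings + soft_prompts


# Usage: graphs with 3 and 50 nodes both compress to M = 2 vectors.
random.seed(0)
d, M = 4, 2
queries = [[random.gauss(0, 1) for _ in range(d)] for _ in range(M)]
small = [[random.gauss(0, 1) for _ in range(d)] for _ in range(3)]
large = [[random.gauss(0, 1) for _ in range(d)] for _ in range(50)]
print(len(compress(small, queries)), len(compress(large, queries)))  # 2 2
```

A fixed M keeps the LLM's added input length constant regardless of how large the table or graph is. The serialized text still carries the full content.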