Joint Embeddings for Graph Instruction Tuning

Aaron Haag; Vlad Argatu; Oliver Lohse

TL;DR

This work explores integrating the graph modality into LLMs for general graph instruction-following tasks. It aims to produce a deep learning model that enhances an underlying LLM with graph embeddings and trains it to understand them and to produce an answer grounded in the graph representation.

Abstract

Large Language Models (LLMs) have achieved impressive performance in text understanding and have become an essential tool for building smart assistants. Originally focused on text, they have been enhanced with multimodal capabilities in recent works that successfully built visual instruction-following assistants. For the graph modality, however, no such assistants have yet been developed. Graph structures are complex in that they represent relations between different features and are permutation invariant. Moreover, representing them in purely textual form does not always lead to good LLM performance, even for fine-tuned models. As a result, a new method is needed to integrate graphs into LLMs for general graph understanding. This work explores the integration of the graph modality into LLMs for general graph instruction-following tasks. It aims to produce a deep learning model that enhances an underlying LLM with graph embeddings and trains it to understand them and to produce, given an instruction, an answer grounded in the graph representation. The approach performs significantly better than a graph-to-text approach and remains consistent even for larger graphs.
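As a rough illustration of what "enhancing an LLM with graph embeddings" could look like at the input level, the sketch below maps graph embeddings into the LLM's embedding size and places them in front of the instruction's token embeddings. The linear projection, the prepending order, the toy dimensions, and the function names `project` and `build_joint_sequence` are all assumptions made for illustration; the abstract does not specify these details, and this is not the authors' implementation.

```python
import random


def project(graph_embeddings, weight):
    """Map each graph embedding (length d_graph) to the LLM embedding size (d_model).

    `weight` is a d_graph x d_model matrix given as a list of rows.
    A plain linear map is an assumption, not a detail taken from the paper.
    """
    return [
        [sum(g * w for g, w in zip(vec, column)) for column in zip(*weight)]
        for vec in graph_embeddings
    ]


def build_joint_sequence(graph_embeddings, text_embeddings, weight):
    """Prepend the projected graph embeddings to the instruction's token embeddings."""
    return project(graph_embeddings, weight) + text_embeddings


# Toy sizes, chosen only for illustration.
d_graph, d_model = 3, 4
rng = random.Random(0)
weight = [[rng.uniform(-1, 1) for _ in range(d_model)] for _ in range(d_graph)]
graph_embeddings = [[rng.uniform(-1, 1) for _ in range(d_graph)] for _ in range(2)]
text_embeddings = [[rng.uniform(-1, 1) for _ in range(d_model)] for _ in range(5)]

joint = build_joint_sequence(graph_embeddings, text_embeddings, weight)
print(len(joint), len(joint[0]))  # sequence length and embedding size
```

In this sketch the LLM would consume `joint` as one sequence, so the graph positions and the text positions share a single embedding space.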

Paper Structure (18 sections, 6 equations, 9 figures)

Introduction
Related Work
Integration of Vision in Language Models
Integration of Graphs in Language Models
Method
GraphLlava architecture
Extracting Graph Embeddings
Training procedure
Numerical Results
Used Dataset and implementation
Experimental Setup and Implementation
Quantitative Results
Qualitative Results
Limitations
Architectural limitations
...and 3 more sections

Figures (9)

Figure 1: GraphLlava architecture. The graph encoder is adapted from graph-llm. The Graph Transformer used is the Graph Inductive bias Transformer (GRIT), introduced in Ma2023GraphIB.
Figure 2: List of instructions for brief graph description. It is the same list as the one proposed for image instruction tuning in llava, adapted to ask for a graph description.
Figure 3: Training input format. Loss is computed only on the colored tokens.
Figure 4: Prompt format for GPT-4-as-judge prompting. $X_q$, $X_a$, $Y_{gllava}$ and $Y_{tllama}$ denote, respectively, the input query containing the textual description of the graph, the ground-truth answer, the answer given by the GraphLlava model, and the answer given by the TinyLlama model.
Figure 5: Distribution of GPT-4 choices as a function of graph size. The questions of the test dataset are grouped into batches according to the size of the graph's textual description, and the GPT-4-as-judge metric is reported for each batch.
...and 4 more figures
