A Foundation Model for Zero-shot Logical Query Reasoning

Mikhail Galkin, Jincheng Zhou, Bruno Ribeiro, Jian Tang, Zhaocheng Zhu

TL;DR

This paper presents UltraQuery, the first foundation model for inductive reasoning that can zero-shot answer logical queries on any KG and, after finetuning on a single dataset, can solve CLQA on any KG.

Abstract

Complex logical query answering (CLQA) in knowledge graphs (KGs) goes beyond simple KG completion and aims at answering compositional queries composed of multiple projections and logical operations. Existing CLQA methods that learn parameters bound to certain entity or relation vocabularies can only be applied to the graph they are trained on, which requires substantial training time before being deployed on a new graph. Here we present UltraQuery, the first foundation model for inductive reasoning that can zero-shot answer logical queries on any KG. The core idea of UltraQuery is to derive both projections and logical operations as vocabulary-independent functions which generalize to new entities and relations in any KG. With the projection operation initialized from a pre-trained inductive KG reasoning model, UltraQuery can solve CLQA on any KG after finetuning on a single dataset. In experiments on 23 datasets, UltraQuery in the zero-shot inference mode shows competitive or better query answering performance than the best available baselines and sets a new state of the art on 15 of them.

Paper Structure (17 sections, 1 equation, 7 figures, 7 tables)


Figures (7)

  • Figure 1: Zero-shot query answering performance (MRR, higher is better) of a single UltraQuery model trained on a single FB15k237 query dataset, compared to the best available baselines and the ablated UltraQuery LP on 23 datasets. EPFO is the average of 9 query types with $(\wedge, \lor)$ operators; Negation is the average of 5 query types with the negation operator $(\neg)$. On average, a single UltraQuery model outperforms the best baselines trained specifically on each dataset. More results are presented in Table \ref{tab:maintab1} and Appendix \ref{app:more_results}.
  • Figure 2: The inductive logical query answering setup where training and inference graphs (and queries) have different entity and relation vocabularies. We propose a single model (UltraQuery) that zero-shot generalizes to query answering on any graph with new entity or relation vocabulary at inference time.
  • Figure 3: (a) Example of ip query answering with UltraQuery: the inductive parametric projection operator (\ref{subsec:ultra_proj}) executes relation projections on any graph and returns a scalar score for each entity; the scores are aggregated by non-parametric logical operators (\ref{subsec:logic_ops}) implemented with fuzzy logic. Intermediate scores are used for weighted initialization of relation projections on the next hop. (b) The multi-source propagation issue with a pre-trained link predictor for relation projection: pre-training on 1p link prediction is done in the single-source labeling mode (top), where only one query node is labeled with a non-zero vector; complex queries at later intermediate hops might have several plausible sources with non-zero initial weights (bottom), where a pre-trained operator fails.
  • Figure 4: Mitigation of the multi-source message passing issue (Section \ref{sec:method}) with UltraQuery: while UltraQuery LP (pre-trained only on 1p link prediction) does reach higher 1p query performance (center right), it underperforms on negation queries (center left). UltraQuery adapts to the multi-source message passing scheme and trades a fraction of 1p query performance for better averaged EPFO performance, e.g., on the 3i query (right), and better negation query performance. More results are in Appendix \ref{app:more_results}.
  • Figure 5: Qualitative analysis on 9 inductive $(e)$ and 3 transductive datasets averaged across all 14 query types. Faithfulness, MRR (left): UltraQuery successfully finds easy answers in larger inference graphs and outperforms trained GNN-QE baselines. Ranking of easy vs hard answers, ROC AUC (center): zero-shot inference methods slightly lag behind trainable GNN-QE due to assigning higher scores to hard answers. Cardinality Prediction, MAPE (right): UltraQuery is comparable to a much larger trainable baseline QTO. In all cases, UltraQuery LP is significantly inferior to the main model.
  • ...and 2 more figures
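
The ip query flow described in the Figure 3 caption (two projections, an intersection, then a projection weighted by the intermediate scores) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the product fuzzy logic is one common choice of non-parametric operators and is not confirmed by this page, `toy_projection` is a hypothetical stand-in that only follows existing edges and is not the learned inductive projection operator, and the toy graph is invented.

```python
from typing import List, Tuple

Scores = List[float]          # one fuzzy score in [0, 1] per entity
Edge = Tuple[int, str, int]   # (head entity, relation, tail entity)


def fuzzy_and(x: Scores, y: Scores) -> Scores:
    """Intersection as a product t-norm (assumed choice of fuzzy logic)."""
    return [a * b for a, b in zip(x, y)]


def fuzzy_or(x: Scores, y: Scores) -> Scores:
    """Union as the matching t-conorm: a + b - a*b."""
    return [a + b - a * b for a, b in zip(x, y)]


def fuzzy_not(x: Scores) -> Scores:
    """Negation as 1 - a."""
    return [1.0 - a for a in x]


def toy_projection(weights: Scores, edges: List[Edge], relation: str) -> Scores:
    """Hypothetical stand-in for the learned projection operator.

    Each tail entity receives the largest weight among its heads that are
    connected by `relation`. Several heads may carry non-zero weight, which
    mirrors the multi-source initialization at intermediate hops.
    """
    out = [0.0] * len(weights)
    for head, rel, tail in edges:
        if rel == relation:
            out[tail] = max(out[tail], weights[head])
    return out


# Invented toy graph with 5 entities (ids 0..4).
edges = [(0, "r1", 2), (0, "r1", 3), (1, "r2", 3), (3, "r3", 4), (2, "r3", 0)]

# ip query: (project r1 from entity 0) AND (project r2 from entity 1), then project r3.
branch_a = toy_projection([1.0, 0.0, 0.0, 0.0, 0.0], edges, "r1")
branch_b = toy_projection([0.0, 1.0, 0.0, 0.0, 0.0], edges, "r2")
intermediate = fuzzy_and(branch_a, branch_b)
answers = toy_projection(intermediate, edges, "r3")
```

Because the logical operators act only on per-entity score vectors, they carry no parameters tied to a particular entity or relation vocabulary; in this sketch only the projection step would need to be replaced by a learned, vocabulary-independent model.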