URL: Universal Referential Knowledge Linking via Task-instructed Representation Compression

Zhuoqun Li; Hongyu Lin; Tianshu Wang; Boxi Cao; Yaojie Lu; Weixiang Zhou; Hao Wang; Zhenyu Zeng; Le Sun; Xianpei Han

TL;DR

The paper tackles universal referential knowledge linking (RKL) by proposing URL, a framework that uses LLM-driven task-instructed representation compression to produce task-aware embeddings for claims and references. It combines generative reconstruction and contrastive learning in a multi-view objective, with the total loss defined as the weighted sum $\text{L}_{\text{total}} = \alpha \text{L}_1 + (1-\alpha) \text{L}_2$, so that the embeddings both retain knowledge and align discriminatively. Training data are generated by transforming QA corpora into diverse claim-reference pairs across multiple domains, which gives a unified learning signal for versatile RKL tasks. On URLBench, a four-domain benchmark spanning finance, law, medicine, and education, URL outperforms strong embedding baselines and API-based methods, supporting the universality and effectiveness of the approach; the paper also discusses practical considerations and ethical safeguards.
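The weighted objective above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the in-batch InfoNCE form of the contrastive term, the temperature value, and the default `alpha` are all assumptions made for illustration, and the summary does not say which of the two terms is the contrastive loss and which is the reconstruction loss.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def in_batch_contrastive_loss(claim_embs, ref_embs, temperature=0.05):
    """InfoNCE over a batch (assumed form): reference i is the positive
    for claim i, and every other reference in the batch is a negative."""
    losses = []
    for i, claim in enumerate(claim_embs):
        logits = [cosine(claim, ref) / temperature for ref in ref_embs]
        # Numerically stable log-sum-exp over the batch.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(x - top) for x in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)


def multi_view_loss(loss_1, loss_2, alpha=0.5):
    """Weighted sum alpha * L1 + (1 - alpha) * L2 (alpha is hypothetical)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * loss_1 + (1.0 - alpha) * loss_2


# Toy usage: two claims whose embeddings match their own references.
claims = [[1.0, 0.0], [0.0, 1.0]]
refs = [[1.0, 0.0], [0.0, 1.0]]
contrastive = in_batch_contrastive_loss(claims, refs)
reconstruction = 1.2  # placeholder value standing in for a decoder loss
total = multi_view_loss(contrastive, reconstruction, alpha=0.5)
```

In a real training loop the reconstruction term would come from the language-model decoder; it is a fixed placeholder number here so that the sketch stays self-contained.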

Abstract

Linking a claim to grounded references is a critical ability to fulfill human demands for authentic and reliable information. Current studies are limited to specific tasks like information retrieval or semantic matching, where the claim-reference relationships are unique and fixed, while referential knowledge linking (RKL) in the real world can be much more diverse and complex. In this paper, we propose universal referential knowledge linking (URL), which aims to resolve diversified referential knowledge linking tasks with one unified model. To this end, we propose an LLM-driven task-instructed representation compression, as well as a multi-view learning approach, in order to effectively adapt the instruction-following and semantic understanding abilities of LLMs to referential knowledge linking. Furthermore, we construct a new benchmark to evaluate the ability of models on referential knowledge linking tasks across different scenarios. Experiments demonstrate that universal RKL is challenging for existing approaches, while the proposed framework can effectively resolve the task across various scenarios and therefore outperforms previous approaches by a large margin.

Paper Structure (26 sections, 4 equations, 2 figures, 6 tables)

Introduction
Related Work
Semantic Matching
Information Retrieval
Sentence Embedding
Universal RKL via Task-instructed Representation Compression
Task-instructed Compression for RKL
Multi-view URL Learning
Generative Reconstruction.
Contrastive Learning.
Multi-view Learning.
Constructing URL Training Data via QA Corpus Transformation
URLBench: Benchmarking Universal Referential Knowledge Linking
Experiments
Experimental Settings
...and 11 more sections

Figures (2)

Figure 1: Compared to conventional approaches that focus on a single task, URL aims to universally address RKL tasks on versatile semantics with deep knowledge.
Figure 2: Illustration of training corpus construction and multi-view URL learning. Based on QA data, we set the question as the claim and the answer as the reference, then annotate instructions that describe the field of the data and the purpose of the representation. For learning, contrastive learning is applied to the embeddings of claims and references, and generative reconstruction forces the model to generate the positive reference from the claim embedding and vice versa.
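
The corpus construction described in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `RKLExample` class, the `qa_to_rkl_examples` function, the instruction template wording, and the sample QA pair are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RKLExample:
    """One training example: an instruction plus a claim-reference pair."""
    instruction: str
    claim: str
    reference: str


# Hypothetical template: the caption only says that instructions describe
# the field of the data and the purpose of the representation.
INSTRUCTION_TEMPLATE = (
    "This is {field} data. Represent the text so that a claim "
    "can be linked to the reference that {purpose}."
)


def qa_to_rkl_examples(qa_pairs, field, purpose):
    """Turn (question, answer) pairs into claim-reference examples.

    The question becomes the claim and the answer becomes the reference.
    Pairs with an empty side are skipped.
    """
    instruction = INSTRUCTION_TEMPLATE.format(field=field, purpose=purpose)
    examples = []
    for question, answer in qa_pairs:
        question, answer = question.strip(), answer.strip()
        if not question or not answer:
            continue
        examples.append(RKLExample(instruction, question, answer))
    return examples


# Illustrative input only; not taken from the paper's data.
sample_pairs = [("What is a bond?", "A bond is a debt security.")]
examples = qa_to_rkl_examples(sample_pairs, field="finance", purpose="answers it")
```

Keeping the instruction as a separate field makes it straightforward to pair the same claim with different instructions, one possible way to expose a single model to several linking tasks.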
