LEKA: LLM-Enhanced Knowledge Augmentation

Xinhao Zhang, Jinghan Zhang, Fengran Mo, Dongjie Wang, Yanjie Fu, Kunpeng Liu

TL;DR

LEKA tackles the challenge of automating knowledge sourcing for transfer learning in data-sparse domains by integrating LLM-driven target information extraction, dataset-RAG retrieval, and robust data harmonization. It leverages a kernel-based feature-space alignment $d(f(\mathcal{X}_k), \mathcal{X}_T)$ and a Wasserstein-based distribution alignment $W(\mathcal{P}_k, \mathcal{P}_T)$ to reconstruct a source domain $\mathcal{D}_S^*$ that optimally supports target performance. The transfer objective combines source- and target-domain losses via $\theta^{*}=\arg\min_{\theta}\left(\alpha\, \mathbb{E}_{(x,y)\in\mathcal{D}_k'}[\mathcal{L}(f_{T_j}(x;\theta), y)] + (1-\alpha)\, \mathbb{E}_{(x,y)\in\mathcal{D}_T}[\mathcal{L}(f_{T_j}(x;\theta), y)]\right)$, with backpropagation guiding parameter updates. Experiments across four medical/economic datasets show LEKA achieving higher accuracy, precision, recall, and F1 than baselines and traditional transfer methods, demonstrating improved transfer efficiency and reduced domain shift. Overall, LEKA provides a scalable, automated pathway for knowledge augmentation that is particularly impactful in data-limited settings.
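To make the two quantities in the summary concrete, the following is a minimal sketch, not the authors' implementation. It assumes a logistic model as a stand-in for $f_{T_j}$ and binary cross-entropy for $\mathcal{L}$ (neither is specified here), and it computes an empirical one-dimensional Wasserstein distance for equal-sized samples as a simplified illustration of $W(\mathcal{P}_k, \mathcal{P}_T)$. All function names (`transfer_objective`, `wasserstein_1d`, and the helpers) are hypothetical.

```python
import math


def predict(theta, x):
    """Logistic model: an assumed stand-in for f_{T_j}(x; theta)."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(p, y):
    """Binary cross-entropy for one prediction p and label y in {0, 1}."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def mean_loss(theta, data):
    """Empirical expectation of the loss over a list of (x, y) pairs."""
    return sum(log_loss(predict(theta, x), y) for x, y in data) / len(data)


def transfer_objective(theta, source, target, alpha):
    """alpha * E_source[L] + (1 - alpha) * E_target[L], as in the summary."""
    return alpha * mean_loss(theta, source) + (1.0 - alpha) * mean_loss(theta, target)


def wasserstein_1d(a, b):
    """Empirical 1-Wasserstein distance between equal-sized 1-D samples.

    For equal sample sizes this is the mean absolute difference
    between the sorted samples.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(u - v) for u, v in zip(sorted(a), sorted(b))) / len(a)


# Toy data: (feature vector, label) pairs for a harmonized source and a target.
source = [([1.0, 0.5], 1), ([-1.0, 0.2], 0)]
target = [([0.8, 0.4], 1)]

objective = transfer_objective([0.0, 0.0], source, target, alpha=0.5)
shift = wasserstein_1d([0.0, 1.0, 2.0], [3.0, 1.0, 2.0])
```

With `theta` at zero, every prediction is 0.5, so the objective equals $\ln 2$ for any `alpha`; minimizing it over `theta` (for example by gradient descent) would correspond to the $\arg\min$ in the objective. A smaller `wasserstein_1d` value indicates that a source feature's marginal distribution sits closer to the target's.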

Abstract

Humans excel at analogical learning and knowledge transfer and, more importantly, have a distinctive ability to identify appropriate sources of knowledge. From a model's perspective, this presents an interesting challenge. If models could autonomously retrieve knowledge useful for transfer or decision-making, they would shift from passively acquiring knowledge to actively accessing and learning from it. Filling models with knowledge is relatively straightforward: it simply requires more training and accessible knowledge bases. The harder task is teaching models which knowledge can be analogized and transferred. We therefore design LEKA, a knowledge augmentation method for knowledge transfer that actively searches for suitable knowledge sources to enrich the target domain's knowledge. LEKA extracts key information from the target domain's textual information, retrieves pertinent data from external data libraries, and harmonizes the retrieved data with the target domain data in feature space and marginal probability measures. We validate the effectiveness of our approach through extensive experiments across various domains and demonstrate significant improvements over traditional methods in reducing computational costs, automating data alignment, and optimizing transfer learning outcomes.

Paper Structure

This paper contains 30 sections, 11 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Example of LEKA. We adopt an LLM to retrieve proper source domain data to transfer knowledge to a data-limited target domain. The LLM extracts the key information of the target data to retrieve a relevant dataset; then, we adopt the LLM for harmonization.
  • Figure 2: The LEKA framework: 1) an LLM extracts and embeds the textual information of the target dataset, then 2) retrieves datasets from libraries, and 3) performs data harmonization. With harmonized datasets, we can transfer knowledge from the source dataset we construct to enhance learning on the target dataset.
  • Figure 3: Comparison of accuracy and F1 scores on various transfer learning methods.