LEKA: LLM-Enhanced Knowledge Augmentation
Xinhao Zhang, Jinghan Zhang, Fengran Mo, Dongjie Wang, Yanjie Fu, Kunpeng Liu
TL;DR
LEKA tackles the challenge of automating knowledge sourcing for transfer learning in data-sparse domains by integrating LLM-driven target information extraction, dataset-RAG retrieval, and robust data harmonization. It leverages a kernel-based feature-space alignment $d(f(\mathcal{X}_k), \mathcal{X}_T)$ and a Wasserstein-based distribution alignment $W(\mathcal{P}_k,\mathcal{P}_T)$ to reconstruct a source domain $\mathcal{D}_S^*$ that optimally supports target performance. The transfer objective combines source- and target-domain losses via $\theta^{*}=\arg\min_{\theta}\left(\alpha\, \mathbb{E}_{(x,y)\in\mathcal{D}_k'}[\mathcal{L}(f_{T_j}(x;\theta), y)] + (1-\alpha)\, \mathbb{E}_{(x,y)\in\mathcal{D}_T}[\mathcal{L}(f_{T_j}(x;\theta), y)]\right)$, with backpropagation guiding parameter updates. Experiments across four medical/economic datasets show LEKA achieving higher accuracy, precision, recall, and F1 than baselines and traditional transfer methods, demonstrating improved transfer efficiency and reduced domain shift. Overall, LEKA provides a scalable, automated pathway for knowledge augmentation that is particularly impactful in data-limited settings.
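To make the weighted transfer objective concrete, here is a minimal NumPy sketch. It is not the authors' implementation: it assumes a logistic model for $f_{T_j}$, binary cross-entropy for $\mathcal{L}$, and plain gradient descent in place of backpropagation through a deeper network. The function names (`combined_loss`, `combined_grad`, `fit`) and the default values for `alpha`, `lr`, and `steps` are hypothetical choices made for illustration.

```python
import numpy as np

def predict(X, theta):
    """Logistic model standing in for f_{T_j}(x; theta) (an assumption)."""
    return 1.0 / (1.0 + np.exp(-(X @ theta)))

def bce(p, y, eps=1e-12):
    """Mean binary cross-entropy, used here as the loss L (an assumption)."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def combined_loss(theta, Xs, ys, Xt, yt, alpha):
    """alpha * (loss on harmonized source D_k') + (1 - alpha) * (loss on target D_T)."""
    source_term = bce(predict(Xs, theta), ys)
    target_term = bce(predict(Xt, theta), yt)
    return alpha * source_term + (1.0 - alpha) * target_term

def combined_grad(theta, Xs, ys, Xt, yt, alpha):
    """Gradient of combined_loss with respect to theta."""
    grad_source = Xs.T @ (predict(Xs, theta) - ys) / len(ys)
    grad_target = Xt.T @ (predict(Xt, theta) - yt) / len(yt)
    return alpha * grad_source + (1.0 - alpha) * grad_target

def fit(Xs, ys, Xt, yt, alpha=0.5, lr=0.1, steps=500):
    """Approximate theta* by gradient descent on the weighted objective."""
    theta = np.zeros(Xs.shape[1])
    for _ in range(steps):
        theta -= lr * combined_grad(theta, Xs, ys, Xt, yt, alpha)
    return theta
```

Setting `alpha` to 1 reduces the objective to the source-domain loss alone, and setting it to 0 reduces it to the target-domain loss alone, so `alpha` controls how much the harmonized source data influences the fitted parameters.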
Abstract
Humans excel at analogical learning and knowledge transfer and, more importantly, possess a unique ability to identify appropriate sources of knowledge. From a model's perspective, this presents an interesting challenge. If models could autonomously retrieve knowledge useful for transfer or decision-making, they would transition from passively acquiring knowledge to actively accessing and learning from it. Filling models with knowledge is relatively straightforward: it simply requires more training and accessible knowledge bases. The more complex task is teaching models which knowledge can be analogized and transferred. We therefore design LEKA, a knowledge augmentation method for knowledge transfer that actively searches for suitable knowledge sources to enrich the target domain's knowledge. LEKA extracts key information from the target domain's textual information, retrieves pertinent data from external data libraries, and harmonizes the retrieved data with the target domain data in feature space and marginal probability measures. We validate the effectiveness of our approach through extensive experiments across various domains and demonstrate significant improvements over traditional methods in reducing computational costs, automating data alignment, and optimizing transfer learning outcomes.
