Can We Edit LLMs for Long-Tail Biomedical Knowledge?
Xinhao Yi, Jake Lever, Kevin Bryson, Zaiqiao Meng
TL;DR
This paper examines whether knowledge editing can update LLMs with long-tail biomedical knowledge, a setting made difficult by the skewed distribution of biomedical data. It introduces CliKT, a long-tail biomedical benchmark built from SNOMED CT and PubMed, and uses knowledge probing to evaluate factual recall before and after editing. The study shows that editing methods (ROME, MEMIT, MEND, IKE, FT) improve performance on long-tail knowledge, but that this performance remains below that on high-frequency knowledge. The gap is attributed to the high prevalence of one-to-many relations in long-tail knowledge; edited models memorise facts but struggle to generalise. The findings highlight the need for editing strategies tailored to one-to-many knowledge in the biomedical domain, and point to limitations related to data granularity and cross-domain generalisability.
Abstract
Knowledge editing has emerged as an effective approach for updating large language models (LLMs) by modifying their internal knowledge. However, its application to the biomedical domain faces unique challenges due to the long-tailed distribution of biomedical knowledge, where much of the information occurs only rarely. In this paper, we conduct the first comprehensive study of how effective knowledge editing methods are on long-tail biomedical knowledge. Our results indicate that, while existing editing methods can enhance LLMs' performance on long-tail biomedical knowledge, this performance remains inferior to that on high-frequency popular knowledge, even after editing. Our further analysis reveals that long-tail biomedical knowledge contains a significant amount of one-to-many knowledge, where one subject and relation link to multiple objects. This high prevalence of one-to-many knowledge limits the effectiveness of knowledge editing in improving LLMs' understanding of long-tail biomedical knowledge, highlighting the need for tailored strategies to bridge this performance gap.
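To make the notion of one-to-many knowledge concrete, the following is a minimal sketch, not the paper's procedure. It assumes knowledge is represented as (subject, relation, object) triples and counts how many distinct (subject, relation) pairs link to more than one object. The function names and the toy triples are hypothetical and are not drawn from CliKT, SNOMED CT, or PubMed.

```python
from collections import defaultdict


def group_objects(triples):
    """Map each (subject, relation) pair to the set of objects it links to."""
    objects = defaultdict(set)
    for subject, relation, obj in triples:
        objects[(subject, relation)].add(obj)
    return dict(objects)


def one_to_many_fraction(triples):
    """Fraction of distinct (subject, relation) pairs with more than one object."""
    groups = group_objects(triples)
    if not groups:
        return 0.0
    many = sum(1 for objs in groups.values() if len(objs) > 1)
    return many / len(groups)


# Hand-written toy triples for illustration only (hypothetical entities).
toy_triples = [
    ("disease_A", "finding_site", "lung"),
    ("disease_A", "finding_site", "pleura"),
    ("disease_B", "causative_agent", "virus_X"),
]

print(one_to_many_fraction(toy_triples))
```

In this toy set, the pair ("disease_A", "finding_site") is one-to-many because it links to two objects, while ("disease_B", "causative_agent") is one-to-one. An edit that teaches a model a single object for a one-to-many pair covers only part of the correct answer set, which is one way to read the gap described above.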
