Can We Edit LLMs for Long-Tail Biomedical Knowledge?

Xinhao Yi, Jake Lever, Kevin Bryson, Zaiqiao Meng

TL;DR

This paper studies whether knowledge editing can update LLMs with long-tail biomedical knowledge, a task made difficult by the long-tailed distribution of biomedical knowledge. It introduces CliKT, a long-tail biomedical benchmark built from SNOMED CT and PubMed, and uses knowledge probing to measure factual recall before and after editing. Editing methods (ROME, MEMIT, MEND, IKE, FT) improve performance on long-tail knowledge, but it remains below performance on high-frequency knowledge, a gap the analysis attributes to the high prevalence of one-to-many relations; edited models memorise the edited facts but struggle to generalise. The findings call for editing strategies tailored to one-to-many knowledge in the biomedical domain and point to limitations related to data granularity and cross-domain generalisability.

Abstract

Knowledge editing has emerged as an effective approach for updating large language models (LLMs) by modifying their internal knowledge. However, its application to the biomedical domain faces unique challenges due to the long-tailed distribution of biomedical knowledge, where rare and infrequent information is prevalent. In this paper, we conduct the first comprehensive study of the effectiveness of knowledge editing methods on long-tail biomedical knowledge. Our results indicate that, while existing editing methods can enhance LLMs' performance on long-tail biomedical knowledge, their performance on long-tail knowledge remains inferior to that on high-frequency popular knowledge, even after editing. Our further analysis reveals that long-tail biomedical knowledge contains a significant amount of one-to-many knowledge, where one subject and relation link to multiple objects. This high prevalence of one-to-many knowledge limits the effectiveness of knowledge editing in improving LLMs' understanding of long-tail biomedical knowledge, highlighting the need for tailored strategies to bridge this performance gap.
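The abstract defines one-to-many knowledge as a subject and relation that link to multiple objects. As a minimal sketch, assuming knowledge is stored as (subject, relation, object) triples, each (subject, relation) query can be labelled by counting its distinct objects. The function name `group_queries`, the relation name `may_treat`, and the toy triples are hypothetical illustrations, not CliKT data or the authors' code.

```python
from collections import defaultdict


def group_queries(triples):
    """Split (subject, relation) queries into one-to-one and one-to-many.

    Hypothetical helper: a query is one-to-many when it links to more than
    one distinct object, matching the definition given in the abstract.
    """
    answers = defaultdict(set)
    for subj, rel, obj in triples:
        answers[(subj, rel)].add(obj)  # collect every correct object per query

    one_to_one = {q: objs for q, objs in answers.items() if len(objs) == 1}
    one_to_many = {q: objs for q, objs in answers.items() if len(objs) > 1}
    return one_to_one, one_to_many


# Toy triples with a made-up relation name (not taken from SNOMED CT or CliKT).
triples = [
    ("aspirin", "may_treat", "headache"),
    ("aspirin", "may_treat", "fever"),
    ("insulin", "may_treat", "diabetes mellitus"),
]
one_to_one, one_to_many = group_queries(triples)
print(sorted(one_to_one))   # [('insulin', 'may_treat')]
print(sorted(one_to_many))  # [('aspirin', 'may_treat')]
```

Grouping at the query level rather than the triple level reflects that a one-to-many query has several correct answers, which is why a single edited fact may not cover what the probe expects.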

Paper Structure

This paper contains 26 sections, 6 equations, 12 figures, 3 tables.

Figures (12)

  • Figure 1: LLMs often struggle with long-tail biomedical knowledge, where entities co-occur in only a few documents. Knowledge editing offers a potential solution by injecting this rare information into LLMs, improving their ability to handle such long-tail knowledge.
  • Figure 2: An overview of probing and editing for biomedical knowledge. Knowledge triples are grouped by co-occurrence number and further divided into one-to-one and one-to-many categories according to the number of correct answers (see the in-depth analysis section). Performance increases with co-occurrence number, indicating that LLMs struggle to capture long-tail biomedical knowledge both before and after editing.
  • Figure 3: The overall performance of pre-edit probing on Llama2, GPT-J, BioMedLM and BioGPT-Large. The shaded areas indicate the standard deviation, and Count denotes the number of triples within each group.
  • Figure 4: The performance of knowledge probing after editing with different editing methods on BioMedLM, where "Base" denotes the LLM without editing.
  • Figure 5: The comparison of knowledge probing performance between one-to-one and one-to-many settings across different co-occurrence numbers, with the pie chart on the far right illustrating the data distribution.
  • ...and 7 more figures
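The captions describe grouping triples by co-occurrence number (how many documents mention both entities) and reporting per-group performance with a standard deviation and a Count of triples. The sketch below illustrates that bookkeeping under loudly stated assumptions: naive lowercase substring matching stands in for entity linking, the bin edges and probing scores are invented, and population standard deviation is chosen arbitrarily. None of this is the CliKT construction procedure, which draws on SNOMED CT and PubMed.

```python
import statistics
from collections import defaultdict


def cooccurrence_number(subject, obj, documents):
    """Count documents mentioning both entities.

    Hypothetical: naive lowercase substring matching, not real entity linking.
    """
    s, o = subject.lower(), obj.lower()
    return sum(1 for doc in documents if s in doc.lower() and o in doc.lower())


def bin_label(count, edges):
    """Map a count to the first bin whose (hypothetical) upper edge it does not exceed."""
    for upper in edges:
        if count <= upper:
            return f"<= {upper}"
    return f"> {edges[-1]}"


def group_stats(records, edges):
    """records: (cooccurrence_count, probing_score) pairs.

    Returns {bin_label: (mean, population_std, count)} per co-occurrence group.
    """
    groups = defaultdict(list)
    for count, score in records:
        groups[bin_label(count, edges)].append(score)
    return {
        label: (statistics.mean(s), statistics.pstdev(s), len(s))
        for label, s in groups.items()
    }


# Toy documents and scores, invented for illustration only.
docs = [
    "Aspirin is commonly taken for headache.",
    "Headache and fever were treated with aspirin.",
    "Insulin therapy in diabetes mellitus.",
]
print(cooccurrence_number("aspirin", "headache", docs))            # 2
print(cooccurrence_number("insulin", "diabetes mellitus", docs))   # 1

edges = [1, 10]  # hypothetical bin edges
records = [(2, 0.5), (1, 0.0), (1, 1.0), (12, 0.8)]
stats = group_stats(records, edges)
print(stats)
```

Keeping the Count alongside mean and standard deviation matters because long-tail groups can be large while high-frequency groups are small, so a group's spread should be read together with its size.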