Enhancing Robustness in Biomedical NLI Models: A Probing Approach for Clinical Trials

Ata Mustafa

TL;DR

This paper addresses the challenge of robustness and faithful reasoning in biomedical natural language inference (NLI) for clinical trials, where shortcut learning and data shifts can yield unreliable conclusions. It proposes a mnestic probing framework grounded in natural logic to examine monotonicity and concept inclusion, applied to a SciFive-based base model fine-tuned on clinical-trials data. The approach combines data preprocessing, targeted probes, and iterative null-space projection to prune nonessential features and improve semantic integrity, demonstrated by a 2-point accuracy gain on probing tasks and enhanced robustness. The work offers a principled method to bolster trustworthy automated reasoning in clinical trial analysis, with potential to reduce misinterpretation risks in medical decision-making.

Abstract

Large Language Models have revolutionized various fields and industries, such as conversational AI, content generation, information retrieval, business intelligence, and medicine, to name a few. One major application in the medical field is analyzing clinical trials for entailment tasks. However, Large Language Models have been observed to be susceptible to shortcut learning, factual inconsistency, and performance degradation under small variations in context. Adversarial and robustness testing is performed to ensure the integrity of a model's output, but ambiguity still persists. Probing is used to verify the integrity of the reasoning performed and to investigate whether the model has a correct syntactic and semantic understanding. Here, I used mnestic probing to investigate the SciFive model trained on clinical trial data. I investigated the model for the features it learned with respect to natural logic. To this end, I trained task-specific probes, used these probes to investigate the final layers of the trained model, and then fine-tuned the trained model using iterative null projection. The results show that model accuracy improved. During experimentation, I observed that the size of the probe has an effect on the fine-tuning process.
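
The probe-then-project loop mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes plain NumPy arrays in place of SciFive hidden states, a least-squares linear probe in place of the trained task-specific probes, and binary labels in {-1, +1}. All function names (`fit_linear_probe`, `probe_accuracy`, `iterative_nullspace_projection`, `make_toy_representations`) are hypothetical and exist only for this illustration.

```python
import numpy as np

def fit_linear_probe(X, y):
    """Fit a least-squares linear probe for labels y in {-1, +1} (assumed setup)."""
    Xc = X - X.mean(axis=0)
    # rcond drops near-zero singular values so already-removed directions stay removed.
    w, *_ = np.linalg.lstsq(Xc, y - y.mean(), rcond=1e-8)
    return w

def probe_accuracy(X, y, w):
    """Accuracy of the sign of the centered probe score."""
    scores = (X - X.mean(axis=0)) @ w
    preds = np.where(scores >= 0, 1.0, -1.0)
    return float(np.mean(preds == y))

def iterative_nullspace_projection(X, y, n_iters=3):
    """Repeatedly fit a probe and project its direction out of the representations."""
    d = X.shape[1]
    P = np.eye(d)
    for _ in range(n_iters):
        # Fit on the projected data; keep the direction inside the remaining subspace.
        w = P @ fit_linear_probe(X @ P, y)
        norm = np.linalg.norm(w)
        if norm < 1e-10:
            break  # the probe finds nothing left to remove
        u = w / norm
        # Compose with the projection onto the null space of the probe direction.
        P = P @ (np.eye(d) - np.outer(u, u))
    return P

def make_toy_representations(n=2000, d=8, strength=3.0, seed=0):
    """Synthetic stand-in for hidden states: the label leaks into dimension 0."""
    rng = np.random.default_rng(seed)
    y = rng.choice([-1.0, 1.0], size=n)
    X = rng.normal(size=(n, d))
    X[:, 0] += strength * y
    return X, y
```

Calling `iterative_nullspace_projection(X, y)` on the toy data returns a matrix `P`; a probe refit on `X @ P` should then perform close to chance, while a probe on `X` recovers the label almost perfectly. In a real setting, `X` would be replaced by representations extracted from the model's final layers and `y` by the property under study (for example a monotonicity label).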

Paper Structure (15 sections, 1 equation, 5 figures, 1 table)

This paper contains 15 sections, 1 equation, 5 figures, 1 table.

Figures (5)

  • Figure 1: An example of misclassification
  • Figure 2: Development approach
  • Figure 3: Data Preprocessing
  • Figure 4: Base Model
  • Figure 5: