Lisbon Computational Linguists at SemEval-2024 Task 2: Using A Mistral 7B Model and Data Augmentation
Artur Guimarães, Bruno Martins, João Magalhães
TL;DR
The paper tackles safe biomedical natural language inference for Clinical Trial Reports (NLI4CT) by leveraging an open-source LLM (Mistral-7B-Instruct-v0.2) quantized to 4 bits and fine-tuned with LoRA on an augmented dataset. A structured prompting strategy and a data-augmentation pipeline are used to improve entailment classification across CTR statements and sections. The approach achieves a macro F1 around 0.80–0.82, with high faithfulness (~0.83) but moderate to lower consistency (~0.72), and reveals robustness gaps to adversarial perturbations, underscoring the need for careful data curation and domain-focused improvements. The work demonstrates the feasibility of deploying open LLMs for medical NLI while highlighting ongoing challenges in faithful and consistent reasoning and suggesting avenues for future enhancement with domain-tuned models and refined prompts.
Abstract
This paper describes our approach to the SemEval-2024 safe biomedical Natural Language Inference for Clinical Trials (NLI4CT) task, which concerns classifying statements about Clinical Trial Reports (CTRs). We explored the capabilities of Mistral-7B, a generalist open-source Large Language Model (LLM). We developed a prompt for the NLI4CT task, and fine-tuned a quantized version of the model using an augmented version of the training dataset. The experimental results show that this approach can produce notable results in terms of the macro F1-score, while having limitations in terms of faithfulness and consistency. All the developed code is publicly available on a GitHub repository
