CRCL at SemEval-2024 Task 2: Simple prompt optimizations

Clément Brutti-Mairesse, Loïc Verlingue

TL;DR

This paper tackles the SemEval-2024 Task 2 natural language inference (NLI) problem in a clinical-trial context. It applies hard prompt optimization and three prompting strategies (OPRO, zero-shot CoT, dynamic one-shot CoT) to decide whether a statement is entailed or contradicted by a clinical trial report (CTR) section. It evaluates multiple LLMs (including Mixtral-8x7B-Instruct) and uses a vector-embedding workflow to retrieve exemplars in the dynamic prompting setup. Faithfulness and Consistency are defined as the key robustness metrics, each with an explicit formula. The results show that zero-shot CoT prompts yield the largest F1 improvement, while dynamic one-shot CoT yields the strongest faithfulness and consistency, demonstrating the practical value of advanced prompting techniques in medical NLI without fine-tuning.
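The exemplar retrieval behind dynamic one-shot prompting can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the bag-of-words `embed` function stands in for a real sentence encoder, the helper names (`nearest_exemplar`, `build_prompt`), the prompt layout, and the two exemplars are all hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real sentence encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def nearest_exemplar(statement, exemplars):
    """Return the stored exemplar whose statement is most similar to the query."""
    query = embed(statement)
    return max(exemplars, key=lambda ex: cosine(query, embed(ex["statement"])))

def build_prompt(section, statement, exemplar):
    """Assemble a one-shot CoT prompt (hypothetical layout) around the retrieved exemplar."""
    return (
        "Example\n"
        f"Statement: {exemplar['statement']}\n"
        f"Reasoning: {exemplar['reasoning']}\n"
        f"Answer: {exemplar['label']}\n\n"
        "Now solve\n"
        f"CTR section: {section}\n"
        f"Statement: {statement}\n"
        "Reasoning:"
    )

# Illustrative exemplars only; they are not taken from the task dataset.
EXEMPLARS = [
    {
        "statement": "More patients in cohort 1 reported nausea than in cohort 2",
        "reasoning": "Cohort 1 lists 12 nausea cases and cohort 2 lists 5.",
        "label": "Entailment",
    },
    {
        "statement": "The trial excludes participants younger than 18 years",
        "reasoning": "The eligibility section allows participants aged 12 and above.",
        "label": "Contradiction",
    },
]

query = "Nausea was reported by more patients in cohort 2 than in cohort 1"
chosen = nearest_exemplar(query, EXEMPLARS)
prompt = build_prompt("Adverse events: ...", query, chosen)
```

The retrieved exemplar changes with each query, which is what makes the one-shot prompt "dynamic"; swapping `embed` for a real embedding model would leave the rest of the sketch unchanged.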

Abstract

We present a baseline for SemEval-2024 Task 2, whose objective is to determine the inference relationship between pairs of clinical trial report sections and statements. We apply prompt optimization techniques with LLM Instruct models provided as a Language Model-as-a-Service (LMaaS). We observed, in line with recent findings, that synthetic CoT prompts significantly improve on manually crafted ones.

Paper Structure (16 sections, 2 equations, 3 figures, 2 tables, 3 algorithms)

Figures (3)

  • Figure 1: SemEval 2024 dataset data model
  • Figure 2: Dynamic one-shot prompting workflow
  • Figure 3: Zero-shot CoT prompting sample pipeline