Enhancing Frame Detection with Retrieval Augmented Generation

Papa Abdou Karim Karou Diallo, Amal Zouaq

TL;DR

This work tackles frame detection from raw text without explicit target spans by introducing RCIF, a Retrieve Candidates and Identify Frames pipeline that combines a frozen-RAG retrieval stage with an LLM classifier. It builds embeddings from multiple frame representations, retrieves a broad candidate set, and then identifies the most appropriate frames among them, achieving state-of-the-art performance on FrameNet 1.5 and 1.7. The authors also show that the structured frame representations can improve generalization when translating natural language questions into SPARQL queries, as evidenced on the LCQ2F and LCQ2F+ datasets. The approach is suited to real-world NLP tasks where explicit targets are unavailable and lexical variation is high, and it points to a promising direction for combining structured semantic representations with retrieval-augmented generation.

Abstract

Recent advancements in Natural Language Processing have significantly improved the extraction of structured semantic representations from unstructured text, especially through Frame Semantic Role Labeling (FSRL). Despite this progress, the potential of Retrieval-Augmented Generation (RAG) models for frame detection remains under-explored. In this paper, we present the first RAG-based approach for frame detection, called RCIF (Retrieve Candidates and Identify Frames). RCIF is also the first approach to operate without requiring an explicit target span, and it comprises three main stages: (1) generation of frame embeddings from various representations; (2) retrieval of candidate frames given an input text; and (3) identification of the most suitable frames. We conducted extensive experiments across multiple configurations, including zero-shot, few-shot, and fine-tuning settings. Our results show that our retrieval component significantly reduces the complexity of the task by narrowing the search space, thus allowing the frame identifier to refine and complete the set of candidates. Our approach achieves state-of-the-art performance on FrameNet 1.5 and 1.7, demonstrating its robustness in scenarios where only raw text is provided. Furthermore, we leverage the structured representation obtained through this method as a proxy to enhance generalization across lexical variations in the task of translating natural language questions into SPARQL queries.
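
The first two stages described in the abstract (embedding frame representations, then retrieving candidates by similarity) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the `FRAMES` dictionary holds paraphrased descriptions rather than FrameNet's actual definitions, and the names `INDEX` and `retrieve_candidates` are hypothetical. The third stage, in which an LLM selects among the retrieved candidates, is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical miniature frame inventory. The descriptions are paraphrased
# for illustration and are not FrameNet's actual definitions.
FRAMES = {
    "Commerce_buy": "a buyer purchases goods from a seller in exchange for money",
    "Motion": "a theme moves from a source along a path to a goal",
    "Ingestion": "an ingestor eats or drinks food or liquid",
}


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Stage 1: embed each frame representation once and keep it in an index
# (a plain dict here, playing the role of a vector database).
INDEX = {label: embed(f"{label} {desc}") for label, desc in FRAMES.items()}


def retrieve_candidates(text, c=2):
    """Stage 2: return the c frame labels most similar to the input text."""
    query = embed(text)
    ranked = sorted(INDEX, key=lambda label: cosine(query, INDEX[label]), reverse=True)
    return ranked[:c]


candidates = retrieve_candidates("The buyer purchases goods with money", c=2)
print(candidates)
```

The parameter `c` plays the role of the number of retrieved candidates: a larger value gives the downstream identifier more to choose from, at the cost of a longer prompt.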

Paper Structure

This paper contains 26 sections, 9 figures, 13 tables.

Figures (9)

  • Figure 1: Overview of our proposed method called RCIF (Retrieve Candidates and Identify Frames). (1) Frame embeddings are generated using an embedding model based on various frame representations. These embeddings are stored in a vector database. (2-3) Given an input text, the system retrieves candidate frames based on similarity scores between the input text and the frame embeddings. (4-5) An LLM is then fine-tuned with dynamic prompts to select the best matching frames from the retrieved candidates, completing the identification process.
  • Figure 2: Llama3.1 BLEU score performance on LCQ2F
  • Figure 3: Different representations of frames used in the retrieval component. Representation1 consists of the frame label and its textual description. Representation2 extends Representation1 by appending a list of lexical units, while Representation3 further enriches Representation2 by incorporating a list of frame elements, making it the most comprehensive of the three.
  • Figure 4: Evolution of the complexity of the frame detection task with and without candidate filtering, for different values of the number of candidates C.
  • Figure 5: Example of the dynamic prompt used to fine-tune the LLM
  • ...and 4 more figures
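
The three nested frame representations described in the Figure 3 caption can be sketched as follows. This is only an illustration under stated assumptions: the `Frame` dataclass, the function names, and the exact string templates are hypothetical, and the example frame data is illustrative rather than copied from FrameNet. The caption only states which fields each representation contains, not how they are concatenated.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Frame:
    """Hypothetical container for the fields named in the Figure 3 caption."""
    label: str
    description: str
    lexical_units: List[str] = field(default_factory=list)
    frame_elements: List[str] = field(default_factory=list)


def representation1(frame: Frame) -> str:
    """Frame label plus its textual description."""
    return f"{frame.label}: {frame.description}"


def representation2(frame: Frame) -> str:
    """Representation1 with the list of lexical units appended."""
    units = ", ".join(frame.lexical_units)
    return f"{representation1(frame)} Lexical units: {units}."


def representation3(frame: Frame) -> str:
    """Representation2 with the list of frame elements appended."""
    elements = ", ".join(frame.frame_elements)
    return f"{representation2(frame)} Frame elements: {elements}."


# Illustrative example data (assumed, not taken verbatim from FrameNet).
example = Frame(
    label="Commerce_buy",
    description="A buyer obtains goods from a seller in exchange for money.",
    lexical_units=["buy.v", "purchase.v"],
    frame_elements=["Buyer", "Goods", "Seller", "Money"],
)

for build in (representation1, representation2, representation3):
    print(build(example))
```

Because each representation is a strict extension of the previous one, the text passed to the embedding model grows from label and description only to the full set of fields.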