Enhancing Frame Detection with Retrieval Augmented Generation
Papa Abdou Karim Karou Diallo, Amal Zouaq
TL;DR
This work tackles frame detection from raw text without explicit target spans by introducing RCIF, a Retrieve Candidates and Identify Frames pipeline that combines a frozen-RAG retrieval stage with an LLM classifier. It leverages multiple frame representations to form embeddings, retrieves a broad candidate set, and then selectively identifies the most appropriate frames, achieving state-of-the-art performance on FrameNet 1.5 and 1.7. The authors also demonstrate that the structured frame representations can improve generalization for translating natural language questions into SPARQL queries, evidenced on LCQ2F and LCQ2F+ datasets. The approach offers practical robustness for real-world NLP tasks where explicit targets are unavailable and lexical variation is high, highlighting a promising direction for combining structured semantic representations with retrieval-augmented generation.
Abstract
Recent advancements in Natural Language Processing have significantly improved the extraction of structured semantic representations from unstructured text, especially through Frame Semantic Role Labeling (FSRL). Despite this progress, the potential of Retrieval-Augmented Generation (RAG) models for frame detection remains under-explored. In this paper, we present the first RAG-based approach for frame detection, called RCIF (Retrieve Candidates and Identify Frames). RCIF is also the first approach to operate without the need for an explicit target span. It comprises three main stages: (1) generation of frame embeddings from various representations; (2) retrieval of candidate frames given an input text; and (3) identification of the most suitable frames. We conducted extensive experiments across multiple configurations, including zero-shot, few-shot, and fine-tuning settings. Our results show that the retrieval component significantly reduces the complexity of the task by narrowing the search space, thus allowing the frame identifier to refine and complete the set of candidates. Our approach achieves state-of-the-art performance on FrameNet 1.5 and 1.7, demonstrating its robustness in scenarios where only raw text is provided. Furthermore, we leverage the structured representation obtained through this method as a proxy to enhance generalization across lexical variations in the task of translating natural language questions into SPARQL queries.
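To make the three stages concrete, here is a minimal sketch of a retrieve-then-identify pipeline. It is not the authors' implementation. The frame inventory, the bag-of-words "embedding", the similarity threshold, and all function names (`embed`, `retrieve_candidates`, `identify_frames`) are assumptions chosen for illustration. In particular, the identification stage is replaced here by a simple similarity filter, whereas the paper uses an LLM that can also complete the candidate set.

```python
import math
from collections import Counter

# Hypothetical toy frame inventory: frame name -> short textual representation.
# Real FrameNet frames carry much richer definitions and lexical units.
FRAMES = {
    "Commerce_buy": "buyer purchases goods from seller buy purchase",
    "Motion": "theme moves from source to goal move go travel",
    "Statement": "speaker communicates message say state tell announce",
}


def embed(text):
    """Toy embedding: bag-of-words counts (stand-in for a learned encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Stage 1: build one embedding per frame from its textual representation.
FRAME_INDEX = {name: embed(desc) for name, desc in FRAMES.items()}


def retrieve_candidates(text, k=2):
    """Stage 2: return the k frames most similar to the raw input text."""
    query = embed(text)
    ranked = sorted(
        FRAME_INDEX,
        key=lambda name: cosine(query, FRAME_INDEX[name]),
        reverse=True,
    )
    return ranked[:k]


def identify_frames(text, candidates, threshold=0.1):
    """Stage 3 placeholder: keep candidates above a similarity threshold.

    Assumption: a real system would prompt or fine-tune an LLM here.
    """
    query = embed(text)
    return [c for c in candidates if cosine(query, FRAME_INDEX[c]) >= threshold]


if __name__ == "__main__":
    sentence = "the buyer will purchase goods"
    candidates = retrieve_candidates(sentence, k=2)
    print(candidates)
    print(identify_frames(sentence, candidates))
```

On the toy sentence above, retrieval deliberately returns a broader set than needed, and the filter then narrows it to `Commerce_buy`. This mirrors the division of labor described in the abstract: a recall-oriented retriever shrinks the search space, and a more selective identifier makes the final decision.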
