Winning Solution for Meta KDD Cup '24
Yikuan Xia, Jiazun Chen, Jun Gao
TL;DR
The paper tackles grounded, reliable question answering on the CRAG benchmark of Meta KDD Cup '24. For Task #1 it deploys a dual-path RAG framework (web retrieval plus public data) with a tuned LLM. For Tasks #2 and #3 it uses a regularized knowledge-graph API design and LLM-based API generation to access structured KG information. The solution placed first on all tasks by combining prioritized KG signals, carefully crafted prompts, LoRA-based fine-tuning, and inference acceleration (vLLM), all within tight per-query time budgets. The approach offers a practical blueprint for building scalable, grounded RAG systems that integrate structured graphs with web-derived content, and shows how response speed can be balanced against grounding in real-world settings.
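The combination of prioritized KG signals and per-query time budgets can be illustrated with a minimal Python sketch. It assumes each evidence source is a callable that returns an answer string or None, tries the sources in a fixed priority order (KG before web), and abstains with a fixed fallback string when no source answers or the budget runs out. The function name `answer_with_priority`, the fallback text, and the stub sources are hypothetical and are not the authors' code. Note that the budget is only checked between calls, so a slow source is not interrupted.

```python
import time
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical source interface: takes the query, returns an answer or None.
Source = Callable[[str], Optional[str]]

# Hypothetical abstention string returned when no source answers in time.
FALLBACK = "i don't know"


def answer_with_priority(
    query: str,
    sources: Sequence[Tuple[str, Source]],
    budget_s: float,
) -> Tuple[str, str]:
    """Query sources in priority order (e.g. KG before web) under a time budget.

    Returns (answer, source_name). The budget is checked before each call;
    a source that is already running is not interrupted.
    """
    deadline = time.monotonic() + budget_s
    for name, source in sources:
        if time.monotonic() >= deadline:
            break  # budget spent: skip the remaining lower-priority sources
        answer = source(query)
        if answer:  # the first non-empty answer wins
            return answer, name
    return FALLBACK, "none"


if __name__ == "__main__":
    def kg_stub(query: str) -> Optional[str]:
        return None  # simulate a KG with no matching entity

    def web_stub(query: str) -> Optional[str]:
        return "answer from web pages"

    print(answer_with_priority("some question", [("kg", kg_stub), ("web", web_stub)], budget_s=2.0))
    # -> ('answer from web pages', 'web')
```

Returning an explicit abstention instead of forcing an answer from a lower-priority source is one simple way to trade coverage for fewer hallucinated answers.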
Abstract
This paper describes the db3 team's winning solutions for all tasks in Meta KDD Cup '24. The challenge is to build a RAG system from web sources and knowledge graphs; multiple sources are provided for each query to help answer the question. The CRAG challenge involves three tasks: (1) condensing information from web pages into accurate answers, (2) integrating structured data from mock knowledge graphs, and (3) selecting and integrating critical data from extensive web pages and APIs to reflect real-world retrieval challenges. Our solution for Task #1 is a framework of web or open-data retrieval and answering, in which the large language model (LLM) is tuned for better RAG performance and less hallucination. Our solutions for Task #2 and Task #3 are based on a regularized API set for domain questions and an API generation method using a tuned LLM. Our knowledge graph API interface extracts directly relevant information to help the LLM answer correctly. Our solution achieves 1st place on all three tasks, with scores of 28.4%, 42.7%, and 47.8%, respectively.
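The pairing of a regularized API set with LLM-generated API calls can be made concrete with a minimal Python sketch under stated assumptions. The tuned LLM is assumed to emit a single call string such as `movie_get_attribute("Example Movie", "release_year")`. That string is parsed with the standard `ast` module, checked against a whitelist, and dispatched to a uniform domain API backed by a toy in-memory dictionary. The API name `movie_get_attribute`, the `REGULARIZED_APIS` whitelist, and the toy KG entry are all hypothetical. The actual API set and mock KG used in the competition are not described here.

```python
import ast
from typing import Any, Callable, Dict, Optional

# Toy in-memory stand-in for a mock KG (hypothetical entity and values).
MOCK_MOVIE_KG: Dict[str, Dict[str, Any]] = {
    "example movie": {"release_year": 2001, "director": "Jane Doe"},
}


def movie_get_attribute(movie: str, attribute: str) -> Optional[Any]:
    """Hypothetical regularized API: one uniform (entity, attribute) signature."""
    record = MOCK_MOVIE_KG.get(movie.strip().lower())
    if record is None:
        return None
    return record.get(attribute)


# Whitelist of APIs the LLM is allowed to call; anything else is rejected.
REGULARIZED_APIS: Dict[str, Callable[..., Any]] = {
    "movie_get_attribute": movie_get_attribute,
}


def execute_generated_call(call_text: str) -> Optional[Any]:
    """Parse an LLM-generated call string and dispatch it to a whitelisted API.

    Only plain calls such as name("a", "b") or name(x="a") with literal
    arguments are accepted; the text is never passed to eval().
    """
    try:
        node = ast.parse(call_text.strip(), mode="eval").body
    except (SyntaxError, ValueError):
        return None  # malformed generation: treat as "no KG evidence"
    if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
        return None  # not a plain function call (e.g. attribute access)
    api = REGULARIZED_APIS.get(node.func.id)
    if api is None:
        return None  # API name outside the regularized set
    try:
        args = [ast.literal_eval(a) for a in node.args]
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    except (ValueError, TypeError):
        return None  # non-literal or unsupported argument
    try:
        return api(*args, **kwargs)
    except (TypeError, AttributeError, ValueError):
        return None  # wrong arity, bad keyword, or ill-typed argument


if __name__ == "__main__":
    print(execute_generated_call('movie_get_attribute("Example Movie", "release_year")'))  # -> 2001
    print(execute_generated_call('__import__("os").system("ls")'))  # -> None
```

Parsing with `ast` instead of calling `eval` means a malformed or unexpected generation simply yields None, which the answering step can treat as missing KG evidence rather than as an error.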
