Customized Retrieval Augmented Generation and Benchmarking for EDA Tool Documentation QA

Yuan Pu, Zhuolun He, Tairu Qiu, Haoyuan Wu, Bei Yu

TL;DR

This work addresses the challenge of applying generic retrieval augmented generation to knowledge-intensive EDA tool documentation. It proposes RAG-EDA, a domain-tailored pipeline with three specialized components: a domain-customized text embedding model trained via contrastive learning, a contrastively fine-tuned reranker, and a domain-specific LLM generator trained in two stages. It also introduces ORD-QA, a 90-question benchmark built on OpenROAD, to evaluate retrieval, reranking, and generation in EDA contexts, and shows that RAG-EDA outperforms state-of-the-art baselines both on ORD-QA and on a commercial tool. The approach combines hybrid lexical-semantic retrieval, GPT-4-guided supervision for the reranker, and domain pre-training followed by instruction tuning to produce accurate, domain-consistent answers. The authors release ORD-QA and their training data as open-source resources, enabling reproducibility and future research in EDA tool documentation QA, with practical potential to reduce manual support costs in EDA workflows.
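Hybrid lexical-semantic retrieval merges a ranked list from a lexical retriever (for example, BM25) with one from an embedding-based retriever. The fusion rule RAG-EDA actually uses is not stated on this page, so the following sketch swaps in reciprocal rank fusion (RRF), a common rank-merging technique. The constant k=60, the function name, and the document IDs are illustrative assumptions, not details taken from the paper.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with 1-based ranks. RRF and k=60 are assumptions for illustration; the
    paper's own hybrid-retrieval fusion rule may differ.
    """
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of a lexical retriever and a dense (embedding) retriever.
lexical = ["doc_pi", "doc_clock", "doc_po"]
semantic = ["doc_po", "doc_pi", "doc_place"]
print(reciprocal_rank_fusion([lexical, semantic]))
# -> ['doc_pi', 'doc_po', 'doc_clock', 'doc_place']
```

Documents that appear near the top of both lists (here `doc_pi`) rise above documents that only one retriever favors, which is the intended benefit of combining lexical matching on exact EDA terms with semantic matching on paraphrased questions.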

Abstract

Retrieval augmented generation (RAG) enhances the accuracy and reliability of generative AI models by sourcing factual information from external databases, and it is extensively employed in document-grounded question-answering (QA) tasks. Off-the-shelf RAG flows are pretrained on general-purpose documents, yet they encounter significant challenges when applied to knowledge-intensive vertical domains such as electronic design automation (EDA). This paper addresses this issue by proposing a customized RAG framework with three domain-specific techniques for EDA tool documentation QA: a contrastive learning scheme for fine-tuning the text embedding model, a reranker distilled from a proprietary LLM, and a generative LLM fine-tuned on a high-quality domain corpus. Furthermore, we have developed and released ORD-QA, a documentation QA evaluation benchmark for OpenROAD, an advanced RTL-to-GDSII design platform. Experimental results demonstrate that our proposed RAG flow and techniques achieve superior performance on ORD-QA as well as on a commercial tool, compared with state-of-the-art methods. The ORD-QA benchmark and the training dataset for our customized RAG flow are open-source at https://github.com/lesliepy99/RAG-EDA.
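Contrastive fine-tuning of a text embedding model typically pulls a question toward its correct document and pushes it away from "hard negatives", such as the PO definition retrieved for a PI question in Figure 3. A standard objective for this is the InfoNCE loss; the page does not specify the paper's exact loss, temperature, or negative sampling, so the sketch below assumes InfoNCE with cosine similarity and a temperature of 0.05, computed on plain Python vectors rather than a real model.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce_loss(
    query: Sequence[float],
    positive: Sequence[float],
    negatives: List[Sequence[float]],
    temperature: float = 0.05,  # assumed value, not from the paper
) -> float:
    """InfoNCE loss for one (query, positive, negatives) training sample.

    loss = -log(exp(s_pos / t) / sum_j exp(s_j / t)), where each s is a
    cosine similarity and the sum covers the positive and all negatives.
    """
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, neg) / temperature for neg in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denom - logits[0]


# Toy 2-D "embeddings": the loss is near zero when the query is closer to
# its positive document than to the hard negative, and large otherwise.
print(info_nce_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]]))
print(info_nce_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]]))
```

In practice the vectors would come from the embedding model being fine-tuned and the loss would be minimized by gradient descent over many such samples; this sketch only shows how one sample is scored.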

Paper Structure (18 sections, 8 equations, 9 figures, 5 tables)

Introduction
Preliminaries
Information Retrieval
Performance Measurement
Algorithms
Domain-Customized Text Embedding
Hybrid Information Retrieval
Reranker Finetuning
Domain-Specific LLM Generator
Benchmark
Experimental Results
Training Dataset Collection
Experimental Setting
Evaluation: Text Embedding Model
Evaluation: Reranker Model
...and 3 more sections

Figures (9)

Figure 1: Illustration of the RAG flow.
Figure 2: Overview of RAG-EDA, our proposed RAG flow for EDA tool documentation QA.
Figure 3: A failure case of a general text embedding model in EDA-specific information retrieval (a question about Primary Input). Doc1 and Doc2 define PI and PO, respectively. The general text embedding model mistakenly assigns higher similarity between the question and Doc2.
Figure 4: A contrastive data sample used for text embedding model finetuning.
Figure 5: Weakly related documents degrade answer generation quality for EDA tool questions.
...and 4 more figures
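Figure 5 motivates passing only strongly relevant documents to the generator. One plausible way to act on this, not necessarily the paper's exact procedure, is to score each (question, document) pair with the reranker, keep the top-k, and drop any document below a relevance threshold. In the sketch below, `rerank_and_filter`, the values of `top_k` and `min_score`, and the precomputed scores are hypothetical; a lookup table stands in for the fine-tuned reranker.

```python
from typing import Callable, List, Tuple


def rerank_and_filter(
    question: str,
    docs: List[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 3,         # hypothetical cutoff
    min_score: float = 0.5,  # hypothetical relevance threshold
) -> List[Tuple[str, float]]:
    """Score each (question, doc) pair, sort by descending score, and keep at
    most top_k documents whose score is at least min_score."""
    scored = [(doc, score_fn(question, doc)) for doc in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(doc, s) for doc, s in scored[:top_k] if s >= min_score]


# Made-up reranker scores standing in for a real cross-encoder's output.
fake_scores = {"doc_a": 0.92, "doc_b": 0.31, "doc_c": 0.77}
result = rerank_and_filter(
    "How do I set the placement density?",
    ["doc_a", "doc_b", "doc_c"],
    lambda q, d: fake_scores[d],
)
print(result)  # -> [('doc_a', 0.92), ('doc_c', 0.77)]
```

The weakly related `doc_b` is excluded even though `top_k` would allow three documents, so the generator sees less distracting context.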
