PANORAMA: A Dataset and Benchmarks Capturing Decision Trails and Rationales in Patent Examination

Hyunseung Lim; Sooyohn Nam; Sungmin Na; Ji Yong Cho; June Yong Yang; Hyungyu Shin; Yoonjoo Lee; Juho Kim; Moontae Lee; Hwajung Hong

TL;DR

PANORAMA introduces a large-scale, claim-level dataset of USPTO patent examination records that preserves examiner decision trails and rationales. It decomposes the examination process into three sequential benchmarks: prior-art retrieval (PAR4PC), paragraph identification (PI4PC), and novelty/non-obviousness classification (NOC4PC), each evaluating LLMs at one step of patent review. Baseline results show that current models can retrieve relevant prior art but struggle to make robust novelty and non-obviousness judgments, while supervised fine-tuning on PANORAMA yields consistent improvements. The dataset enables more realistic modeling of patent examination and motivates future work on capturing expert reasoning and cross-jurisdictional variation. The data is publicly available for reproducibility.

Abstract

Patent examination remains an ongoing challenge in the NLP literature even after the advent of large language models (LLMs), as it requires extensive yet nuanced human judgment on whether a submitted claim meets the statutory standards of novelty and non-obviousness against previously granted claims (prior art) in expert domains. Previous NLP studies have framed this challenge as a prediction task (e.g., forecasting grant outcomes) using high-level proxies such as similarity metrics or classifiers trained on historical labels. However, this framing often overlooks the step-by-step evaluations that examiners must make on the basis of detailed information, including the rationales for decisions recorded in office action documents, and it also makes it harder to measure the current state of techniques for patent review. To fill this gap, we construct PANORAMA, a dataset of 8,143 U.S. patent examination records that preserves the full decision trails, including original applications, all cited references, Non-Final Rejections, and Notices of Allowance. PANORAMA also decomposes the trails into sequential benchmarks that emulate patent professionals' review processes and allow researchers to examine LLMs' capabilities at each step. Our findings indicate that, although LLMs are relatively effective at retrieving relevant prior art and pinpointing the pertinent paragraphs, they struggle to assess the novelty and non-obviousness of patent claims. We discuss these results and argue that advancing NLP, including LLMs, in the patent domain requires a deeper understanding of real-world patent examination. Our dataset is openly available at https://huggingface.co/datasets/LG-AI-Research/PANORAMA.
