LockForge: Automating Paper-to-Code for Logic Locking with Multi-Agent Reasoning LLMs
Akashdeep Saha, Zeng Wang, Prithwish Basu Roy, Johann Knechtel, Ozgur Sinanoglu, Ramesh Karri
TL;DR
LockForge tackles reproducibility in logic locking (LL) by converting papers into executable, tested code via a multi-agent LLM workflow. It introduces a four-stage pipeline (Forethought, Implementation, Refinement, Validation) with role-specific LLMs (coder, judge, examiner) and a paper-grounded similarity scoring framework (BCSRP). Applied to 10 LL schemes, many of which lack public reference implementations, it produces executable code and locked benchmarks. Cross-model validation and ablation analyses show that each stage is necessary and that the approach relies on advanced reasoning models. The work contributes open-source LL implementations and benchmarks, providing a reproducible platform for evaluating future LL research.
Abstract
Despite rapid progress in logic locking (LL), reproducibility remains a challenge because code is rarely made public. We present LockForge, a first-of-its-kind, multi-agent large language model (LLM) framework that turns LL descriptions in papers into executable and tested code. LockForge provides a carefully crafted pipeline realizing forethought, implementation, iterative refinement, and multi-stage validation, all to systematically bridge the gap between prose and practice for complex LL schemes. For validation, we devise (i) an LLM-as-Judge stage with a scoring system that considers behavioral checks, conceptual mechanisms, structural elements, and reproducibility on benchmarks, and (ii) an independent LLM-as-Examiner stage for ground-truth assessment. We apply LockForge to 10 seminal LL schemes, many of which lack reference implementations. Our evaluation on multiple state-of-the-art (SOTA) LLMs, including ablation studies, reveals the significant complexity of the task. We show that both an advanced reasoning model and a sophisticated, multi-stage framework like LockForge are required. We release all implementations and benchmarks, providing a reproducible and fair foundation for evaluating future LL research.
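To make the control flow of the four stages concrete, the following is a minimal sketch under stated assumptions, not LockForge's actual implementation. The LLM roles (planner, coder, judge, examiner) are replaced by plain Python callables; the names `JudgeScore` and `run_pipeline`, the equal weighting of the four judge criteria, the threshold of 0.8, and the cap of 5 refinement rounds are all hypothetical choices made for illustration. The four score fields simply mirror the criteria listed in the abstract; the paper's actual BCSRP scoring formula is not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class JudgeScore:
    """Hypothetical container for the judge's four criteria, each in [0, 1]."""
    behavioral: float
    conceptual: float
    structural: float
    reproducibility: float

    def total(self, weights: Tuple[float, ...] = (0.25, 0.25, 0.25, 0.25)) -> float:
        # Equal weighting is an assumption for illustration only.
        parts = (self.behavioral, self.conceptual, self.structural, self.reproducibility)
        return sum(w * p for w, p in zip(weights, parts))

# Hypothetical role signatures standing in for LLM calls.
Planner = Callable[[str], str]              # paper -> plan (Forethought)
Coder = Callable[[str, str, str], str]      # (plan, previous code, feedback) -> code
Judge = Callable[[str, str], JudgeScore]    # (paper, code) -> score
Examiner = Callable[[str], bool]            # code -> passes ground-truth checks?

def run_pipeline(paper: str, planner: Planner, coder: Coder, judge: Judge,
                 examiner: Examiner, threshold: float = 0.8, max_rounds: int = 5):
    plan = planner(paper)                   # Stage 1: Forethought
    code = coder(plan, "", "")              # Stage 2: Implementation
    score = judge(paper, code)
    rounds = 0
    # Stage 3: Refinement, repeated until the judge's score clears the threshold.
    while score.total() < threshold and rounds < max_rounds:
        feedback = f"judge total {score.total():.2f} below {threshold}"
        code = coder(plan, code, feedback)
        score = judge(paper, code)
        rounds += 1
    passed = examiner(code)                 # Stage 4: independent Validation
    return code, score, rounds, passed

# Toy usage: the "coder" appends one character per call, and the "judge"
# rewards longer output, so refinement runs until the score reaches 0.8.
def toy_judge(paper: str, code: str) -> JudgeScore:
    s = min(1.0, 0.25 * len(code))
    return JudgeScore(s, s, s, s)

code, score, rounds, passed = run_pipeline(
    "toy paper",
    planner=lambda p: "plan for " + p,
    coder=lambda plan, prev, fb: prev + "#",
    judge=toy_judge,
    examiner=lambda c: c.startswith("#"),
)
print(code, rounds, passed)  # #### 3 True
```

The key design point the sketch illustrates is the separation of roles: the judge only steers refinement, while the examiner gives a final, independent pass/fail verdict on the refined code.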
