LockForge: Automating Paper-to-Code for Logic Locking with Multi-Agent Reasoning LLMs

Akashdeep Saha, Zeng Wang, Prithwish Basu Roy, Johann Knechtel, Ozgur Sinanoglu, Ramesh Karri

TL;DR

LockForge tackles reproducibility in logic locking (LL) by converting papers into executable, tested code via a multi-agent LLM workflow. It introduces a four-stage pipeline (Forethought, Implementation, Refinement, Validation) with role-specific LLMs (coder, judge, examiner) and a paper-grounded similarity scoring framework (BCSRP). Applied to 10 LL schemes, many of which lack public reference implementations, it produces executable code and locked benchmarks. Cross-model validation and ablation analyses show that each stage is necessary and that the task relies on advanced reasoning models. The work contributes open-source LL implementations and benchmarks, providing a reproducible platform for evaluating future LL research.

Abstract

Despite rapid progress in logic locking (LL), reproducibility remains a challenge because code is rarely made public. We present LockForge, a first-of-its-kind, multi-agent large language model (LLM) framework that turns LL descriptions in papers into executable and tested code. LockForge provides a carefully crafted pipeline realizing forethought, implementation, iterative refinement, and multi-stage validation, all to systematically bridge the gap between prose and practice for complex LL schemes. For validation, we devise (i) an LLM-as-Judge stage with a scoring system considering behavioral checks, conceptual mechanisms, structural elements, and reproducibility on benchmarks, and (ii) an independent LLM-as-Examiner stage for ground-truth assessment. We apply LockForge to 10 seminal LL schemes, many of which lack reference implementations. Our evaluation on multiple SOTA LLMs, including ablation studies, reveals the significant complexity of the task. We show that both an advanced reasoning model and a sophisticated, multi-stage framework like LockForge are required. We release all implementations and benchmarks, providing a reproducible and fair foundation for evaluating further LL research.

Paper Structure

This paper contains 13 sections, 3 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Open source code for logic locking (2020--2025).
  • Figure 2: $\mathsf{LockForge}$ is a multi-agent LLM workflow. It (i) parses a paper, (ii) drafts and refines code via concept mining and local execution/testing, and (iii) validates via independent LLMs and optional human assessment. LLM-A is a coder with PDF access; LLM-B/C are judges/examiners with no PDF access.
  • Figure 3: Scoring ChatGPT-5 codes by Gemini-2.5pro and DeepSeek-V3.
  • Figure 4: Gemini-2.5pro, DeepSeek-V3 coding; evaluation by ChatGPT-5.
  • Figure 5: ChatGPT-5 coding; evaluation by Gemini-2.5pro, DeepSeek-V3.
  • ...and 1 more figure