Unleashing GHOST: An LLM-Powered Framework for Automated Hardware Trojan Design

Md Omar Faruque, Peter Jamieson, Ahmad Patooghy, Abdel-Hameed A. Badawy

TL;DR

The paper presents GHOST, an automated framework that uses Large Language Models (LLMs) to rapidly generate stealthy Hardware Trojans (HTs) and insert them at the RTL level. It combines three prompting strategies (Role-Based Prompting, Reflexive Validation Prompting, and Contextual Trojan Prompting) and is evaluated with GPT-4, Gemini-1.5-pro, and Llama-3-70B on three designs: SRAM, AES, and UART. GPT-4 performs best, with 88.88% of its HT insertion attempts yielding functional, synthesizable HTs, and 100% of the synthesizable GHOST-generated HTs evaded the ML-based HT detection tool used in the evaluation. The work also provides a set of HT benchmarks and an evaluation methodology spanning pre- and post-synthesis stages, and argues that hardware security defenses must be strengthened against LLM-generated HTs.

Abstract

Traditionally, inserting realistic Hardware Trojans (HTs) into complex hardware systems has been a time-consuming and manual process, requiring comprehensive knowledge of the design and navigating intricate Hardware Description Language (HDL) codebases. Machine Learning (ML)-based approaches have attempted to automate this process but often face challenges such as the need for extensive training data, long learning times, and limited generalizability across diverse hardware design landscapes. This paper addresses these challenges by proposing GHOST (Generator for Hardware-Oriented Stealthy Trojans), an automated attack framework that leverages Large Language Models (LLMs) for rapid HT generation and insertion. Our study evaluates three state-of-the-art LLMs - GPT-4, Gemini-1.5-pro, and Llama-3-70B - across three hardware designs: SRAM, AES, and UART. According to our evaluations, GPT-4 demonstrates superior performance, with 88.88% of HT insertion attempts successfully generating functional and synthesizable HTs. This study also highlights the security risks posed by LLM-generated HTs, showing that 100% of GHOST-generated synthesizable HTs evaded detection by an ML-based HT detection tool. These results underscore the urgent need for advanced detection and prevention mechanisms in hardware security to address the emerging threat of LLM-generated HTs. The GHOST HT benchmarks are available at: https://github.com/HSTRG1/GHOSTbenchmarks.git

Paper Structure

This paper contains 25 sections, 7 figures, 5 tables, 1 algorithm.

Figures (7)

  • Figure 1: Assumed Threat Model.
  • Figure 2: GHOST Framework Key Components.
  • Figure 3: Python code to generate an HT in Verilog using OpenAI's API call.
  • Figure 4: Evaluation Framework overview.
  • Figure 5: Information Leakage HT inserted in AES-128 RTL by GPT-4.
  • ...and 2 more figures