Cement2: Temporal Hardware Transactions for High-Level and Efficient FPGA Programming

Youwei Xiao, Zizhang Luo, Weijie Peng, Yuyang Zou, Yun Liang

TL;DR

Cement2 tackles the challenge of raising hardware design abstraction without sacrificing cycle accuracy by introducing temporal hardware transactions, a timing-aware extension to transactional HDLs. The Cement2 framework combines a Rust-embedded frontend (CMT2-rs) with an intermediate representation (CTIR), providing inter-cycle analysis, multi-cycle rules, and a multi-phase synthesis flow that yields efficient RTL for FPGAs. The approach supports both intra-cycle and multi-cycle behaviors, enabling precise timing coordination and retiming, and it achieves performance and hardware quality competitive with hand-coded RTL across a RISC-V core, custom instructions, linear algebra kernels, and systolic arrays. The results suggest broad applicability to general FPGA programming, offering productivity gains while preserving fine-grained control over timing and resources.

Abstract

Hardware design faces a fundamental challenge: raising abstraction to improve productivity while retaining control over low-level details such as cycle accuracy. Traditional RTL design in languages like SystemVerilog composes modules through wiring-style connections that provide weak guarantees of behavioral correctness. High-level synthesis (HLS) and emerging abstractions attempt to address this, but they either introduce unpredictable overhead or restrict design generality. Transactional HDLs provide a promising foundation by lifting design abstraction to atomic, composable rules, but they model only intra-cycle behavior and do not capture the inherently temporal nature of hardware designs, which limits their applicability and productivity for FPGA programming. We propose temporal hardware transactions, a new abstraction that brings cycle-level timing awareness to designers at the transactional language level. Our approach models temporal relationships between rules and supports rules whose actions span multiple clock cycles, providing an intuitive way to describe multi-cycle architectural behavior. We implement this in Cement2, a transactional HDL embedded in Rust, in which programmers write hardware constructors that build both intra-cycle and temporal transactions. Cement2's synthesis framework progressively lowers the description through multiple analysis and optimization phases to generate efficient hardware. Using Cement2, we program a RISC-V soft-core processor, custom CPU instructions, linear algebra kernels, and systolic array accelerators, leveraging the high-level abstraction to boost productivity. Evaluation shows that Cement2 sacrifices neither performance nor resource efficiency compared with hand-coded RTL designs, demonstrating its applicability to general FPGA design tasks.
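
To make "atomic and composable rules" concrete, the plain-Rust sketch below models the one-rule-at-a-time reference semantics that transactional HDLs build on: each rule pairs a guard with an action, and within a cycle the rules whose guards hold run to completion one after another in a schedule order. It covers only intra-cycle transactions, the starting point that temporal hardware transactions extend. The names `State`, `Rule`, `schedule`, and `step` are hypothetical and are not the CMT2-rs API.

```rust
// Hypothetical model of guarded atomic rules in plain Rust; not the CMT2-rs API.
#[derive(Debug, Default)]
struct State {
    slot: Option<u32>, // one-entry buffer between the two rules
    next: u32,         // next value the producer writes
    sum: u32,          // running total accumulated by the consumer
}

// A rule pairs a guard (can it fire now?) with an action (its state update).
struct Rule {
    name: &'static str,
    guard: fn(&State) -> bool,
    action: fn(&mut State),
}

// The rule set, listed in the order the per-cycle scheduler tries them.
fn schedule() -> [Rule; 2] {
    [
        Rule {
            name: "consume",
            guard: |s: &State| s.slot.is_some(),
            action: |s: &mut State| {
                s.sum += s.slot.take().unwrap();
            },
        },
        Rule {
            name: "produce",
            guard: |s: &State| s.slot.is_none(),
            action: |s: &mut State| {
                s.slot = Some(s.next);
                s.next += 1;
            },
        },
    ]
}

/// Simulates one clock cycle: each rule whose guard holds runs its whole
/// action before the next rule is checked, so the cycle's effect equals a
/// sequential execution of the fired rules in schedule order.
fn step(state: &mut State, rules: &[Rule]) -> Vec<&'static str> {
    let mut fired = Vec::new();
    for rule in rules {
        if (rule.guard)(state) {
            (rule.action)(state);
            fired.push(rule.name);
        }
    }
    fired
}

fn main() {
    let rules = schedule();
    let mut state = State::default();
    for cycle in 0..4 {
        let fired = step(&mut state, &rules);
        println!("cycle {cycle}: fired {fired:?}");
    }
    println!("{state:?}");
}
```

With the consume-then-produce order, only `produce` fires in the first cycle and both rules fire in every later cycle, and no rule ever observes a half-updated buffer. A rule whose action must span several clock cycles does not fit this one-cycle model, which is the gap the abstract says temporal hardware transactions address.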

Paper Structure

This paper contains 20 sections, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Motivating example: illustrating the 5-stage CPU core pipeline described in different abstraction levels.
  • Figure 2: Syntax of CMT2-rs provides a unified description for intra-cycle and temporal hardware transactions.
  • Figure 3: Avoid producer-consumer mismatch.
  • Figure 4: Implementations of an 8-bit restoring division pipeline with temporal hardware transactions.
  • Figure 5: CTIR and synthesis flow.
  • ...and 2 more figures
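
Figure 4 refers to an 8-bit restoring division pipeline; the paper's code is not reproduced here. As a hypothetical reference model (plain Rust, not CMT2-rs syntax), the sketch below shows the restoring-division algorithm and the cycle behavior of one common fully pipelined arrangement: one quotient bit per stage, a new division accepted every cycle, and each result appearing eight clock edges after it enters. The names `Stage`, `div_step`, and `Pipeline` are illustrative assumptions, and the paper's pipeline may be organized differently.

```rust
// Hypothetical cycle-level reference model in plain Rust; not CMT2-rs code.
#[derive(Clone, Copy, Debug)]
struct Stage {
    n: u8,  // dividend; its bits are shifted in MSB first
    d: u8,  // divisor
    r: u16, // partial remainder, wide enough for the 9-bit shifted value
    q: u8,  // quotient bits produced so far
}

/// One restoring-division step for dividend bit `bit`: shift that bit into
/// the remainder, trial-subtract the divisor, and restore (keep the shifted
/// remainder) when the result would be negative.
fn div_step(s: Stage, bit: u32) -> Stage {
    let r = (s.r << 1) | u16::from((s.n >> bit) & 1);
    let trial = i32::from(r) - i32::from(s.d);
    if trial < 0 {
        Stage { r, q: s.q << 1, ..s } // restore: quotient bit is 0
    } else {
        Stage { r: trial as u16, q: (s.q << 1) | 1, ..s } // quotient bit is 1
    }
}

/// Eight pipeline registers; register k holds a division after bit 7 - k.
struct Pipeline {
    regs: [Option<Stage>; 8],
}

impl Pipeline {
    fn new() -> Self {
        Pipeline { regs: [None; 8] }
    }

    /// Advances one clock edge, optionally accepting a new (dividend, divisor)
    /// pair, and returns the (quotient, remainder) now in the last register.
    fn tick(&mut self, input: Option<(u8, u8)>) -> Option<(u8, u8)> {
        // Update back to front so each register takes its predecessor's old value.
        for k in (1..8).rev() {
            self.regs[k] = self.regs[k - 1].map(|s| div_step(s, (7 - k) as u32));
        }
        self.regs[0] = input.map(|(n, d)| div_step(Stage { n, d, r: 0, q: 0 }, 7));
        self.regs[7].map(|s| (s.q, s.r as u8))
    }
}

fn main() {
    let jobs = [(200u8, 7u8), (255, 16), (9, 3)];
    let mut pipe = Pipeline::new();
    for cycle in 0..11 {
        // Issue one new division per cycle while jobs remain.
        if let Some((q, r)) = pipe.tick(jobs.get(cycle).copied()) {
            println!("cycle {cycle}: quotient {q}, remainder {r}");
        }
    }
}
```

Running `main` prints results on cycles 7, 8, and 9 (28 r 4, 15 r 15, and 3 r 0): once the pipeline is full, one division completes per cycle. Checking `div_step` against `n / d` and `n % d` for every nonzero divisor is a cheap way to validate such a model before comparing it with generated RTL.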

Theorems & Definitions (1)

  • Definition 1: Transactional Hardware Module