A Hardware-Native Realisation of Semi-Empirical Electronic Structure Theory on Field-Programmable Gate Arrays

Xincheng Miao, Roland Mitrić

TL;DR

This work demonstrates a hardware-native FPGA realisation of semi-empirical electronic-structure methods, implementing Extended Hückel Theory (EHT) and non-self-consistent Density Functional Tight Binding (DFTB0) on an Artix-7 device using a streaming dataflow. By fusing Hamiltonian construction, matrix assembly, and diagonalisation into on-device pipelines, the approach achieves deterministic, host-free execution and improves throughput for specific kernels, most notably a more than fourfold speedup over a server-class CPU in stand-alone DFTB0 Hamiltonian generation. Full EHT/DFTB0 workflows remain diagonalisation-bound, but the results highlight substantial energy-efficiency advantages of FPGA streaming for regular, arithmetic-heavy kernels and identify concrete paths to further improvement through hardware eigensolvers and heterogeneous CPU-FPGA execution. Collectively, the study provides an architectural proof of concept for scalable, energy-efficient, hardware-native semi-empirical toolchains and informs routes toward FPGA-based acceleration of broader electronic-structure methods.

Abstract

High-throughput quantum-chemical calculations underpin modern molecular modelling, materials discovery, and machine-learning workflows, yet even semi-empirical methods become restrictive when many molecules must be evaluated. Here we report the first hardware-native realisation of semi-empirical electronic structure theory on a field-programmable gate array (FPGA), implementing as a proof of principle Extended Hückel Theory (EHT) and non-self-consistent Density Functional Tight Binding (DFTB0). Our design performs Hamiltonian construction and diagonalisation on the FPGA device through a streaming dataflow, enabling deterministic execution without host intervention. On a mid-range Artix-7 FPGA, the DFTB0 Hamiltonian generator delivers a throughput over fourfold higher than that of a contemporary server-class CPU. Improvements in eigensolver design, memory capacity, and extensions to nuclear gradients and excited states could further expand capability. Combined with the inherent energy efficiency of FPGA dataflow, this work opens a pathway towards sustainable, hardware-native acceleration of electronic-structure simulation and direct hardware implementations of a broad class of methods.

Paper Structure

This paper contains 21 sections, 11 equations, 5 figures, and 7 tables.

Figures (5)

  • Figure 1: Schematic dataflow graph of the tight-binding HLS workflow with kernel-level detail. The electronic-structure calculation is mapped to a streaming task graph composed of coordinate loading, pair generation, Hamiltonian-element evaluation, matrix assembly, diagonalisation, and energy evaluation. Independent HLS kernels communicate via buffered streams that carry coordinates, orbital-index pairs, and Hamiltonian elements. Representative HLS/C++ code excerpts illustrate the corresponding stream interfaces and pipelined loop structure. The assembled Hamiltonian enters the diagonalisation stage once the matrix is fully assembled.
  • Figure 2: Schematic dataflow graph of the stand-alone Hamiltonian-generation HLS workflow with kernel-level detail. Pair generation and Hamiltonian-element evaluation are duplicated for even- and odd-indexed orbital pairs to sustain peak throughput. Coordinates are broadcast to both branches, while each branch produces a stream of elements which are subsequently merged by a streaming stage into a single output stream. Representative HLS/C++ code excerpts show the corresponding even/odd pair generators.
  • Figure 3: Execution time per processed geometry for the EHT module on the FPGA as a function of the number of atomic orbitals.
  • Figure 4: Execution time per processed geometry for the DFTB0 module on the FPGA as a function of the number of atomic orbitals.
  • Figure 5: Execution time per processed geometry for the stand-alone DFTB0 Hamiltonian generation module on the FPGA as a function of the number of atomic orbitals.
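
The even/odd split described in the caption of Figure 2 can be illustrated with a small software sketch. This is not the authors' HLS code: `std::queue` stands in for a buffered hardware stream, all function and type names are hypothetical, the coordinates are reduced to one dimension, and the element formula is a placeholder rather than the EHT or DFTB0 expression. In hardware the stages would run concurrently; here they run one after another, which preserves the data movement but not the timing.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Software stand-in for a buffered stream (assumption: an HLS design would
// use the toolchain's own stream type instead of std::queue).
template <typename T>
using Stream = std::queue<T>;

struct Pair { std::size_t i, j; };               // orbital-index pair, i <= j
struct Element { std::size_t i, j; double h; };  // one Hamiltonian element

// Walk the upper triangle (i <= j) and emit only the pairs whose running
// index has the requested parity (0 = even branch, 1 = odd branch).
void generate_pairs(std::size_t n, unsigned parity, Stream<Pair>& out) {
    std::size_t k = 0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i; j < n; ++j, ++k)
            if (k % 2 == parity) out.push({i, j});
}

// Placeholder element evaluation (hypothetical formula, NOT EHT/DFTB0):
// h = -1 / (1 + d^2), with d the distance between two 1-D coordinates.
void evaluate_elements(const std::vector<double>& x,
                       Stream<Pair>& in, Stream<Element>& out) {
    while (!in.empty()) {
        const Pair p = in.front();
        in.pop();
        const double d = x[p.i] - x[p.j];
        out.push({p.i, p.j, -1.0 / (1.0 + d * d)});
    }
}

// Interleave the two branches, even first, which restores the original
// running order of the pairs in the single output stream.
void merge_streams(Stream<Element>& even, Stream<Element>& odd,
                   Stream<Element>& out) {
    while (!even.empty() || !odd.empty()) {
        if (!even.empty()) { out.push(even.front()); even.pop(); }
        if (!odd.empty())  { out.push(odd.front());  odd.pop(); }
    }
}

// Scatter the streamed elements into a dense, symmetric, row-major matrix.
std::vector<double> assemble(std::size_t n, Stream<Element>& in) {
    std::vector<double> H(n * n, 0.0);
    while (!in.empty()) {
        const Element e = in.front();
        in.pop();
        H[e.i * n + e.j] = e.h;
        H[e.j * n + e.i] = e.h;
    }
    return H;
}

// Top level: the coordinates are shared (broadcast) by both branches.
std::vector<double> build_hamiltonian(const std::vector<double>& x) {
    const std::size_t n = x.size();
    Stream<Pair> pairs_even, pairs_odd;
    Stream<Element> elems_even, elems_odd, merged;
    generate_pairs(n, 0, pairs_even);
    generate_pairs(n, 1, pairs_odd);
    evaluate_elements(x, pairs_even, elems_even);
    evaluate_elements(x, pairs_odd, elems_odd);
    merge_streams(elems_even, elems_odd, merged);
    return assemble(n, merged);
}
```

Calling `build_hamiltonian` with a vector of `n` coordinates returns an `n × n` row-major matrix. Splitting by the parity of the running pair index keeps the two branches balanced to within one pair, which is one plausible way to let both evaluators stay busy; the actual partitioning used in the paper may differ.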