Security Risks Due to Data Persistence in Cloud FPGA Platforms

Zhehang Zhang, Bharadwaj Madabhushi, Sandip Kundu, Russell Tessier

TL;DR

This paper demonstrates that DRAM attached to cloud FPGA platforms, specifically AMD/Xilinx Alveo U280 nodes in the Open Cloud Testbed, retains usable data well after a user logs out, creating a significant multi-tenant security risk. Using two experimental approaches, the authors show that even after the DRAM controller is removed and the system undergoes a warm reset, a substantial fraction of data remains undecayed for up to 18 minutes, enabling a subsequent user to reconstruct prior victim data, such as an image. The findings highlight the need for explicit DRAM erasure or shredding mechanisms in cloud FPGA runtimes and multi-tenant architectures, as well as consideration of DRAM refresh strategies during VM boot and reallocation. Practically, the work calls on cloud providers to implement robust memory-clearing policies that prevent data leakage across tenants and to reconsider rapid reconfiguration timelines that increase the risk of leakage.

Abstract

The integration of Field Programmable Gate Arrays (FPGAs) into cloud computing systems has become commonplace. As the operating systems used to manage these systems evolve, special consideration must be given to DRAM devices accessible by FPGAs. These devices may hold sensitive data that can become inadvertently exposed to adversaries following user logout. Although addressed in some cloud FPGA environments, automatic DRAM clearing after process termination is not included by default in popular FPGA runtime environments or in most proposed cloud FPGA hypervisors. In this paper, we examine DRAM data persistence in AMD/Xilinx Alveo U280 nodes that are part of the Open Cloud Testbed (OCT). Our results indicate that DDR4 DRAM is not automatically cleared following user logout from an allocated node, and that subsequent node users can easily obtain recognizable data from the DRAM following node reallocation over 17 minutes later. This issue is particularly relevant for systems that support FPGA multi-tenancy.

Paper Structure (12 sections, 5 figures, 1 table)

Figures (5)

  • Figure 1: Overview of AMD Alveo U280 integration in OCT
  • Figure 2: Decay rate of 32-bit words in OCT node PC151. Decay indicates a change in value of any of the 32 bits of a word. Total memory size is 4 GB.
  • Figure 3: Decay rate of 32-bit words for four OCT nodes averaged over four trials. Decay indicates a change in value of any of the 32 bits of a word. Total memory size is 4 GB.
  • Figure 4: Fraction of valid individual bits in four OCT nodes over time. Total memory size is 4 GB.
  • Figure 5: Contrast between the initial image and the image read back via Experiment 1 after 18 minutes
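
The two metrics named in the captions of Figures 2 to 4 can be illustrated with a short sketch. It is not the authors' measurement code: the function name `decay_metrics`, the little-endian word layout, and the in-memory byte buffers are all assumptions made for illustration. It compares a buffer as originally written with the same buffer as read back later, counts a 32-bit word as decayed if any of its bits changed (the definition given for Figures 2 and 3), and reports the fraction of individual bits that still hold their original value (the quantity plotted in Figure 4).

```python
import struct


def decay_metrics(written: bytes, read_back: bytes) -> tuple[float, float]:
    """Return (word_decay_rate, valid_bit_fraction) for two equal-length buffers.

    Hypothetical helper: words are assumed to be 32-bit little-endian values.
    """
    if len(written) != len(read_back) or len(written) % 4 != 0:
        raise ValueError("buffers must have equal length, a multiple of 4 bytes")
    n_words = len(written) // 4
    if n_words == 0:
        raise ValueError("buffers must not be empty")

    fmt = f"<{n_words}I"
    decayed_words = 0
    flipped_bits = 0
    for w, r in zip(struct.unpack(fmt, written), struct.unpack(fmt, read_back)):
        diff = w ^ r  # set bits mark positions whose value changed
        if diff:
            decayed_words += 1  # any changed bit counts the whole word as decayed
            flipped_bits += bin(diff).count("1")

    word_decay_rate = decayed_words / n_words
    valid_bit_fraction = 1.0 - flipped_bits / (32 * n_words)
    return word_decay_rate, valid_bit_fraction


if __name__ == "__main__":
    original = struct.pack("<4I", 0xFFFFFFFF, 0x00000000, 0xDEADBEEF, 0x12345678)
    later = struct.pack("<4I", 0xFFFFFFFE, 0x00000000, 0xDEADBEEF, 0x12345670)
    print(decay_metrics(original, later))
```

The word-level metric is the stricter of the two: a single flipped bit marks an entire word as decayed, so the word decay rate can be high while most individual bits are still valid.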