IOMMU Support for Virtual-Address Remote DMA in an ARMv8 environment

Antonis Psistakis

TL;DR

This work takes a step toward a virtual-address global memory space for cross-node coherence, in the style of Unimem, by exercising ARM's SMMU in an ARMv8 environment. It develops and tests Linux kernel modules that drive SMMU translation for DMA traffic originating in both the Processing System and the Programmable Logic of a Zynq UltraScale+ MPSoC. Key contributions include a detailed background on the ARM SMMU, four kernel modules that probe different translation scenarios, experimental results showing successful virtual-to-physical mappings and DMA transfers within the tested setup, and an analysis of practical limitations. The findings provide a concrete foundation for virtual-address remote DMA within a single node and inform future work toward multi-node, coherently connected systems in line with Unimem goals.

Abstract

In complex systems with many compute nodes containing multiple CPUs that are coherent within each node, a key challenge is maintaining efficient and correct coherence between nodes. The Unimem system addresses this by proposing a virtualized global address space that enables such coherence, relying on the I/O Memory Management Unit (IOMMU) in each node. The goal of this thesis is to support this approach by successfully testing and using the IOMMU of a single node. For this purpose, we used ARM's IOMMU, known as the System Memory Management Unit (SMMU), which translates virtual addresses to physical addresses. Because Linux documentation for the SMMU is limited and unclear, we implemented custom kernel modules to test and use its functionality. First, we tested the SMMU in the Processing System (PS) of the Xilinx Zynq UltraScale+ MPSoC by developing a module that inserted virtual-to-physical address mappings into the SMMU. We then triggered a DMA transfer to a virtual address and observed that the request passed through the SMMU for address translation. We repeated this experiment by initiating DMA transactions from the Programmable Logic (PL) and similarly confirmed that the transactions were translated by the SMMU. Finally, we developed a module that enables transactions from the PL without requiring explicit pre-mapping of virtual and physical address pairs. This was achieved by configuring the SMMU with the page table pointer of a user process, allowing it to translate all relevant virtual addresses dynamically. Overall, we successfully demonstrated the correct operation of the SMMU across all tested scenarios. Due to time constraints, further exploration of advanced SMMU features is left for future work.
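To make the translation flow described in the abstract concrete, the following is a minimal user-space sketch in Python. It is a toy model, not the thesis's kernel modules and not the Linux or ARM SMMU API: the names `ToyIommu`, `dma_copy`, and `TranslationFault`, the single-level dictionary "page table", and the 4 KiB page size are all assumptions made for illustration. It shows the two ideas the abstract relies on: a device access is translated page by page through mappings installed ahead of time, and a translation unit that is handed a reference to an existing table (here a shared dictionary standing in for a process page table pointer) sees that table's mappings without explicit pre-mapping.

```python
# Toy model of IOMMU-style address translation (illustrative only).
# All names and the 4 KiB page size are assumptions, not the SMMU's real interface.

PAGE_SHIFT = 12
PAGE_SIZE = 1 << PAGE_SHIFT
PAGE_MASK = PAGE_SIZE - 1


class TranslationFault(Exception):
    """Raised when a device access hits an unmapped virtual page."""


class ToyIommu:
    def __init__(self, table=None):
        # Maps virtual page number -> physical page number.
        # Passing an existing dict models sharing a process's page table.
        self.table = table if table is not None else {}

    def map(self, iova, paddr, size):
        """Install page-aligned virtual-to-physical mappings."""
        if size <= 0 or (iova | paddr | size) & PAGE_MASK:
            raise ValueError("iova, paddr and size must be page-aligned")
        for offset in range(0, size, PAGE_SIZE):
            vpn = (iova + offset) >> PAGE_SHIFT
            self.table[vpn] = (paddr + offset) >> PAGE_SHIFT

    def translate(self, iova):
        """Translate one device-visible virtual address to a physical one."""
        vpn = iova >> PAGE_SHIFT
        if vpn not in self.table:
            raise TranslationFault(hex(iova))
        return (self.table[vpn] << PAGE_SHIFT) | (iova & PAGE_MASK)


def dma_copy(iommu, memory, src_iova, dst_iova, length):
    """Copy bytes as a DMA engine would: every access goes through the IOMMU."""
    for i in range(length):
        src = iommu.translate(src_iova + i)
        dst = iommu.translate(dst_iova + i)
        memory[dst] = memory[src]
```

In this model, a transfer between two virtual addresses succeeds only after both pages have been passed to `map`, while a `ToyIommu` constructed with `table=process_table` picks up any entry later added to `process_table`, which loosely mirrors the difference between the pre-mapped experiments and the final page-table-sharing module.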

Paper Structure

This paper contains 59 sections, 76 figures.

Figures (76)

  • Figure 1: A system with many nodes that are connected
  • Figure 1: Zynq UltraScale+ MPSoC Top-Level Block Diagram. Source: Xilinx Reference2
  • Figure 2: A simple use of the MMU
  • Figure 2: Comparison of the IOMMU to the MMU. Source: https://en.wikipedia.org/wiki/Input-output_memory_management_unit
  • Figure 2: Examples of where an SMMU could be located in a system. Coherent interconnects ensure cache coherence between masters. Source: https://www.arm.com/files/pdf/System-MMU-Whitepaper-v8.0.pdf
  • ...and 71 more figures