Enabling Deterministic User-Level Interrupts in Real-Time Processors via Hardware Extension

Hongbin Yang, Huanle Zhang, Runyu Pan

Abstract

The growing complexity of real-time embedded systems demands strong isolation of software components into separate protection domains to reduce attack surfaces and limit fault propagation. However, application-supplied device interrupt handlers -- even untrusted ones -- have to remain in the kernel to minimize interrupt latency, which undermines security and burdens manual certification. Current hardware extensions accelerate interrupts only when the target protection domain is already scheduled by the kernel; consequently, they improve average-case performance but not worst-case latency, and do not meet the requirements of critical real-time applications such as autonomous vehicles or robots. To overcome this limitation, we propose a novel hardware extension that enables direct, deterministic switching to the appropriate protection domain upon user-level interrupt arrival -- without kernel intervention -- even when that domain is dormant. Our hardware extension reduces worst-case latency by more than 50x with a 19% increase in core area (2% of total die area) and a 4.1% increase in dynamic power. To the best of our knowledge, this is the first integrated mechanism to guarantee user-level interrupt delivery with a nanosecond-scale yet bounded worst-case latency.

Paper Structure

This paper contains 35 sections, 9 figures, 9 tables.

Figures (9)

  • Figure 1: Block diagram and data flow of the proposed extension (V5 variant; for a detailed description of the variants, see §\ref{ss:t2}). The extra TCM blocks are in yellow, and the intra-core extensions are in blue.
  • Figure 2: User-level interrupt API and software-hardware workflow.
  • Figure 3: Timing diagram of the V1 variant upon user-level interrupt activation, with a latency of 38 cycles (2 more cycles are needed for the first fetched interrupt vector instruction to reach the execute stage). a$\to$b shows PMP table consulting and PMP updating, c$\to$d shows budget table consulting and timer updating, while e$\to$f shows register stacking. The kernel-managed PMP is shadow banked and not shown.
  • Figure 4: Timing diagram of the V2 variant upon user-level interrupt activation, with a latency of 29 cycles. The commentary is the same as that of Figure 3. The deciding factor here, however, is still register context stacking.
  • Figure 5: Timing diagram of the V5 variant upon user-level interrupt activation, with a latency of 11 cycles. The commentary is the same as that of Figure 3, except that the IID consulting is merged with the interrupt controller logic and costs no extra cycles, while the register stacking is eliminated by banking; hence neither is shown.
  • ...and 4 more figures