A Review of Techniques for Ageing Detection and Monitoring on Embedded Systems

Leandro Lanzieri, Gianluca Martino, Goerschwin Fey, Holger Schlarb, Thomas C. Schmidt, Matthias Wählisch

TL;DR

This survey consolidates ageing mechanisms affecting embedded COTS components (FPGAs, MCUs/SoCs, and power supplies) and catalogues detection/monitoring approaches. It organizes techniques by component and sensing principle, highlighting a strong emphasis on online FPGA methods (often using ring oscillators, shadow registers, or transition-probability measurements) and a trend toward ML-assisted analysis, while MCU/SoC and power-supply studies remain largely offline. The work identifies gaps in system-level health assessment, emphasizes the potential for online, adaptive monitoring, and proposes research directions including tinyML-enabled on-device ageing inference and predictive maintenance. Overall, it provides a structured, actionable overview to guide future development of ageing-aware, self-monitoring embedded systems with practical implications for reliability in harsh or long-duration deployments.

Abstract

Embedded digital devices are increasingly deployed in dependable or safety-critical systems. These devices undergo significant hardware ageing, particularly in harsh environments, which increases their likelihood of failure. To guarantee system dependability, it is crucial to understand ageing processes and to detect hardware degradation early. In this survey, we review the core ageing mechanisms and identify and categorize the general working principles of ageing detection and monitoring techniques for Commercial-Off-The-Shelf (COTS) components that are prevalent in embedded systems: Field Programmable Gate Arrays (FPGAs), microcontrollers, Systems-on-Chip (SoCs), and their power supplies. From our review, we find that online techniques are more widely applied on FPGAs than on other components, and we see a rising trend towards applying machine learning to the analysis of hardware ageing. Based on the reviewed literature, we identify research opportunities and potential directions of interest in the field. With this work, we intend to facilitate future research by systematically presenting all main approaches in a concise way.

Paper Structure

This paper contains 41 sections, 10 equations, 17 figures.

Figures (17)

  • Figure 1: Ageing monitoring is required to ensure the reliability of embedded systems, which are affected by environmental and operational conditions. This survey covers FPGAs, Microcontrollers, and SoCs together with their power supplies as the prevalent system components.
  • Figure 2: Bathtub distribution illustrating the typical evolution of component failure rates over time [rmr-s].
  • Figure 3: Cross-section of a transistor with the gate oxide traversing all three stages of dielectric breakdown.
  • Figure 4: Illustration of the effect that fixed oxide trapped charges have on the drain current ($I_d$) vs. gate-source voltage ($V_{gs}$) characteristic for N-MOS and P-MOS devices [tidc-b].
  • Figure 5: Test types organized according to their highest testing frequency. We have expanded the classification system proposed by Kochte et al. [stsa-kw], which focuses on self-testing systems, in order to apply it to ageing detection techniques.
  • ...and 12 more figures