Accelerating System-Level Debug Using Rule Learning and Subgroup Discovery Techniques
Zurab Khasidashvili
TL;DR
This paper presents a framework that reduces system-level debugging effort by combining Subgroup Discovery (SD) and Rule Learning (RL), through Feature Range Analysis (RA), to build a Root-Causing Tree (RCT) iteratively from trace logs. The method engineers rich distributional and sequential features from events and flows, then uses them to isolate subgroups of failures that share the same root cause. This enables automated debug hints and knowledge reuse via mined rules. The approach is demonstrated on Intel's Power Management Package-C8 (PkgC8) flow, where it yields high-precision root-causing hints and substantial gains in validator productivity. It remains applicable to both pre-silicon and post-silicon contexts and to other complex hardware, software, and firmware systems. The work also discusses quality metrics, diversification strategies, hierarchical data refinement, and avenues for future work on automating the detection of sequential bug patterns and on rule-based predictions across designs.
Abstract
We propose a root-causing procedure for accelerating system-level debug using rule-based techniques. We describe the procedure and show how it provides high-quality debug hints that reduce the debug effort. This includes heuristics for engineering features from the logs of many tests, and data analytics techniques for generating powerful debug hints. As a case study, we applied these techniques to root-causing failures of the Power Management (PM) design feature Package-C8 and showed their effectiveness. Furthermore, we propose an approach for mining root-causing experience and results for reuse, in order to accelerate future debug activities and reduce dependency on validation experts. We believe that these techniques are also beneficial for other validation activities at different levels of abstraction, for complex hardware, software, and firmware systems, both pre-silicon and post-silicon.
