Bridging the Semantic Gap in Virtual Machine Introspection and Forensic Memory Analysis
Christofer Fellicious, Hans P. Reiser, Michael Granitzer
TL;DR
This work tackles the semantic gap in Virtual Machine Introspection and Forensic Memory Analysis by introducing metadata-driven feature engineering and graph-based representations that automatically reconstruct high-level memory structures from raw memory. It validates the approach with OpenSSH as a controlled use case and extends it to full VM memory dumps, showing that leveraging metadata yields substantial performance gains, particularly with limited training data. The authors present multiple methods (MetaKex, HeaderKex, GraphKex, SlicedKex) and demonstrate that GraphKex achieves the highest accuracy while offering favorable training efficiency. A key contribution is an open dataset totaling over 1.5 TB of memory captures across OS versions, enabling reproducibility and broader benchmarking for VMI/FMA research. Overall, the results underscore that targeted feature engineering and model design can effectively bridge the semantic gap and aid forensic analysts in memory investigations.
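As a rough illustration of the simplest end of this feature engineering, the following Python sketch computes basic statistics over fixed-size windows of a raw memory buffer, the kind of signal that separates high-entropy key material from zero-filled or structured data. The window size (64 bytes), the 8-byte stride, the function names and the particular statistics are assumptions chosen for illustration; they are not the exact feature set of MetaKex, HeaderKex, GraphKex or SlicedKex.

```python
import math
from collections import Counter


def window_features(window: bytes) -> dict:
    """Basic statistical features for one window of raw memory.

    Illustrative feature set only (hypothetical, not the paper's exact features).
    """
    n = len(window)
    counts = Counter(window)  # byte value -> number of occurrences
    # Shannon entropy in bits per byte: 0.0 for constant data, at most 8.0.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "entropy": entropy,
        "zero_fraction": counts.get(0, 0) / n,
        "distinct_bytes": len(counts),
        "mean": sum(window) / n,
    }


def featurize(dump: bytes, window: int = 64, stride: int = 8) -> list:
    """Slide a fixed-size window over the dump (assumed 8-byte aligned stride)."""
    return [
        (offset, window_features(dump[offset:offset + window]))
        for offset in range(0, len(dump) - window + 1, stride)
    ]
```

Each `(offset, features)` pair can then serve as one row of a tabular training set; a classifier trained on such rows is only a baseline, since it ignores the allocator metadata and pointer structure that the richer methods exploit.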
Abstract
Forensic Memory Analysis (FMA) and Virtual Machine Introspection (VMI) are critical tools for virtualization-based security. VMI and FMA involve using digital forensic methods to extract information from a system in order to identify and explain security incidents. A key challenge in both FMA and VMI is the "Semantic Gap": the difficulty of interpreting raw memory data without specialized tools and expertise. In this work, we investigate how a priori knowledge, metadata and engineered features can aid VMI and FMA, leveraging machine learning to automate information extraction and reduce the workload of forensic investigators. We choose OpenSSH as our use case to test different methods for extracting high-level structures. We also test our method on complete physical memory dumps to showcase the effectiveness of the engineered features. Our features range from basic statistical features to advanced graph-based representations built from malloc headers and pointer translations. Training and testing are carried out on public datasets, and we compare our results against established baseline methods. We show that metadata improves the performance of the algorithm when very little training data is available, and we also quantify how more training data leads to better generalization performance. The final contribution is an open dataset of physical memory dumps totalling more than 1 TB, covering different memory states, software environments, main memory capacities and operating system versions. Our results show that additional metadata boosts performance, with all methods obtaining an F1-score above 80%. Our research underscores the potential of feature engineering and machine learning techniques to bridge the semantic gap.
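To make "graph-based representations using malloc headers and pointer translations" more concrete, here is a minimal Python sketch under several stated assumptions: the input is a contiguous heap region from a little-endian x86-64 process using glibc malloc, the region starts exactly at a chunk header, and its virtual base address (`heap_base`) is known, for example from process metadata. Pointer translation here is a simple linear mapping from virtual address to dump offset; full physical memory dumps would instead require page-table walks. The function names are hypothetical and this is not presented as the authors' GraphKex implementation.

```python
import bisect
import struct

SIZE_MASK = ~0x7  # low 3 bits of glibc's size field are flags (PREV_INUSE etc.)
HEADER = 16       # prev_size + size fields on 64-bit glibc
MIN_CHUNK = 32    # smallest valid chunk size on 64-bit glibc


def walk_chunks(heap: bytes) -> list:
    """Return (offset, size) per chunk, assuming heap[0] is a chunk header."""
    chunks, off = [], 0
    while off + HEADER <= len(heap):
        (size_field,) = struct.unpack_from("<Q", heap, off + 8)
        size = size_field & SIZE_MASK
        if size < MIN_CHUNK or off + size > len(heap):
            break  # corrupt header or chunk running past the dump: stop
        chunks.append((off, size))
        off += size  # the next chunk header follows immediately
    return chunks


def pointer_graph(heap: bytes, heap_base: int) -> dict:
    """Map each chunk offset to the set of chunk offsets its data points into."""
    chunks = walk_chunks(heap)
    starts = [off for off, _ in chunks]
    edges = {off: set() for off in starts}
    for off, size in chunks:
        # Scan 8-byte aligned words after the header as candidate pointers.
        for pos in range(off + HEADER, off + size, 8):
            (word,) = struct.unpack_from("<Q", heap, pos)
            target = word - heap_base  # virtual address -> offset in the dump
            if not 0 <= target < len(heap):
                continue  # not a pointer into this heap region
            i = bisect.bisect_right(starts, target) - 1
            if i >= 0 and target < starts[i] + chunks[i][1]:
                edges[off].add(starts[i])
    return edges
```

Nodes are heap chunks and edges are pointers between them. Per-node attributes such as chunk size or the window statistics of the chunk's contents could then feed a graph model, which is one plausible way of combining allocator metadata with pointer structure.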
