Use of Multi-CNNs for Section Analysis in Static Malware Detection
Tony Quertier, Grégoire Barrué
TL;DR
The paper addresses explainable static malware detection by analyzing Portable Executable (PE) files at the section level. It introduces a distributed framework that converts each PE section into a $64 \times 64$ grayscale image, trains one CNN per section, and learns a final classification score from the per-section outputs using XGBoost or Random Forest. The approach yields a modest accuracy gain of about $1.5\%$ over a single full-file CNN and shows which sections drive decisions through permutation importance and Mean Decrease in Impurity (MDI) analyses, which notably highlight the .idata and .rsrc sections. The method offers analysts actionable, scalable guidance and can be extended with additional PE sections to further improve performance and explainability.
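The section-to-image step can be illustrated with a minimal sketch. The summary above does not say how sections of different lengths are fitted to $64 \times 64$ pixels, so the truncate-and-zero-pad rule here is an assumption, and `section_to_image` is a hypothetical helper name rather than the authors' code.

```python
IMAGE_SIDE = 64  # image side length stated in the summary (64 x 64 grayscale)


def section_to_image(section_bytes: bytes, side: int = IMAGE_SIDE) -> list[list[int]]:
    """Map raw section bytes to a side x side grid of grayscale values (0-255)."""
    n_pixels = side * side
    # Assumption: long sections are truncated and short ones are zero-padded.
    # The paper may instead resample or interpolate the byte stream.
    padded = section_bytes[:n_pixels].ljust(n_pixels, b"\x00")
    # Each byte becomes one pixel; rows are filled left to right.
    return [list(padded[row * side:(row + 1) * side]) for row in range(side)]


# Hypothetical section content: 768 bytes, i.e. 12 full rows of 64 pixels.
demo_section = bytes(range(256)) * 3
image = section_to_image(demo_section)
print(len(image), len(image[0]))
```

Each resulting grid would then be fed to the CNN trained for that section type. Real section bytes could be obtained with a PE parser, which is left out here to keep the sketch dependency-free.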
Abstract
Existing research on malware detection focuses almost exclusively on the detection rate. However, in some cases it is also important to understand the algorithm's results, or to obtain additional information, such as where in the file an analyst should investigate. To this end, we propose a new model to analyze Portable Executable files. Our method consists of splitting each file into its sections, then transforming each section into an image, in order to train a convolutional neural network (CNN) dedicated to each identified section. We then use the scores returned by these CNNs to compute a final detection score, using models that also let us analyze the importance of each section in the final score.
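The final aggregation step can be sketched as follows, assuming scikit-learn and NumPy are available. The per-section scores are synthetic stand-ins generated at random, not outputs of the authors' CNNs, and the section list is an illustrative subset. Two columns are made informative by construction, so the importances printed here say nothing about the paper's actual results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
section_names = [".text", ".data", ".idata", ".rsrc"]  # illustrative subset

# Synthetic stand-in for per-section CNN scores: one row per file,
# one column per section, values in [0, 1].
n_files = 600
labels = rng.integers(0, 2, size=n_files)  # 0 = benign, 1 = malware
scores = rng.random((n_files, len(section_names)))
# By construction only, make two columns correlate with the label.
scores[:, 2] = 0.7 * labels + 0.3 * scores[:, 2]
scores[:, 3] = 0.4 * labels + 0.6 * scores[:, 3]

X_train, X_test, y_train, y_test = train_test_split(
    scores, labels, test_size=0.3, random_state=0
)

# Aggregator that turns per-section scores into a final decision.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Mean Decrease in Impurity: computed from the fitted trees themselves.
mdi = forest.feature_importances_
# Permutation importance: accuracy drop when one column is shuffled.
perm = permutation_importance(
    forest, X_test, y_test, n_repeats=10, random_state=0
)

for name, m, p in zip(section_names, mdi, perm.importances_mean):
    print(f"{name:8s} MDI={m:.3f} permutation={p:.3f}")
```

MDI is derived from the training data, while permutation importance is measured on held-out files, so comparing the two gives a sanity check on which sections the aggregator actually relies on.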
