File Fragment Classification using Light-Weight Convolutional Neural Networks
Mustafa Ghaleb, Kunwar Saaim, Muhamad Felemban, Saleh Al-Saleh, Ahmad Al-Mulhem
TL;DR
This paper addresses the challenge of identifying file fragment types in digital forensics without relying on metadata. It introduces three light-weight CNN architectures based on depthwise separable convolutions (DSC, DSC-SE, M-DSC) that drastically reduce parameters while maintaining competitive accuracy for file fragment classification. Evaluated on the FFT-75 dataset, the models achieve about 79% accuracy with roughly 100K parameters and around 164 MFLOPs, outperforming FiFTy in inference speed while remaining competitive in accuracy. The results demonstrate the practicality of fast, resource-efficient on-device classification for large-scale forensic analysis, with potential for neural architecture search and distillation as future enhancements.
Abstract
In digital forensics, file fragment classification is an important step toward completing the file carving process. Several techniques exist to identify the type of a file fragment without relying on metadata, for example by using features such as headers/footers and N-grams. Recently, convolutional neural network (CNN) models have been used to build classifiers for this task. However, the number of parameters in a CNN tends to grow rapidly as the number of layers increases, which dramatically increases training and inference time. In this paper, we propose light-weight file fragment classification models based on depthwise separable CNNs. The evaluation results show that our proposed models provide faster inference with accuracy comparable to state-of-the-art CNN-based models. In particular, our models achieve an accuracy of 79% on the FFT-75 dataset with nearly 100K parameters and 164M FLOPs, which is 4x smaller and 6x faster than the state-of-the-art classifier in the literature.
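To illustrate why depthwise separable convolutions reduce the parameter count, the following minimal sketch compares the weights in a standard 1D convolution with those in its depthwise separable counterpart (a per-channel depthwise filter followed by a 1x1 pointwise convolution). The channel counts and kernel size are hypothetical values chosen for illustration and are not taken from the proposed architectures.

```python
def conv1d_params(c_in, c_out, kernel_size, bias=True):
    """Parameters of a standard 1D convolution: every output channel
    has one kernel of length kernel_size per input channel."""
    return kernel_size * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_conv1d_params(c_in, c_out, kernel_size, bias=True):
    """Parameters of a depthwise separable 1D convolution."""
    # Depthwise step: one kernel of length kernel_size per input channel.
    depthwise = kernel_size * c_in + (c_in if bias else 0)
    # Pointwise step: a 1x1 convolution that mixes channels.
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Hypothetical layer shape, for illustration only.
c_in, c_out, k = 64, 128, 9

standard = conv1d_params(c_in, c_out, k, bias=False)
separable = depthwise_separable_conv1d_params(c_in, c_out, k, bias=False)

print("standard conv parameters: ", standard)
print("separable conv parameters:", separable)
print("reduction factor:          %.2f" % (standard / separable))
```

Ignoring biases, the ratio of separable to standard parameters is 1/c_out + 1/kernel_size, so the saving grows with both the number of output channels and the kernel length.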
