SE3D: A Framework For Saliency Method Evaluation In 3D Imaging
Mariusz Wiśniewski, Loris Giulivi, Giacomo Boracchi
TL;DR
This work tackles the lack of quantitative benchmarks for explaining 3D CNNs in medical and real-world imaging. It introduces SE3D, a framework built on modified 3D datasets (ShapeNet, ScanNet, BraTS) and novel metrics based on weakly supervised object localization (WSOL) and weakly supervised semantic segmentation (WSSS), to rigorously evaluate 3D saliency methods, both those designed for 3D data and those extended from 2D. The empirical study finds that although 3D-specific saliency methods such as Saliency Tubes and Respond-CAM often outperform 2D extensions, all methods show substantial localization gaps in 3D, with notable weaknesses on medical imaging data. This signals a clear need for new 3D explainability techniques. By providing standardized evaluation, SE3D lays the groundwork for safer deployment of 3D CNNs and offers a path toward better 3D explanations and WSOL/WSSS solutions on volumetric data.
Abstract
For more than a decade, deep learning models have dominated a wide range of 2D imaging tasks. Their application is now extending to 3D imaging: 3D Convolutional Neural Networks (3D CNNs) can process LIDAR, MRI, and CT scans, with significant implications for fields such as autonomous driving and medical imaging. In these critical settings, explaining the model's decisions is fundamental. Despite recent advances in Explainable Artificial Intelligence, however, little effort has been devoted to explaining 3D CNNs, and many works explain these models via inadequate extensions of 2D saliency methods. A fundamental limitation to the development of 3D saliency methods is the lack of a benchmark to quantitatively assess them on 3D data. To address this issue, we propose SE3D: a framework for Saliency method Evaluation in 3D imaging. We propose modifications to the ShapeNet, ScanNet, and BraTS datasets, together with evaluation metrics to assess saliency methods for 3D CNNs. We evaluate both state-of-the-art saliency methods designed for 3D data and extensions of popular 2D saliency methods to 3D. Our experiments show that current 3D saliency methods do not provide explanations of sufficient quality, and that there is room for future improvements and safer applications of 3D CNNs in critical fields.
