DocXplain: A Novel Model-Agnostic Explainability Method for Document Image Classification

Saifullah Saifullah, Stefan Agne, Andreas Dengel, Sheraz Ahmed

TL;DR

DocXplain introduces a novel model-agnostic explainability framework for document image classification that yields fine-grained per-pixel attributions by separately segmenting foreground and background features and applying controlled feature ablation. The method combines multi-scale segmentation kernels with normalization and aggregation across masks to produce attribution maps, enabling decoupled interpretation of region-level vs. foreground content. Across RVL-CDIP and Tobacco3482, DocXplain consistently improves faithfulness and interpretability relative to nine baselines, as evidenced by AOPC, ABPC, and related metrics, with FG+BG offering particularly detailed explanations. This work advances transparency, fairness, and robustness in document analysis and opens avenues for OCR integration and multimodal extensions.

Abstract

Deep learning (DL) has revolutionized the field of document image analysis, showcasing superhuman performance across a diverse set of tasks. However, the inherent black-box nature of deep learning models still presents a significant challenge to their safe and robust deployment in industry. Regrettably, while a plethora of research has been dedicated in recent years to the development of DL-powered document analysis systems, research addressing their transparency aspects has been relatively scarce. In this paper, we aim to bridge this research gap by introducing DocXplain, a novel model-agnostic explainability method specifically designed for generating highly interpretable feature attribution maps for the task of document image classification. In particular, our approach involves independently segmenting the foreground and background features of the documents into different document elements and then ablating these elements to assign feature importance. We extensively evaluate our proposed approach in the context of document image classification, utilizing 4 different evaluation metrics, 2 widely recognized document benchmark datasets, and 10 state-of-the-art document image classification models. By conducting a thorough quantitative and qualitative analysis against 9 existing state-of-the-art attribution methods, we demonstrate the superiority of our approach in terms of both faithfulness and interpretability. To the best of the authors' knowledge, this work presents the first model-agnostic attribution-based explainability method specifically tailored for document images. We anticipate that our work will significantly contribute to advancing research on transparency, fairness, and robustness of document image classification models.

Paper Structure (30 sections, 6 equations, 10 figures, 2 tables)

Figures (10)

  • Figure 1: An overview of the proposed approach, DocXplain. The image first undergoes several processing steps that generate segmentation maps at different kernel sizes. Then, given any black-box neural network $f$, per-kernel feature importance maps are generated by applying feature ablation with the respective segmentation maps. The maps are then summed to create decoupled attribution maps for the background and foreground regions simultaneously. $\mathbf{S}_{FG}$ and $\mathbf{S}_{FG+BG}$ correspond to the DocXplain$_{FG}$ and DocXplain$_{FG+BG}$ settings, respectively.
  • Figure 2: Segmentation maps generated by our approach for the kernels $k_1\times k_1=5 \times 5$, $k_{2x}\times k_{2y}= 3 \times 15$, and $k_{3x}\times k_{3y} = 15 \times 3$ on a few randomly selected samples from the Tobacco3482 document dataset.
  • Figure 3: Explanations generated by different attribution methods for the ConvNeXt-B model on 6 randomly selected samples from the RVL-CDIP dataset. Under both settings (DocXplain$_{FG}$ and DocXplain$_{FG+BG}$), our approach produces considerably more fine-grained attribution maps than existing methods. In addition, examining the two settings together reveals whether the model considers an entire region or only specific foreground regions in the image important, which improves the interpretability of the attributions.
  • Figure 4: A comparison of AOPC$_{MoRF}$ and AOPC$_{LeRF}$ for different methods, plotted relative to the random baseline, across 5 selected deep neural networks. The steep rise and steep descent in the AOPC$_{MoRF}$ and AOPC$_{LeRF}$ curves, respectively, of our approach under both settings indicate considerably higher faithfulness than other methods.
  • Figure 5: Results for 4 metrics (Sensitivity, Infidelity, Continuity, and ABPC) obtained by each method, with the models on the X-axis. Under both settings, our approach either outperforms or performs comparably to existing state-of-the-art attribution-based approaches on these explainability metrics.
  • ...and 5 more figures
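
The pipeline in the Figure 1 caption (per-kernel segmentation maps, feature ablation against a black-box model, then summation) can be illustrated with a minimal sketch. This is not the authors' implementation. The function name `ablation_attribution`, the white-pixel baseline, the toy scoring function, and the hand-written segment maps are all assumptions made for illustration. The segmentation step itself and any normalization are omitted because this page does not specify them.

```python
import numpy as np


def ablation_attribution(image, segment_maps, predict, baseline=1.0):
    """Sum per-kernel ablation maps into a single attribution map.

    image:        2-D float array (H, W), grayscale page in [0, 1].
    segment_maps: list of 2-D int arrays (H, W), one per kernel size;
                  label 0 is skipped, each positive label is one element.
    predict:      black-box callable mapping an (H, W) array to a scalar score.
    baseline:     value written into an ablated segment (assumed: white paper).
    """
    reference = predict(image)
    total = np.zeros(image.shape, dtype=float)
    for seg in segment_maps:
        per_kernel = np.zeros(image.shape, dtype=float)
        for label in np.unique(seg):
            if label == 0:
                continue
            mask = seg == label
            ablated = image.copy()
            ablated[mask] = baseline
            # Importance of a segment = score drop when it is removed.
            per_kernel[mask] = reference - predict(ablated)
        total += per_kernel  # the caption states the per-kernel maps are summed
    return total


# Hypothetical toy "model": scores the amount of ink in the top two rows.
def toy_predict(x):
    return float((1.0 - x[:2]).sum())


# 4x4 white page with two ink pixels at the top and two at the bottom.
image = np.ones((4, 4))
image[0, 0:2] = 0.0
image[3, 2:4] = 0.0

# Fine map: one label per ink blob. Coarse map: top half vs. bottom half.
fine = np.zeros((4, 4), dtype=int)
fine[0, 0:2] = 1
fine[3, 2:4] = 2
coarse = np.ones((4, 4), dtype=int)
coarse[2:] = 2

attr = ablation_attribution(image, [fine, coarse], toy_predict)
print(attr)
```

In this toy case the top ink pixels receive credit from both the fine and the coarse map, while the surrounding top-half background receives credit only from the coarse map, which mirrors the region-versus-foreground distinction the captions describe.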