
Image classification network enhancement methods based on knowledge injection

Yishuang Tian, Ning Wang, Liang Zhang

TL;DR

The paper tackles the lack of interpretability in end-to-end image classifiers by injecting structured human knowledge into a multi-level hierarchy. It introduces three levels of knowledge embedding (class features, class relationships, and wide external features) and a knowledge-injection module to supervise training. A two-stage optimization (knowledge optimization with a cosine-similarity objective, followed by standard cross-entropy classification) aligns features with prior knowledge and trains the final classifier head. Experiments on a knowledge-image dataset demonstrate accuracy gains across ResNet and ViT backbones and improved interpretability via Grad-CAM and hidden-layer explanations, suggesting practical benefits for trustworthy vision systems.
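As a rough illustration of the two objectives named above, the following sketch computes a cosine-similarity knowledge loss and a cross-entropy classification loss on plain Python lists. The summary only states that the knowledge stage uses a cosine-similarity objective; the specific `1 - cos` form, the function names, and the small epsilon are assumptions made for this example, not the authors' implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # The epsilon guards against division by zero (an assumption of this sketch).
    return dot / (norm_u * norm_v + 1e-12)


def knowledge_loss(feature, knowledge_embedding):
    """Assumed form of the knowledge objective: 1 - cosine similarity.

    The loss is near 0 when the hidden feature points in the same direction
    as the prior-knowledge embedding and grows as they diverge.
    """
    return 1.0 - cosine_similarity(feature, knowledge_embedding)


def cross_entropy(logits, target_index):
    """Standard softmax cross-entropy for a single example."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]
```

In a two-stage schedule of the kind described, the first stage would minimize `knowledge_loss` on the hidden features, and the second would minimize `cross_entropy` on the classifier head's logits.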

Abstract

Current deep neural networks are still trained end-to-end under supervision from image-label pairs, which makes it difficult to explain why a model produces a given result and to understand or analyze its prediction logic. They also do not use existing human knowledge, so the resulting models are not in line with the human cognition model and are poorly suited to human use. To solve these problems, we present a deep neural network training method based on human knowledge, which uses the human cognition model and existing human knowledge to construct the training model. This paper proposes a multi-level hierarchical deep learning algorithm composed of a multi-level hierarchical deep neural network architecture and a multi-level hierarchical deep learning framework. The experimental results show that the proposed algorithm can effectively explain the hidden information of the neural network. The goal of our study is to improve the interpretability of deep neural networks (DNNs) by analyzing the impact of knowledge injection on the classification task. We constructed a knowledge injection dataset that pairs knowledge data with image classification data; it serves as the benchmark dataset for the experiments in the paper. Our model shows improved interpretability and classification performance for hidden layers at different scales.

Paper Structure (22 sections, 3 figures, 5 tables, 3 algorithms)

This paper contains 22 sections, 3 figures, 5 tables, 3 algorithms.

Figures (3)

  • Figure 1: Because of their black-box nature, traditional neural networks cannot express the logical behavior of their internal hidden layers; knowledge injection can mitigate this poor hidden-layer interpretability.
  • Figure 2: We introduce multi-level knowledge data to enrich the semantic part of the model and its spatial understanding ability, respectively. We use three kinds of category features, 1) category-level features, 2) category part relationship features, and 3) category relationship features, to add different levels of knowledge information to the knowledge loss during training, and optimize the knowledge loss.
  • Figure 3: The model consists of three main modules: the feature extraction layer, the mix-grained infuse layer (the mixing knowledge injection layer), and the classification head module. 1) The feature extraction layer is built from modules such as ResNet or ViT to adapt to different backbone networks; 2) the mixing knowledge injection layer is composed of activation function layers and fully connected layers; 3) the classification head module is composed of a fully connected layer.
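
The three-module layout in the Figure 3 caption can be sketched as a small forward pass. This is a minimal pure-Python illustration under stated assumptions: the backbone is a hypothetical stand-in (a single linear map with ReLU, not a real ResNet or ViT), the order of the fully connected layer and the activation inside the injection layer is assumed, and the class name, dimensions, and initialization are invented for the example.

```python
import random


def linear(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in x]


def init_linear(in_dim, out_dim, rng):
    """Small random weights and zero bias (illustrative initialization)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim


class KnowledgeInjectedClassifier:
    """Hypothetical three-module model following the Figure 3 caption."""

    def __init__(self, in_dim, feat_dim, knowledge_dim, num_classes, seed=0):
        rng = random.Random(seed)
        # 1) Feature extraction: stand-in for a ResNet/ViT backbone.
        self.backbone = init_linear(in_dim, feat_dim, rng)
        # 2) Knowledge injection: fully connected layer plus activation
        #    (the ordering is an assumption of this sketch).
        self.inject = init_linear(feat_dim, knowledge_dim, rng)
        # 3) Classification head: a single fully connected layer.
        self.head = init_linear(knowledge_dim, num_classes, rng)

    def forward(self, x):
        features = relu(linear(x, *self.backbone))
        knowledge_features = relu(linear(features, *self.inject))
        logits = linear(knowledge_features, *self.head)
        # Return the injected features too, so a knowledge loss could
        # supervise them separately from the classification logits.
        return knowledge_features, logits
```

For example, `KnowledgeInjectedClassifier(8, 6, 4, 3).forward([0.5] * 8)` returns a 4-dimensional knowledge feature vector and 3 class logits.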