Tensor Polynomial Additive Model

Yang Chen; Ce Zhu; Jiani Liu; Yipeng Liu

TL;DR

The paper introduces TPAM, a tensor-based polynomial additive model that preserves the structure of high-order tensor inputs to improve accuracy while enabling parameter-efficient learning via hierarchical low-order tensor decompositions. It remains transparent through self-explanation and extends to post-hoc interpretation by integrating TPAM into CAM, yielding P-CAM and PI-CAM with finer-grained saliency maps. Empirical results on MNIST, CIFAR-10, and STL-10 show accuracy gains of up to 30% and compression of up to 5× compared with interpretable baselines, while providing interpretable feature importances. Overall, TPAM offers a scalable, interpretable, and practical framework for high-order tensor inputs and interpretable visual explanations.

Abstract

Additive models are widely used in interpretable machine learning because of their clarity and simplicity. However, classical additive models vectorize high-order data, which disrupts the data structure and may lead to degraded accuracy and increased computational complexity. To address these problems, we propose the tensor polynomial additive model (TPAM). It retains the multidimensional structure of high-order inputs through a tensor representation. Model parameters are compressed using a hierarchical, low-order symmetric tensor approximation, so that complex high-order feature interactions can be captured with fewer parameters. Moreover, TPAM preserves the inherent interpretability of additive models, facilitating transparent decision-making and the extraction of meaningful feature values. Additionally, leveraging its transparency and ability to handle higher-order features, TPAM is used as a post-processing module for other interpretation models by introducing two variants of class activation maps. Experimental results on a series of datasets demonstrate that TPAM can improve accuracy by up to 30% and the compression rate by up to 5 times, while maintaining good interpretability.
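
To make the idea of a symmetric tensor approximation concrete, here is a minimal NumPy sketch. It assumes a SPAM-style symmetric rank-$R$ factorization in which the order-$k$ term is $\sum_r \lambda_r \langle \mathcal{U}_r, \mathcal{X} \rangle^k$, with each factor tensor $\mathcal{U}_r$ having the same shape as the input. The function name `symmetric_poly_score`, the rank, and the toy shapes are illustrative assumptions; the paper's hierarchical scheme may differ in detail.

```python
import numpy as np

def symmetric_poly_score(x, factors, weights, bias=0.0):
    """Score of a polynomial additive model with symmetric low-rank terms.

    x          : input tensor of any shape (kept in tensor form).
    factors[k] : array of shape (R, *x.shape) for the order-(k+1) term (assumed layout).
    weights[k] : array of shape (R,), one coefficient per rank-one component.
    """
    score = float(bias)
    for k, (U, lam) in enumerate(zip(factors, weights), start=1):
        # Inner product of every factor tensor with the input: shape (R,).
        proj = np.tensordot(U, x, axes=x.ndim)
        # Raising to the k-th power stands in for contracting x k times
        # with the symmetric rank-one tensor built from the same factor.
        score += float(np.dot(lam, proj ** k))
    return score

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5))          # toy second-order input
R = 3                                    # assumed rank
factors = [rng.standard_normal((R, 4, 5)) for _ in range(2)]  # orders 1 and 2
weights = [rng.standard_normal(R) for _ in range(2)]
score = symmetric_poly_score(x, factors, weights)

# Cross-check the order-2 term against the explicit (dense) weight matrix
# acting on the vectorized input.
xv = x.ravel()
U2 = factors[1].reshape(R, -1)
W2 = np.einsum("r,ri,rj->ij", weights[1], U2, U2)   # 20 x 20 dense weights
dense_order2 = xv @ W2 @ xv
```

In this toy case the dense order-2 weight has 400 entries, while the factorized form stores 60 factor entries plus 3 coefficients, which illustrates why such factorizations can capture interactions with fewer parameters.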

Paper Structure (31 sections, 17 equations, 10 figures, 7 tables)

This paper contains 31 sections, 17 equations, 10 figures, 7 tables.

Introduction
Notations and Related Works
Notations
Interpretable Additive Modeling Machine Learning
Saliency Map Generation
Polynomial Neural Network Learning
Methods
Scalable Polynomial Additive Models (SPAM)
Interpretable Tensor Polynomial Additive Model
T-TPAM
V-TPAM
PT-TPAM
MPT-TPAM
Learning and Discussion for TPAM
Initialization scheme
...and 16 more sections

Figures (10)

Figure 1: (a) Classification performance of self-interpretable models on the MNIST dataset. TPAM achieves the highest performance with a low number of parameters. (b) Explanations generated by TPAM. Red represents a positive contribution and blue a negative contribution, just like in subjective human judgment. (c) Performance of P-CAM and PI-CAM. The resulting saliency maps are more fine-grained.
Figure 2: (a) For a $K$-order polynomial, the $K$ modules depicted in the figure correspond to the different orders of the polynomial. The weight tensor $\mathbf{\mathcal{W}}$ is decomposed into a factor tensor $\mathbf{\mathcal{U}}$, which has the same size as the input $\mathbf{\mathcal{X}} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$. Following inner-product and multiplication operations, $\mathbf{\mathcal{W}}$ and $\mathbf{\mathcal{U}}$ are then passed into the fully connected layer to obtain the final $C$-class classification result. (b) The training process of V-TPAM is unchanged, except that $\mathbf{\mathcal{W}}$ is further decomposed into factor vectors $\mathbf{u}$. (c) PT-TPAM first determines the patch size and shift step through patch slicing, and then passes this information to T-TPAM for further learning. (d) MPT-TPAM is co-trained with PT-TPAM, using various patch sizes and shift steps. Using different patch sizes can further capture process and remote dependencies and increase the performance of the model.
Figure 3: (a) In P-CAM, the input $\mathbf{\mathcal{X}}$ is processed by the CNN model $f(\cdot)$ to obtain the feature map $\mathbf{\mathcal{A}}_l$. The TPAM model then classifies the feature map, and the trained TPAM is used to extract the model's important tensor. The saliency map is obtained by downscaling the important tensor using the function $s(\cdot)$. (b) PI-CAM obtains the feature map $\mathbf{\mathcal{A}}_l$ through the CNN model $f(\cdot)$. The feature map is then used as input to P-CAM after performing a dot product with the input $\mathbf{\mathcal{X}}$. Subsequently, P-CAM is trained to obtain the important maps (the saliency map of each $\mathbf{A}_l^{i_c}$). The saliency map is derived by adding the important maps.
Figure 4: General process of self-interpretation in TPAM.
Figure 5: Effect of rank on the performance of different TPAM models, tested on different datasets. (a) MNIST. (b) CIFAR-10.
...and 5 more figures
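
The data flow described in the Figure 2 caption, panels (a) and (c), can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: the helper names `slice_patches` and `t_tpam_logits`, the concatenation of per-order features before the fully connected layer, and the choice to feed the stacked patches as one higher-order tensor are all hypothetical, since the caption does not specify these details.

```python
import numpy as np

def slice_patches(image, patch, step):
    """Cut a 2-D image into square patches of side `patch`, shifted by `step`."""
    windows = np.lib.stride_tricks.sliding_window_view(image, (patch, patch))
    # Keep every `step`-th window along both spatial axes.
    return windows[::step, ::step]       # shape (n_rows, n_cols, patch, patch)

def t_tpam_logits(x, factor_tensors, fc_weight, fc_bias):
    """Forward pass in the spirit of Figure 2(a), under the assumptions above.

    factor_tensors[k] : array of shape (R, *x.shape) for the order-(k+1) module.
    fc_weight         : array of shape (C, K * R); fc_bias: shape (C,).
    """
    features = []
    for k, U in enumerate(factor_tensors, start=1):
        proj = np.tensordot(U, x, axes=x.ndim)   # inner products, shape (R,)
        features.append(proj ** k)               # order-k module output
    h = np.concatenate(features)                 # assumed way of combining modules
    return fc_weight @ h + fc_bias               # C class scores

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
patches = slice_patches(image, patch=4, step=2)  # higher-order input tensor

K, R, C = 3, 2, 10                               # illustrative sizes
factor_tensors = [0.1 * rng.standard_normal((R, *patches.shape)) for _ in range(K)]
fc_weight = rng.standard_normal((C, K * R))
fc_bias = np.zeros(C)
logits = t_tpam_logits(patches, factor_tensors, fc_weight, fc_bias)
```

Changing `patch` and `step` gives differently sized patch tensors, which is the knob that the caption says MPT-TPAM varies across its co-trained models.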
