Hadamard product in deep learning: Introduction, Advances and Challenges

Grigorios G Chrysos, Yongtao Wu, Razvan Pascanu, Philip Torr, Volkan Cevher

TL;DR

The survey reframes the Hadamard product as a core architectural primitive in deep learning, highlighting its linear-cost capability to model nonlinear interactions and its potential to complement or replace heavier operators like self-attention in resource-constrained settings. It organizes existing work into four main domains—high-order interactions, multimodal fusion, adaptive modulation, and efficient pairwise operations—and connects them through a unifying view grounded in polynomial networks, gating, and masking. The authors also synthesize theoretical perspectives on expressivity, spectral bias, generalization, and robustness, alongside practical implementations and open problems, underscoring the Hadamard product’s broad applicability from edge devices to large language models. Overall, the paper argues that Hadamard-product-based primitives offer compelling trade-offs between efficiency and representational power, motivating future architectural innovations and cross-domain research.

Abstract

While convolution and self-attention mechanisms have dominated architectural design in deep learning, this survey examines a fundamental yet understudied primitive: the Hadamard product. Despite its widespread implementation across various applications, the Hadamard product has not been systematically analyzed as a core architectural primitive. We present the first comprehensive taxonomy of its applications in deep learning, identifying four principal domains: higher-order correlation, multimodal data fusion, dynamic representation modulation, and efficient pairwise operations. The Hadamard product's ability to model nonlinear interactions with linear computational complexity makes it particularly valuable for resource-constrained deployments and edge computing scenarios. We demonstrate its natural applicability in multimodal fusion tasks, such as visual question answering, and its effectiveness in representation masking for applications including image inpainting and pruning. This systematic review not only consolidates existing knowledge about the Hadamard product's role in deep learning architectures but also establishes a foundation for future architectural innovations. Our analysis reveals the Hadamard product as a versatile primitive that offers compelling trade-offs between computational efficiency and representational power, positioning it as a crucial component in the deep learning toolkit.
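
The claim that a Hadamard product models nonlinear interactions at linear cost can be illustrated with a minimal sketch. It is not taken from the survey: the function name `second_order_layer`, the matrices `A` and `B`, and the dimensions are assumptions chosen for illustration. Two linear maps of the same input are multiplied elementwise, so each output coordinate is a degree-2 polynomial in the input, while the product itself costs one multiplication per output coordinate.

```python
import numpy as np


def second_order_layer(x, A, B):
    """Hypothetical layer: elementwise product of two linear maps of x.

    Output coordinate i equals (a_i . x) * (b_i . x), i.e. the quadratic
    form x^T (a_i b_i^T) x, where a_i and b_i are the i-th rows of A and B.
    """
    return (A @ x) * (B @ x)  # '*' on NumPy arrays is the Hadamard product


rng = np.random.default_rng(0)
d, k = 4, 3  # illustrative input and output sizes (assumed)
A = rng.standard_normal((k, d))
B = rng.standard_normal((k, d))
x = rng.standard_normal(d)

y = second_order_layer(x, A, B)

# The same output written as explicit quadratic forms, one per coordinate.
y_quadratic = np.array([x @ np.outer(A[i], B[i]) @ x for i in range(k)])

# Degree-2 behaviour: doubling the input quadruples the output.
y_scaled = second_order_layer(2.0 * x, A, B)
```

The explicit quadratic forms need a `d x d` matrix per output coordinate, whereas the Hadamard form only needs the two linear maps followed by `k` multiplications.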

Paper Structure

This paper contains 20 sections, 5 theorems, 21 equations, 4 figures, 4 tables.

Key Result

Theorem 1

Suppose that $\|\bm{z}_j\|_\infty \leq 1$ for all $j=1, \ldots, |\mathcal{Z}|$. Define the matrix $\Phi \coloneqq (\bm{A}_{[{N}]}\bullet \bm{S}_{[{N}]}) \prod_{i=1}^{N-1} \bm{I} \otimes \bm{A}_{[{i}]} \bullet \bm{S}_{[{i}]}$ where $\bullet$ symbolizes the face-splitting product (which can be though

Figures (4)

  • Figure 1: Six core areas where the Hadamard product has been widely used in the deep learning era. (a) High-order correlations between the input elements are captured. Those correlations can augment the linear interactions of typical layers, e.g., dense or convolutional layers. (b) Humans generally perceive the world through different senses, which often offer complementary information. Similarly, machine learning (ML) models can extract complementary information from different sources and then meld it together to make an informed decision. (c) During language-model pre-training, attention to the keys of future tokens is masked for each query, so that the model does not use information from future tokens when predicting the next token. (d) Hard masking and soft masking of an image in the input space via the Hadamard product. (e) The Hadamard product has recently been used as an alternative operator to matrix multiplication, e.g., to accelerate the popular self-attention. (f) Weight pruning can be viewed as applying a Hadamard product to the original weights, effectively zeroing out certain parameters. Among those core areas, we identify four parent categories and links between them (e.g., weight pruning). To our knowledge, this taxonomy is novel and allows us to establish concrete connections between seemingly disparate works within the same category, such as masking for inpainting and causal language modeling. To facilitate further research, we have also compiled the diverse open-source links in \ref{['tab:hadamard_product_indicative_author_implementations']}.
  • Figure S2: Taxonomy of the Hadamard product in deep learning. The category of high-order interactions is often divided by the degree of interactions, with a more fine-grained taxonomy being whether there is parameter-sharing, i.e., \ref{['eq:nosharing_model_no_sharing']} vs \ref{['eq:prodpoly_model2_simplified']}. Similarly, in multimodal fusion, the number of domains is a fundamental separation, with techniques such as VQA belonging to two domains. We believe that in the coming years, works will increasingly focus on multiple domains for general applications, mirroring how humans perceive and process multiple domains.
  • Figure S3: Runtime and peak memory consumption comparison in vision between Poly-NL, which implements (a variant of) \ref{['eqn:fast_nl']}, and other non-local methods executed on an RTX 2080 GPU. The network utilizing the Hadamard product exhibits lower computational overhead than competing methods, which matters as the number of spatial positions or channels increases. The figure is reproduced from babiloni2023linear.
  • Figure S4: Runtime and FLOPs comparison in the text domain between Poly-SA, which implements (a variant of) \ref{['eqn:fast_nl']}, and two other self-attention methods executed on an RTX 2080 GPU. The network utilizing the Hadamard product exhibits lower computational overhead than competing methods, with a complexity comparable to a linear layer using no attention mechanism. The figure is reproduced from babiloni2023linear.
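
The masking uses described in Figure 1, panels (c) and (f), can be sketched as Hadamard products with a binary mask. This is an illustrative sketch only, not the implementation of any surveyed work: the helper names `causal_mask`, `masked_attention_weights`, and `prune_by_magnitude`, and the magnitude-based pruning criterion, are assumptions. Practical attention code usually adds negative infinity to masked scores before the softmax; multiplying the exponentiated scores by a 0/1 mask and renormalizing gives the same weights and makes the Hadamard view explicit.

```python
import numpy as np


def causal_mask(n):
    """Lower-triangular 0/1 mask: query i may attend to keys 0..i only."""
    return np.tril(np.ones((n, n)))


def masked_attention_weights(scores):
    """Row-wise softmax of `scores` with future positions zeroed by a mask."""
    mask = causal_mask(scores.shape[0])
    # Subtract the row maximum for numerical stability before exponentiating.
    exp_scores = np.exp(scores - scores.max(axis=-1, keepdims=True))
    masked = exp_scores * mask  # Hadamard product with the causal mask
    return masked / masked.sum(axis=-1, keepdims=True)


def prune_by_magnitude(W, keep_fraction):
    """Keep the largest-magnitude entries of W; zero the rest via a mask.

    The magnitude criterion is an assumed, illustrative pruning rule.
    Entries tied with the threshold are all kept.
    """
    k = max(1, int(round(keep_fraction * W.size)))
    threshold = np.sort(np.abs(W), axis=None)[-k]
    mask = (np.abs(W) >= threshold).astype(W.dtype)
    return W * mask, mask  # Hadamard product of weights and binary mask


rng = np.random.default_rng(0)
scores = rng.standard_normal((4, 4))
weights = masked_attention_weights(scores)

W = np.array([[0.1, -2.0], [3.0, -0.05]])
W_pruned, W_mask = prune_by_magnitude(W, keep_fraction=0.5)
```

In both cases the mask has the same shape as the tensor it modulates, so the operation costs one multiplication per entry. A soft mask, as in panel (d), would replace the 0/1 entries with values in the interval from 0 to 1.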

Theorems & Definitions (8)

  • Definition 1.1: Hadamard product
  • Theorem 1: Theorem 3 of zhenyu2022controlling
  • Theorem 2: Theorem 4 of zhenyu2022controlling
  • Theorem 3: Theorem 4 of wu2022extrapolation
  • Definition S.1.1: mode-$m$ vector product
  • Definition S.1.2: CP decomposition
  • Lemma 1
  • Lemma 2: Lemma 2 in chrysos2019polygan