Rewrite the Stars

Xu Ma, Xiyang Dai, Yue Bai, Yizhou Wang, Yun Fu

TL;DR

This work investigates why the star operation—element-wise feature multiplication—improves performance in neural networks. It demonstrates that star operations implicitly map inputs into a high-dimensional nonlinear feature space, akin to kernel tricks, enabling rich representations without widening the network. A simple proof-of-concept, StarNet, shows competitive ImageNet performance with very low latency on mobile and constrained hardware, validating the approach and highlighting its potential for efficient architectures. The study also provides extensive ablations, activation analyses, and open questions, suggesting a broad research direction toward activation-friendly, compact networks built around implicit high-dimensional feature spaces.
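The kernel-trick analogy can be checked numerically. The sketch below is a minimal illustration, not code from the StarNet repository. It assumes a single output channel, real-valued inputs, and no activation, and the helper names (`star`, `implicit_features`, `implicit_weights`) are hypothetical. It shows that multiplying two linear branches of width d gives the same value as one linear function applied to all pairwise products x_i * x_j (with a constant 1 appended to absorb the biases). That is a weight vector acting on an implicit quadratic feature space.

```python
import random

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def star(w1, b1, w2, b2, x):
    """Star operation for one output channel: (w1.x + b1) * (w2.x + b2)."""
    return (dot(w1, x) + b1) * (dot(w2, x) + b2)

def implicit_features(x):
    """Append a constant 1 (absorbs the biases), then list every product xa_i * xa_j."""
    xa = list(x) + [1.0]
    return [xi * xj for xi in xa for xj in xa]

def implicit_weights(w1, b1, w2, b2):
    """Matching coefficients w1a_i * w2a_j, enumerated over the same (i, j) grid."""
    w1a, w2a = list(w1) + [b1], list(w2) + [b2]
    return [a * b for a in w1a for b in w2a]

random.seed(0)
d = 5  # branch width (hypothetical)
x = [random.gauss(0, 1) for _ in range(d)]
w1 = [random.gauss(0, 1) for _ in range(d)]
w2 = [random.gauss(0, 1) for _ in range(d)]
b1, b2 = random.gauss(0, 1), random.gauss(0, 1)

lhs = star(w1, b1, w2, b2, x)
rhs = dot(implicit_weights(w1, b1, w2, b2), implicit_features(x))
print(abs(lhs - rhs) < 1e-9)  # True: the same value computed two ways
```

The (d+1)^2 listed products contain duplicates because x_i * x_j = x_j * x_i. The star operation never materializes this space explicitly, which is the sense in which it resembles a kernel trick.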

Abstract

Recent studies have drawn attention to the untapped potential of the "star operation" (element-wise multiplication) in network design. While intuitive explanations abound, the foundational rationale behind its application remains largely unexplored. Our study attempts to reveal the star operation's ability to map inputs into high-dimensional, non-linear feature spaces, akin to kernel tricks, without widening the network. We further introduce StarNet, a simple yet powerful prototype that demonstrates impressive performance and low latency with a compact network structure and a small computational budget. Like stars in the sky, the star operation appears unremarkable but holds a vast universe of potential. Our work encourages further exploration across tasks, with code available at https://github.com/ma-xu/Rewrite-the-Stars.
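To make "high-dimensional without widening the network" concrete: per output channel, a star operation over d input channels (plus a bias term) is a weighted sum of all monomials of degree at most 2 in those inputs, and there are (d+2)(d+1)/2 distinct such monomials. The calculation below is a rough sketch under simplifying assumptions: it counts a single star layer only, ignores activations, and uses a hypothetical helper `implicit_dim`.

```python
from math import comb

def implicit_dim(d):
    """Distinct monomials of degree <= 2 over d inputs (the constant term covers the bias)."""
    return comb(d + 2, 2)  # equals (d + 2) * (d + 1) // 2

for d in (16, 64, 128):
    print(d, implicit_dim(d))  # 16 -> 153, 64 -> 2145, 128 -> 8385
```

The explicit width stays at d while the implicit dimension grows roughly as d^2 / 2.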

Paper Structure (40 sections, 3 equations, 9 figures, 19 tables, 2 algorithms)

Figures (9)

  • Figure 1: Illustration of the advantage of the star operation (element-wise multiplication). The left side depicts a basic building block abstracted from related works [yang2022focal, guo2023visual, rao2022hornet], with "?" representing either 'star' or 'summation'. The right side highlights the notable performance gap between the two operations: 'star' performs better, particularly at narrower widths. See the section on the empirical superiority of the star operation for more results.
  • Figure 2: Decision boundary comparison on the 2D noisy moon dataset [scikit-learn]. Under identical configurations, the star-based network produces a more effective decision boundary than the summation-based one. Compared with SVMs, the star operation's boundary closely aligns with that of a polynomial-kernel SVM and differs from that of a Gaussian-kernel SVM. More details are available in the Supplementary.
  • Figure 3: StarNet architecture overview. StarNet follows the traditional hierarchical design and directly uses a convolutional layer to down-sample the resolution and double the channel number in each stage. Multiple star blocks are stacked to extract features. Without intricate structures or carefully tuned hyper-parameters, StarNet delivers promising performance.
  • Figure 4: Mobile device (iPhone 13) latency vs. ImageNet accuracy. Models with excessively high latency are excluded from this figure. More results on different mobile devices are given in the supplementary table of additional iPhone latency measurements.
  • Figure 5: Results of the sum and star operations over 4 runs.
  • ...and 4 more figures
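The gap in Figure 1 between 'star' and 'summation' has a simple algebraic root, sketched below under heavy simplification: the building block is reduced to two linear branches fused by "?", with biases, activations, normalization, and spatial mixing all omitted. The functions `sum_block` and `star_block` and the weight values are hypothetical. Adding the branches collapses to the single linear map (w1 + w2) . x, so scaling the input by 2 scales the output by 2. Multiplying them yields a quadratic form, so the output scales by 4.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def sum_block(w1, w2, x):
    """'Summation' variant: adds two linear branches (bias omitted)."""
    return dot(w1, x) + dot(w2, x)

def star_block(w1, w2, x):
    """'Star' variant: multiplies the same two linear branches."""
    return dot(w1, x) * dot(w2, x)

w1, w2 = [0.5, -1.0, 2.0], [1.5, 0.25, -0.75]  # hypothetical weights
x = [1.0, 2.0, -1.0]
x2 = [2 * v for v in x]  # the same input, scaled by 2

# Summation stays linear: doubling x doubles the output.
print(sum_block(w1, w2, x2) / sum_block(w1, w2, x))    # 2.0
# Star is quadratic: doubling x quadruples the output.
print(star_block(w1, w2, x2) / star_block(w1, w2, x))  # 4.0
```

Stacking summation blocks without activations therefore never leaves the space of linear functions, whereas each star block raises the polynomial degree.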
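Figure 2 notes that the star network's boundary resembles that of a polynomial-kernel SVM. The sketch below illustrates the underlying fact for 2D inputs such as the noisy moon data: a degree-2 polynomial kernel equals an inner product in an explicit quadratic feature space, the kind of space a single star operation reaches implicitly. The kernel degree and the offset of 1 are assumptions for illustration, not the SVM settings used in the paper, and the function names are hypothetical.

```python
import math

def poly2_kernel(x, y):
    """Degree-2 polynomial kernel with offset 1: (x.y + 1)^2."""
    return (x[0] * y[0] + x[1] * y[1] + 1.0) ** 2

def poly2_features(x):
    """Explicit 6-D feature map for 2-D inputs whose inner product equals poly2_kernel."""
    x1, x2 = x
    r = math.sqrt(2.0)
    return [1.0, r * x1, r * x2, x1 * x1, x2 * x2, r * x1 * x2]

x, y = (0.3, -1.2), (1.5, 0.4)
explicit = sum(a * b for a, b in zip(poly2_features(x), poly2_features(y)))
print(math.isclose(explicit, poly2_kernel(x, y)))  # True
```

Expanding (x1*y1 + x2*y2 + 1)^2 term by term gives exactly the six products paired in `poly2_features`. A Gaussian kernel, by contrast, corresponds to an infinite-dimensional feature space, which is consistent with its boundary differing from the star network's.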
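The stage layout described in Figure 3 (one convolutional layer per stage that down-samples the resolution and doubles the channel number) can be traced with a short shape calculation. This is a hypothetical sketch: the stem output resolution (112), stem width (32), and number of stages (4) are placeholders rather than StarNet's published configuration, and the star blocks inside each stage are assumed to preserve shape, so they are not modeled.

```python
def stage_shapes(stem_res, stem_channels, num_stages):
    """Per-stage (resolution, channels) when each stage opens with a strided conv
    that halves the spatial resolution and doubles the channel count."""
    shapes = []
    res, ch = stem_res, stem_channels
    for _ in range(num_stages):
        res //= 2  # stride-2 convolution: height and width halve
        ch *= 2    # output channels double
        shapes.append((res, ch))
    return shapes

# Hypothetical settings: 112x112 stem output with 32 channels, 4 stages.
for i, (res, ch) in enumerate(stage_shapes(112, 32, 4), start=1):
    print(f"stage {i}: {res}x{res}, {ch} channels")
```

Halving each spatial side while doubling channels cuts the number of feature-map positions by 4 per stage, so later stages can afford wider layers at modest cost.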