Hyperbolic Binary Neural Network

Jun Chen; Jingyang Xiang; Tianxin Huang; Xiangrui Zhao; Yong Liu

TL;DR

Binary neural networks (BNNs) are efficient, but training them is a constrained optimization problem in the binarized space. The authors propose the Hyperbolic Binary Neural Network (HBNN), which maps the unconstrained weight vector $\tilde{\mathbf{w}}$ into hyperbolic space via the exponential parametrization cluster $\phi_{\mathcal{F}}(\tilde{\mathbf{w}})$ within the Poincaré ball $\mathbb{D}_{r}^{n}$, converting the constrained problem into an unconstrained Euclidean optimization. The Exponential Parametrization Cluster (EPC) trains a cluster of points to increase weight flips and information gain; its theoretical analysis, based on a diffeomorphism, indicates better exploration than a single exponential map. Empirically, HBNN achieves state-of-the-art results on CIFAR10/100 and ImageNet with VGGsmall, ResNet18, and ResNet34, while keeping inference efficiency comparable to conventional BNNs.

Abstract

A Binary Neural Network (BNN) converts full-precision weights and activations into their extreme 1-bit counterparts, making it particularly suitable for deployment on lightweight mobile devices. While binary neural networks are typically formulated as a constrained optimization problem and optimized in the binarized space, general neural networks are formulated as an unconstrained optimization problem and optimized in the continuous space. This paper introduces the Hyperbolic Binary Neural Network (HBNN), which leverages the framework of hyperbolic geometry to optimize the constrained problem. Specifically, we transform the constrained problem in hyperbolic space into an unconstrained one in Euclidean space using the Riemannian exponential map. We also propose the Exponential Parametrization Cluster (EPC) method, which, compared to the Riemannian exponential map, shrinks the segment domain based on a diffeomorphism. This approach increases the probability of weight flips, thereby maximizing the information gain in BNNs. Experimental results on the CIFAR10, CIFAR100, and ImageNet classification datasets with VGGsmall, ResNet18, and ResNet34 models illustrate the superior performance of our HBNN over state-of-the-art methods.
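
To make the exponential-map step concrete, here is a minimal sketch in plain Python. It assumes the standard closed form of the exponential map at the origin of a Poincaré ball of radius $r$ (curvature $-1/r^2$), namely $\exp_0(v) = r\tanh(\lVert v\rVert / r)\, v / \lVert v\rVert$; the abstract does not state the exact convention the paper uses. The function names `exp_map_origin` and `sign` are illustrative, not taken from the paper.

```python
import math

def exp_map_origin(v, r=1.0):
    """Exponential map at the origin of a Poincare ball of radius r (assumed convention).

    Any Euclidean vector v is sent to a point whose norm is r * tanh(|v| / r),
    so the result stays inside the ball without an explicit constraint.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    scale = r * math.tanh(norm / r) / norm
    return [scale * x for x in v]

def sign(p):
    """Binarize a point coordinate-wise; zero is mapped to +1 by convention here."""
    return [1.0 if x >= 0 else -1.0 for x in p]

# An unconstrained latent weight vector, its image in the ball, and its binary code.
w_tilde = [3.0, -4.0]
point = exp_map_origin(w_tilde, r=1.0)
binary = sign(point)
```

At the origin this map is a positive rescaling of `w_tilde`, so it changes the norm but not the sign pattern. The unconstrained vector can be updated with any Euclidean optimizer, while the mapped point remains in the ball.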

Paper Structure (19 sections, 15 equations, 6 figures, 6 tables, 1 algorithm)

Introduction
Related Work
Preliminaries
Riemannian Geometry
Binary Neural Network
Hyperbolic Binary Neural Network
The Poincaré Ball
Exponential Parametrization Cluster (EPC)
Backward Mode and Gradient Computation
Method Analysis
Theoretical Analysis
Method Comparison and Explanation
Experiments
Ablation Study
Comparison to State-of-the-art Methods
...and 4 more sections

Figures (6)

Figure 1: The exponential parametrization cluster $\phi_{\mathcal{F}}$ transforms a vector $v$ into the mapped cluster $\phi_{\mathcal{F}}(v)$ using an original cluster $\mathcal{F}=\{\mathcal{F}_1,\mathcal{F}_2,\cdots,\mathcal{F}_t\}$, where $\mathcal{F}$ and $\phi_{\mathcal{F}}(v)$ exist in hyperbolic space, while $v$ resides in Euclidean space. In contrast, the Riemannian exponential map $\exp$ transforms a vector $v$ into the mapped point $\exp(v)$.
Figure 2: Overview of our HBNN with the EPC. By training an original cluster $\mathcal{F}=\{\mathcal{F}_1,\mathcal{F}_2,\cdots,\mathcal{F}_t\}$, we map a weight vector $\tilde{\mathbf{w}}$ into the mapped cluster $\phi_{\mathcal{F}}(\tilde{\mathbf{w}})=\{\phi_{\mathcal{F}_1}(\tilde{\mathbf{w}}),\phi_{\mathcal{F}_2}(\tilde{\mathbf{w}}),\cdots,\phi_{\mathcal{F}_t}(\tilde{\mathbf{w}})\}$. We then select an optimal exponential parametrization (say, $\phi_{\mathcal{F}_i}(\cdot)$) based on the mapped cluster, and continue to optimize the weight vector $\tilde{\mathbf{w}}$ via $\phi_{\mathcal{F}_i}(\cdot)$. Note that HBNN obtains the binarized weight vector via $\operatorname{sign}(\phi_{\mathcal{F}_i}(\tilde{\mathbf{w}}))$.
Figure 3: Weight flip rates of our HBNN and XNOR++ in different layers of ResNet18.
Figure 4: Validation accuracy curves of our HBNN, RBNN, and ReCU on the CIFAR10 dataset with VGGsmall.
Figure 5: 2D visualization of the loss surfaces of ResNet18 on the CIFAR10 dataset, allowing the sharpness/flatness of different methods to be compared. The sharpness of each loss surface is indicated by the accompanying number, and yellow areas mark particularly large peaks. Compared to XNOR++, HBNN exhibits flatter loss surfaces.
...and 1 more figure
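
The mapping described in the captions of Figures 1 and 2 can be sketched as follows. This is a hedged illustration, not the authors' implementation. It assumes each $\phi_{\mathcal{F}_i}$ behaves like the standard exponential map of the unit Poincaré ball at base point $\mathcal{F}_i$, written with Möbius addition. The base points in `cluster` are hypothetical. The rule for choosing the optimal $\phi_{\mathcal{F}_i}$ is not given in the captions, so it is not modeled here.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mobius_add(x, y, c=1.0):
    """Mobius addition in the Poincare ball with curvature -c."""
    xy, xx, yy = _dot(x, y), _dot(x, x), _dot(y, y)
    coef_x = 1.0 + 2.0 * c * xy + c * yy
    coef_y = 1.0 - c * xx
    denom = 1.0 + 2.0 * c * xy + c * c * xx * yy
    return [(coef_x * a + coef_y * b) / denom for a, b in zip(x, y)]

def exp_map(base, v, c=1.0):
    """Exponential map at `base` (standard closed form, assumed to match phi_{F_i})."""
    norm = math.sqrt(_dot(v, v))
    if norm == 0.0:
        return list(base)
    lam = 2.0 / (1.0 - c * _dot(base, base))  # conformal factor at the base point
    scale = math.tanh(math.sqrt(c) * lam * norm / 2.0) / (math.sqrt(c) * norm)
    return mobius_add(base, [scale * a for a in v], c)

def mapped_cluster(cluster, v, c=1.0):
    """Apply the exponential map at every base point of the cluster to the same v."""
    return [exp_map(f, v, c) for f in cluster]

def sign(p):
    return [1.0 if a >= 0 else -1.0 for a in p]

# Hypothetical base points inside the unit ball, and one latent weight vector.
cluster = [[0.0, 0.0], [0.5, 0.0], [0.0, -0.5]]
w_tilde = [-0.1, 0.2]
points = mapped_cluster(cluster, w_tilde)
codes = [sign(p) for p in points]  # one candidate binary code per base point
```

In this toy setup the same `w_tilde` yields a different sign pattern at each base point, which illustrates how a cluster of maps can reach binary codes that the single map at the origin cannot.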
