Unitary Multi-Margin BERT for Robust Natural Language Processing

Hao-Yuan Chang, Kang L. Wang

TL;DR

This paper reports a novel, universal technique that drastically improves the robustness of Bidirectional Encoder Representations from Transformers (BERT) by combining unitary weights with the multi-margin loss; the combination amplifies the protection against malicious interference.

Abstract

Recent developments in adversarial attacks on deep learning leave many mission-critical natural language processing (NLP) systems at risk of exploitation. To address the lack of computationally efficient adversarial defense methods, this paper reports a novel, universal technique that drastically improves the robustness of Bidirectional Encoder Representations from Transformers (BERT) by combining unitary weights with the multi-margin loss. We discover that the marriage of these two simple ideas amplifies the protection against malicious interference. Our model, the unitary multi-margin BERT (UniBERT), boosts post-attack classification accuracies significantly, by 5.3% to 73.8%, while maintaining competitive pre-attack accuracies. Furthermore, the tradeoff between pre-attack and post-attack accuracy can be adjusted via a single scalar parameter to best fit the design requirements of the target application.

Paper Structure

This paper contains 19 sections, 1 theorem, 8 equations, 3 figures, 5 tables.

Key Result

Theorem 1

A unitary matrix ($\boldsymbol{U}$) maintains the Euclidean distance between the original ($\overline{x}$) and the perturbed vector ($\overline{x}^{'}$) after the linear transformation.
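
The reasoning behind the theorem is short: because $\boldsymbol{U}^{\dagger}\boldsymbol{U}=\boldsymbol{I}$, we have $\|\boldsymbol{U}\overline{x}-\boldsymbol{U}\overline{x}^{'}\|^2=(\overline{x}-\overline{x}^{'})^{\dagger}\boldsymbol{U}^{\dagger}\boldsymbol{U}(\overline{x}-\overline{x}^{'})=\|\overline{x}-\overline{x}^{'}\|^2$. The following is a minimal numerical sketch of that property, not code from the paper. The 2x2 matrices, the vectors, and the helper names `matvec` and `euclidean_distance` are illustrative assumptions.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def euclidean_distance(a, b):
    """Euclidean distance between two (possibly complex) vectors."""
    return math.sqrt(sum(abs(ai - bi) ** 2 for ai, bi in zip(a, b)))

# Illustrative 2x2 unitary matrix: U^dagger U = I.
s = 1 / math.sqrt(2)
U = [[s, s * 1j],
     [s * 1j, s]]

# Illustrative non-unitary matrix for comparison.
A = [[2.0, 0.0],
     [0.0, 3.0]]

x = [1.0, 2.0]        # original vector (hypothetical values)
x_pert = [1.1, 1.7]   # perturbed vector (hypothetical values)

d_in = euclidean_distance(x, x_pert)
d_out = euclidean_distance(matvec(U, x), matvec(U, x_pert))
d_out_A = euclidean_distance(matvec(A, x), matvec(A, x_pert))

print("distance before transform:    ", d_in)
print("distance after unitary U:     ", d_out)
print("distance after non-unitary A: ", d_out_A)
```

In this example the unitary transform leaves the distance between the two vectors unchanged up to floating-point error, while the non-unitary matrix enlarges it, which illustrates why an unconstrained linear layer can amplify an input perturbation.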

Figures (3)

  • Figure 1: Multi-margin loss for binary sentiment analysis with UniBERT. On the left, UniBERT receives an input sentence $\overline{x}$, transforming it with the embedding and the 12 attention layers (abbreviated as … in the figure) to latent neural representations, ${\overline{x}}_1\dots {\overline{x}}_{12}$. ${\overline{x}}_{12}$ passes through the classifier and the projection layers to become logits $\overline{y}$, consisting of $y_1$ and $y_0$ (scalars) in the binary classification example shown here. If $y_1>y_0$, the network will predict that the sentence has a positive sentiment (class 1); otherwise, it predicts a negative sentiment (class 0). The multi-margin loss compares $\overline{y}$ with the correct answer labeled by humans (denoted by the checkmark in the figure) and penalizes the network for any insufficient distinction between the two logits (i.e., $y_1-y_0$). We name this difference "logit dissimilarity," $\delta$. Additionally, we define a margin parameter $\varepsilon$ such that the multi-margin loss is proportional to the lack of the desired margin when $\delta<\varepsilon$, as shown in the figure by the line with a slope of -1. If the network has sufficient margin, the loss is zero (i.e., the flat segment on the right). During training, our UniBERT adjusts its weights from all the layers to minimize loss; training progression is depicted by the square, triangular, and circular markers on the graph. With a larger $\varepsilon$, the multi-margin loss encourages our UniBERT to have highly distinctive logits, making it more difficult for the attackers to move them across the decision boundary with small perturbations and to sabotage the prediction results.
  • Figure 2: Our unitary multi-margin BERT (UniBERT) architecture. Like BERT, UniBERT classifies a sentence by first converting the words into a sentence vector using the Word, Position, and Token embeddings (top-left). It then transforms this sentence vector using 12 attention layers, with details shown on the right of this figure. Our UniBERT is a variant of BERT with the following differences: First, we use the multi-margin loss (bottom) instead of the cross-entropy loss during the finetuning portion of the training process. Second, we enforce unitary constraints on the weights circled with dashed lines (top-right). These enhancements increase the resilience to input perturbations, stabilizing the classification outcomes under adversarial attacks. ${\overline{x}}_1\dots {\overline{x}}_{12}$ denote the activations after each of the 12 unit blocks, respectively (left). Emb means an embedding layer, while Linear means a fully-connected layer. Tanh is the hyperbolic tangent; GELU is the Gaussian Error Linear Unit. Both are nonlinearities similar to the ReLU.
  • Figure 3: Cosine similarity between the original and the perturbed activations for different neural networks. Adversaries attack the neural networks by perturbing the input sentence. We measure the cosine similarity between the activations of the original sentence and the activations of the perturbed sentence in UniBERT at the output of each attention layer (i.e., ${\overline{x}}_1\dots {\overline{x}}_{12}$ in Fig. 2), indexed from 1 to 12 across the neural network (dotted line). The same is done for BERT (dashed line). We perform statistical analysis on 1000 randomly chosen sentences from the ag_news dataset and perturb them with the TextFooler attack recipe; the error bars denote standard deviation. As defined in the main text, cosine similarity ($\in \mathbb{R}\cap[-1,1]$) is a measure of how well two vectors align with each other in a vector space, with a value of one indicating that the two vectors point in the same direction. For UniBERT, the representations become more alike in deeper layers; in contrast, the similarity fluctuates in BERT. A higher cosine similarity score means that the network keeps the perturbed representation closer to the original neural representation and, consequently, achieves higher post-attack accuracies (see the text added in the figure).
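
The two quantities that the captions above rely on, the multi-margin loss of Figure 1 and the cosine similarity of Figure 3, can be sketched in a few lines of plain Python. This is a hedged illustration under stated assumptions rather than the authors' implementation: the function names and example logits are hypothetical, and the loss is written as an unnormalised hinge so that its slope is -1 as in Figure 1 (library versions such as PyTorch's `torch.nn.MultiMarginLoss` additionally divide by the number of classes).

```python
import math

def multi_margin_loss(logits, target, margin):
    """Unnormalised multi-margin (hinge) loss for one example.

    For every wrong class i, the logit dissimilarity is
    delta = logits[target] - logits[i]; the penalty is
    max(0, margin - delta), i.e. zero once delta >= margin.
    """
    correct = logits[target]
    return sum(
        max(0.0, margin - (correct - y))
        for i, y in enumerate(logits)
        if i != target
    )

def cosine_similarity(a, b):
    """Cosine of the angle between two real vectors, in [-1, 1]."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    return dot / (norm_a * norm_b)

# Binary example with logits [y_0, y_1]; the correct class is 1.
epsilon = 2.0                                             # margin (hypothetical value)
loss_small_gap = multi_margin_loss([0.2, 1.0], 1, epsilon)  # delta = 0.8 < epsilon
loss_large_gap = multi_margin_loss([0.0, 3.0], 1, epsilon)  # delta = 3.0 >= epsilon

print("loss with insufficient margin:", loss_small_gap)
print("loss with sufficient margin:  ", loss_large_gap)
print("cosine similarity, same direction:", cosine_similarity([1.0, 2.0], [2.0, 4.0]))
```

The first example is penalised by the shortfall $\varepsilon-\delta$, while the second already exceeds the margin and incurs no loss, which matches the sloped and flat segments described for Figure 1.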

Theorems & Definitions (1)

  • Theorem 1