Neural Speed Reading via Skim-RNN

Minjoon Seo, Sewon Min, Ali Farhadi, Hannaneh Hajishirzi

TL;DR

Skim-RNN introduces a per-token skim/read mechanism that uses a small RNN to update only a portion of the hidden state for unimportant inputs, while a full RNN processes important tokens. Trained with a differentiable Gumbel-softmax reparameterization and an auxiliary skim loss, the model achieves substantial FLOP reductions (up to 3x in classification and over 1.4x in QA) with maintained or improved accuracy. It preserves standard RNN interfaces, enabling easy replacement in existing models, and demonstrates CPU-friendly latency advantages over GPU baselines in several settings. The approach offers a tunable speed/accuracy trade-off at inference time and shows promise for efficient sequence modeling on both classification and question-answering tasks.
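The Gumbel-softmax reparameterization mentioned above can be illustrated with a small NumPy sketch. This is not the authors' code: the function name `gumbel_softmax`, the temperature value, and the example logits are assumptions made for illustration. The idea is to replace a hard, non-differentiable read/skim choice with a softmax over noise-perturbed logits, so gradients can flow through the decision during training.

```python
import numpy as np

def gumbel_softmax(logits, temperature, rng):
    """Draw a relaxed one-hot sample over the last axis of `logits`."""
    # Gumbel(0, 1) noise via inverse transform sampling; clip to keep both logs finite.
    u = np.clip(rng.random(logits.shape), 1e-10, 1.0 - 1e-10)
    gumbel = -np.log(-np.log(u))
    y = (logits + gumbel) / temperature
    y = y - y.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
# Hypothetical [read, skim] scores for two tokens (illustrative values only).
logits = np.array([[2.0, -1.0], [0.1, 0.3]])
soft = gumbel_softmax(logits, temperature=0.5, rng=rng)
hard = soft.argmax(axis=-1)  # discrete decision per token: 0 = read, 1 = skim
```

Lower temperatures push each row of `soft` closer to a one-hot vector, at the cost of higher-variance gradients.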

Abstract

Inspired by the principles of speed reading, we introduce Skim-RNN, a recurrent neural network (RNN) that dynamically decides to update only a small fraction of the hidden state for relatively unimportant input tokens. Skim-RNN gives a computational advantage over an RNN that always updates the entire hidden state. Skim-RNN uses the same input and output interfaces as a standard RNN and can be easily used instead of RNNs in existing models. In our experiments, we show that Skim-RNN can achieve significantly reduced computational cost without losing accuracy compared to standard RNNs across five different natural language tasks. In addition, we demonstrate that the trade-off between accuracy and speed of Skim-RNN can be dynamically controlled during inference time in a stable manner. Our analysis also shows that Skim-RNN running on a single CPU offers lower latency compared to standard RNNs on GPUs.
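A minimal sketch of one read/skim step may help make the mechanism concrete. It is written under several assumptions that are not taken from the paper: plain tanh cells stand in for the LSTM cells, the weights are random, the input size `D_IN` is arbitrary, the small cell is assumed to see the full previous hidden state, and the decision is a greedy argmax as one might use at inference. Only the sizes `D = 100` and `D_SMALL = 10` mirror the default $d$ and $d'$ quoted in the figure captions.

```python
import numpy as np

D_IN, D, D_SMALL = 8, 100, 10  # D_IN is hypothetical; D and D_SMALL mirror d=100, d'=10

rng = np.random.default_rng(0)
# Randomly initialised weights, for illustration only (not trained parameters).
W_big = rng.normal(scale=0.1, size=(D, D_IN + D))          # full-size cell
W_small = rng.normal(scale=0.1, size=(D_SMALL, D_IN + D))  # small cell
W_dec = rng.normal(scale=0.1, size=(2, D_IN + D))          # [read, skim] scores

def skim_rnn_step(x, h_prev):
    """Return the next hidden state and whether the token was fully read."""
    z = np.concatenate([x, h_prev])
    logits = W_dec @ z
    read = bool(logits[0] >= logits[1])  # greedy decision from x_t and h_{t-1}
    if read:
        h = np.tanh(W_big @ z)              # update the entire hidden state
    else:
        h = h_prev.copy()
        h[:D_SMALL] = np.tanh(W_small @ z)  # update only the first d' dimensions
    return h, read

def run(xs):
    """Run the sketch over a sequence; the interface matches a plain RNN loop."""
    h = np.zeros(D)
    decisions = []
    for x in xs:
        h, read = skim_rnn_step(x, h)
        decisions.append(read)
    return h, decisions

xs = rng.normal(size=(5, D_IN))  # five hypothetical token embeddings
h_final, decisions = run(xs)
```

In this sketch a skim step multiplies by a `D_SMALL`-row matrix instead of a `D`-row one, which is where the savings in floating-point operations come from; the caller still receives a hidden state of size `D` either way.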

Paper Structure

This paper contains 21 sections, 8 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: The schematic of Skim-RNN on a sample sentence from Stanford Sentiment Treebank: "intelligent and invigorating film". At time step 1, Skim-RNN makes the decision to read or skim ${\bf x}_1$ by using Equation \ref{eqn:choice} on ${\bf h}_0$ and ${\bf x}_1$. Since 'intelligent' is an important word for sentiment, it decides to read (blue diamond) by obtaining a full-size hidden state with the big RNN and updating the entire previous hidden state. At time step 2, Skim-RNN decides to skim (empty diamond) the word 'and' by updating the first few dimensions of the hidden state using the small RNN.
  • Figure 2: Analyzing the effect of small hidden state size, $d'$ (left) and $\gamma$ (right) on skim rate; ($d=100$, $d'=10$, and $\gamma=0.02$ are default values).
  • Figure 3: Results on Stanford Question Answering Dataset (SQuAD), using LSTM+Attention (2 layers of LSTM, $d=100$, $d'=20$ by default) and BiDAF ($d=100$, $d'=50$ by default).
  • Figure 4: Skim rate of LSTMs in LSTM+Att model. Two layers of forward and backward LSTMs are shown (total count of 4), with $d=100, d'=20$.
  • Figure 5: F1 score of standard LSTM with varying configurations (Blue) and Skim LSTM with varying configurations (Red), both sorted together in ascending order by the inverse of Flop-R (Orange). $d=100$ by default. Numbers inside B refer to $d$, and numbers inside S refer to $d', \gamma$.
  • ...and 6 more figures