We present two simple ways of reducing the number of parameters and
accelerating the training of large Long Short-Term Memory (LSTM) networks: the
first one is "matrix factorization by design" of the LSTM matrix into the product
of two smaller matrices, and the second one is partitioning of the LSTM matrix, its
inputs, and its states into independent groups. Both approaches allow us to
train large LSTM networks significantly faster to near state-of-the-art
perplexity while using significantly fewer RNN parameters.
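
To make the parameter savings concrete, the following is a minimal sketch that counts weights for the two approaches. It assumes a single LSTM weight matrix that maps the concatenated input and state to the four gate pre-activations, ignores biases, and uses illustrative sizes; the function names, the `rank` and `groups` parameters, and the chosen dimensions are assumptions for this example and are not taken from the text above.

```python
def full_params(n_in: int, n_hidden: int) -> int:
    """Weights in one dense LSTM matrix: [input; state] -> 4 gate pre-activations."""
    return (n_in + n_hidden) * 4 * n_hidden


def factorized_params(n_in: int, n_hidden: int, rank: int) -> int:
    """Weights when the matrix is built as a product of two smaller matrices.

    Assumed shapes: (n_in + n_hidden) x rank, then rank x (4 * n_hidden).
    """
    return (n_in + n_hidden) * rank + rank * 4 * n_hidden


def grouped_params(n_in: int, n_hidden: int, groups: int) -> int:
    """Weights when inputs, states, and the matrix are split into independent groups.

    Each group sees only its own slice of the input and state (assumed equal-sized).
    """
    if n_in % groups or n_hidden % groups:
        raise ValueError("n_in and n_hidden must be divisible by groups")
    per_group = ((n_in + n_hidden) // groups) * (4 * n_hidden // groups)
    return groups * per_group


if __name__ == "__main__":
    n = 1024  # hypothetical input and state size
    print("full:      ", full_params(n, n))
    print("factorized:", factorized_params(n, n, rank=256))
    print("grouped:   ", grouped_params(n, n, groups=4))
```

Under these assumptions, the factorized count grows linearly in the hidden size for a fixed rank, and the grouped count is the dense count divided by the number of groups.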