Krony-PT: GPT2 compressed with Kronecker Products
Mohamed Ayoub Ben Ayad, Jelena Mitrovic, Michael Granitzer
TL;DR
This work targets efficient deployment of decoder-only LLMs by compressing GPT-2 with Kronecker-product factorizations of the FFN weight matrices, producing models in the 80M to 96M parameter range from the original 124M. Krony-PT introduces two initialization strategies (adaptive normalization for the Van Loan decomposition and a pruning-based method), applies uniform compression across all 12 transformer blocks, and ties the input and output embedding weights. An 81M Krony-PT variant outperforms DistilGPT2 on next-token prediction and is competitive with larger Kronecker-compressed GPT-2 models. The results indicate that Kronecker-based factorization is viable for efficient LLM deployment; future directions include faster Kronecker computations and improved interpretability of the factors.
Abstract
We introduce Krony-PT, a compression technique for GPT-2 based on Kronecker products. We target the feed-forward weights of each transformer block and systematically compress the feed-forward layer matrices to various degrees. We introduce a modified Van Loan decomposition to initialize the new Kronecker factors, and also propose a new pruning-based initialization technique. Our method compresses the original 124M-parameter GPT-2 to various smaller models, ranging from 80M to 96M parameters. Our 81M model variant outperforms DistilGPT2 on next-token prediction across all standard language modeling datasets, and performs competitively with significantly larger Kronecker-based compressions of GPT-2.
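To make the idea of initializing Kronecker factors from a pretrained weight matrix concrete, the following is a minimal NumPy sketch of the standard Van Loan nearest-Kronecker-product decomposition, which finds A and B minimizing the Frobenius norm of W - A ⊗ B through a rank-1 SVD of a rearranged W. It is not the modified (adaptively normalized) variant or the pruning-based initialization proposed here; the function name `nearest_kronecker` and the toy shapes are illustrative assumptions, not names or dimensions from the paper.

```python
import numpy as np


def nearest_kronecker(W, shape_a, shape_b):
    """Return A, B minimizing ||W - kron(A, B)||_F (standard Van Loan method).

    Hypothetical helper for illustration; W must have shape
    (m1 * m2, n1 * n2), with A of shape (m1, n1) and B of shape (m2, n2).
    """
    m1, n1 = shape_a
    m2, n2 = shape_b
    if W.shape != (m1 * m2, n1 * n2):
        raise ValueError("W.shape must equal (m1 * m2, n1 * n2)")

    # Rearrange W so that row (i1, j1) holds the flattened (m2 x n2) block
    # of W at block position (i1, j1). If W = kron(A, B), this rearranged
    # matrix is exactly the rank-1 outer product vec(A) vec(B)^T.
    R = W.reshape(m1, m2, n1, n2).transpose(0, 2, 1, 3).reshape(m1 * n1, m2 * n2)

    # The best rank-1 approximation of R gives the optimal factors.
    U, S, Vt = np.linalg.svd(R, full_matrices=False)
    scale = np.sqrt(S[0])  # split the leading singular value evenly
    A = scale * U[:, 0].reshape(m1, n1)
    B = scale * Vt[0].reshape(m2, n2)
    return A, B


# Toy usage (shapes chosen only for illustration).
rng = np.random.default_rng(0)
W_toy = rng.standard_normal((12, 10))
A_toy, B_toy = nearest_kronecker(W_toy, (3, 2), (4, 5))
rel_err = np.linalg.norm(W_toy - np.kron(A_toy, B_toy)) / np.linalg.norm(W_toy)
print(f"factor parameters: {A_toy.size + B_toy.size} vs dense: {W_toy.size}")
print(f"relative approximation error: {rel_err:.3f}")
```

The two factors store m1·n1 + m2·n2 values in place of the m1·m2·n1·n2 entries of the dense matrix, which is where the parameter savings come from. When W is exactly a Kronecker product, the reconstruction is exact up to floating-point error; for a general matrix, the factors are only a starting point that would then be refined by further training.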
