Can persistent homology whiten Transformer-based black-box models? A case study on BERT compression
Luis Balderas, Miguel Lastra, José M. Benítez
TL;DR
The paper tackles the challenge of making Transformer-based BERT models both explainable and deployment-friendly by applying zero-dimensional persistent homology to neuron outputs. The authors introduce OBCE, which quantifies neuron importance via the merge radius $r_f$ read from persistence diagrams and prunes neurons using percentile thresholds, yielding compressed models. Experiments on the GLUE benchmark show substantial parameter reductions (down to $58.47\%$ of the original parameters for BERT Base and $52.3\%$ for BERT Large) with competitive or improved task performance, surpassing several prior compression methods. The work shows that topological features can provide principled explainability and practical efficiency for large language models, enabling their use on resource-constrained devices.
Abstract
Large Language Models (LLMs) like BERT have gained significant prominence due to their remarkable performance in various natural language processing tasks. However, they come with substantial computational and memory costs. Additionally, they are essentially black-box models that are challenging to explain and interpret. In this article, we propose Optimus BERT Compression and Explainability (OBCE), a methodology to bring explainability to BERT models using persistent homology, aiming to measure the importance of each neuron by studying the topological characteristics of its outputs. As a result, we can compress BERT significantly by reducing the number of parameters (58.47% of the original parameters for BERT Base, 52.3% for BERT Large). We evaluated our methodology on the standard GLUE Benchmark, comparing the results with state-of-the-art techniques and achieving outstanding results. Consequently, our methodology can "whiten" BERT models by providing explainability to their neurons and reducing the models' size, making them more suitable for deployment on resource-constrained devices.
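The neuron-scoring idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: each neuron's outputs over a set of inputs are treated as a one-dimensional point cloud; the merge radius is taken to be the largest finite death time in the zero-dimensional persistence diagram (the scale at which all points join a single connected component); and neurons whose radius falls below a chosen percentile are the ones pruned. The function names `zero_dim_death_times`, `merge_radius`, and `select_neurons` are hypothetical, and the direction of the threshold may differ from the paper's criterion.

```python
import numpy as np


def zero_dim_death_times(values):
    """Death times of 0-dimensional persistent homology for a 1D point cloud.

    For points on a line, connected components merge exactly at the gaps
    between neighbouring sorted points, so the death times are those gaps.
    Assumption: a component dies at the full gap length (some conventions
    use half of it).
    """
    pts = np.sort(np.asarray(values, dtype=float))
    return np.sort(np.diff(pts))


def merge_radius(values):
    """Largest finite death time: the scale at which one component remains."""
    deaths = zero_dim_death_times(values)
    return float(deaths[-1]) if deaths.size else 0.0


def select_neurons(activations, percentile=50.0):
    """Score every neuron and flag the ones to keep.

    activations: array of shape (n_samples, n_neurons).
    Assumption: neurons at or above the percentile threshold are kept.
    """
    activations = np.asarray(activations, dtype=float)
    radii = np.array(
        [merge_radius(activations[:, j]) for j in range(activations.shape[1])]
    )
    threshold = np.percentile(radii, percentile)
    keep = radii >= threshold
    return radii, keep


# Toy example: 3 inputs, 3 neurons (hypothetical values).
toy = np.array(
    [
        [0.0, 0.0, 0.0],
        [1.0, 0.1, 5.0],
        [3.0, 0.2, 6.0],
    ]
)
radii, keep = select_neurons(toy, percentile=50.0)
print(radii, keep)
```

In this toy case the second neuron's outputs are tightly clustered, so its merge radius is small and it is the one flagged for pruning. For real BERT layers the activations would come from forward passes over a corpus, and the percentile would be a tunable hyperparameter.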
