The Compute Divide in Machine Learning: A Threat to Academic Contribution and Scrutiny?
Tamay Besiroglu, Sage Andrus Bergerson, Amelia Michael, Lennart Heim, Xueyun Luo, Neil Thompson
TL;DR
The paper investigates the compute divide between industry and academia in ML research and its governance implications. Drawing on a data-driven survey of publications and analyses of model compute requirements, it shows that academia has become increasingly underrepresented in compute-intensive research, especially on large self-supervised models. Industry now drives most of this work, and academic research increasingly builds on open models developed in industry. The paper argues that reduced academic scrutiny, together with a smaller academic role in diffusing these models, could undermine understanding of high-impact models. It proposes policy responses centered on privileged structured access, centralized compute provisioning, and third-party auditing to maintain accountability. These measures aim to bolster interpretability, safety, and open science, ensuring diverse contributions and external evaluation even as compute resources concentrate in industry.
Abstract
There are pronounced differences in the extent to which industrial and academic AI labs use computing resources. We provide a data-driven survey of the role of the compute divide in shaping machine learning research. We show that a compute divide has coincided with a reduced representation of academic-only research teams in compute-intensive research topics, especially foundation models. We argue that academia will likely play a smaller role in advancing the associated techniques, in providing critical evaluation and scrutiny, and in diffusing such models. Concurrent with this change in research focus, there is a noticeable shift in academic research towards embracing open-source pre-trained models developed within industry. To address the challenges arising from this trend, especially reduced scrutiny of influential models, we recommend approaches aimed at thoughtfully expanding academic insight. Nationally sponsored computing infrastructure coupled with open science initiatives could judiciously boost academic compute access, prioritizing research on interpretability, safety, and security. Structured access programs and third-party auditing may also allow measured external evaluation of industry systems.
