QuArch: A Question-Answering Dataset for AI Agents in Computer Architecture
Shvetank Prakash, Andrew Cheng, Jason Yik, Arya Tschand, Radhika Ghosal, Ikechukwu Uchendu, Jessica Quaye, Jeffrey Ma, Shreyas Grampurohit, Sofia Giannuzzi, Arnav Balyan, Fin Amin, Aadya Pipersenia, Yash Choudhary, Ankita Nayak, Amir Yazdanbakhsh, Vijay Janapa Reddi
TL;DR
QuArch introduces the first architecture-focused QA dataset, comprising $1{,}547$ expert-validated questions across 13 topics for evaluating domain knowledge in computer architecture. The study evaluates state-of-the-art language models, revealing a performance ceiling near $84\%$ for the best closed-source model and a gap of about 12 percentage points for the top small open-source model, with memory systems and interconnection networks as persistent weaknesses. It demonstrates QuArch's value as both a benchmark and a training resource: fine-tuning yields gains of $5.4\%$–$8.3\%$ for small models on architecture tasks. The work underscores the practical potential of AI-assisted architecture research and outlines directions toward deeper reasoning and system-level capabilities. The dataset and leaderboard are publicly available at https://harvard-edge.github.io/QuArch/.
Abstract
We introduce QuArch, a dataset of 1500 human-validated question-answer pairs designed to evaluate and enhance language models' understanding of computer architecture. The dataset covers areas including processor design, memory systems, and performance optimization. Our analysis highlights a significant performance gap: the best closed-source model achieves 84% accuracy, while the top small open-source model reaches 72%. Models struggle most with memory systems, interconnection networks, and benchmarking. Fine-tuning with QuArch improves small-model accuracy by up to 8%, establishing a foundation for advancing AI-driven computer architecture research. The dataset and leaderboard are available at https://harvard-edge.github.io/QuArch/.
