AMSbench: A Comprehensive Benchmark for Evaluating MLLM Capabilities in AMS Circuits

Yichen Shi; Ze Zhang; Hongyang Wang; Zhuofu Tao; Zhongyi Li; Bingyu Chen; Yaxin Wang; Zhen Huang; Xuhua Liu; Quan Chen; Zhiping Yu; Ting-Jung Lin; Lei He

TL;DR

AMSbench addresses the lack of a holistic, multimodal benchmark for evaluating MLLMs on analog/mixed-signal (AMS) circuits. It combines perception, analysis, and design tasks in roughly 8,000 questions, organized into three components (AMS-Perception, AMS-Analysis, AMS-Design), and uses them to evaluate eight models. The study finds that state-of-the-art MLLMs struggle with schematic interpretation and with complex end-to-end AMS design, although some models do well on specific perception or reasoning sub-tasks. The benchmark provides a practical foundation for advancing automated AMS circuit workflows; proposed directions include data expansion, retrieval-augmented reasoning, and multi-agent, hybrid optimization strategies.

Abstract

Analog/Mixed-Signal (AMS) circuits play a critical role in the integrated circuit (IC) industry. However, automating AMS circuit design has remained a longstanding challenge due to its difficulty and complexity. Although recent advances in Multi-modal Large Language Models (MLLMs) offer promising potential for supporting AMS circuit analysis and design, current research typically evaluates MLLMs on isolated tasks within the domain and lacks a comprehensive benchmark that systematically assesses model capabilities across diverse AMS-related challenges. To address this gap, we introduce AMSbench, a benchmark suite designed to evaluate MLLM performance across critical tasks including circuit schematic perception, circuit analysis, and circuit design. AMSbench comprises approximately 8000 test questions spanning multiple difficulty levels and assesses eight prominent models, encompassing both open-source and proprietary solutions such as Qwen 2.5-VL and Gemini 2.5 Pro. Our evaluation highlights significant limitations in current MLLMs, particularly in complex multi-modal reasoning and sophisticated circuit design tasks. These results underscore the need to advance MLLMs' understanding and effective application of circuit-specific knowledge, thereby narrowing the existing performance gap relative to human expertise and moving toward fully automated AMS circuit design workflows. Our data is released at this URL.
