Can We Trust LLMs on Memristors? Diving into Reasoning Ability under Non-Ideality

Taiqiang Wu, Yuxin Cheng, Chenchen Ding, Runming Yang, Xincheng Feng, Wenyong Zhou, Zhengwu Liu, Ngai Wong

Abstract

Memristor-based analog compute-in-memory (CIM) architectures provide a promising substrate for the efficient deployment of Large Language Models (LLMs), owing to their superior energy efficiency and computational density. However, these architectures suffer from precision issues caused by intrinsic non-idealities of memristors. In this paper, we first conduct a comprehensive investigation into the impact of typical non-idealities on LLM reasoning. Empirical results indicate that reasoning capability decreases significantly, although the extent of the decrease varies across benchmarks. Subsequently, we systematically appraise three training-free strategies: thinking mode, in-context learning, and module redundancy. From these experiments we summarize three guidelines: shallow-layer redundancy is particularly effective for improving robustness; thinking mode performs better under low noise levels but degrades at higher noise; and in-context learning reduces output length with a slight performance trade-off. Our findings offer new insights into LLM reasoning under non-ideality and practical strategies to improve robustness.

Paper Structure (17 sections, 5 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Overview of the non-idealities encountered when deploying LLMs on memristor-based analog CIM architectures. In this paper, we study their impact on LLM reasoning and discuss available training-free solutions.
  • Figure 2: Performance of Qwen3 0.6B under non-ideality at various noise levels $\sigma$. The SAF ratio $p$ is 0.01. A darker color denotes larger noise. We report results on IFEval, GPQA-D, and MATH-500. The vanilla performance is indicated by the dotted line.
  • Figure 3: Error analysis of Qwen3 0.6B on MATH-500. Larger noise leads to more formatting issues.
  • Figure 4: Comparison between the Non-thinking mode and the Thinking mode, which introduces an explicit thinking process.
  • Figure 5: Comparison of Thinking and Non-thinking modes on MATH-500. The SAF ratio $p$ is 0.01. We also report the token ratio of the Thinking parts.
  • ...and 2 more figures
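
The captions of Figures 2 and 5 parameterize non-ideality by a noise level $\sigma$ and an SAF ratio $p$. The sketch below shows one way such a perturbation could be simulated on a weight matrix. It is a minimal illustration and not the paper's method: it assumes that SAF denotes stuck-at faults (the usual meaning in the memristor literature), that the noise is additive Gaussian with standard deviation $\sigma$ times the largest weight magnitude, and that a stuck cell reads as either 0 or that largest magnitude. The names `apply_non_ideality` and `matvec` are hypothetical helpers introduced only for this example.

```python
import random


def apply_non_ideality(weights, sigma, p, seed=0):
    """Return a perturbed copy of a 2-D list of weights.

    Assumed model (an illustration, not taken from the paper):
      * with probability p a cell is stuck and reads 0.0 or w_max,
        regardless of the value that was programmed;
      * every other cell gets additive Gaussian noise with
        standard deviation sigma * w_max.
    """
    rng = random.Random(seed)
    w_max = max(abs(w) for row in weights for w in row)
    noisy = []
    for row in weights:
        new_row = []
        for w in row:
            if rng.random() < p:
                # Stuck cell: the programmed value is lost.
                new_row.append(rng.choice([0.0, w_max]))
            else:
                # Healthy cell: programmed value plus Gaussian noise.
                new_row.append(w + rng.gauss(0.0, sigma * w_max))
        noisy.append(new_row)
    return noisy


def matvec(weights, x):
    """Plain matrix-vector product, standing in for one linear layer."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


if __name__ == "__main__":
    rng = random.Random(1)
    W = [[rng.uniform(-1, 1) for _ in range(64)] for _ in range(64)]
    x = [rng.uniform(-1, 1) for _ in range(64)]
    ideal = matvec(W, x)
    # Sweep the noise level at a fixed SAF ratio, as in the figure captions.
    for sigma in (0.0, 0.01, 0.05):
        out = matvec(apply_non_ideality(W, sigma, p=0.01), x)
        err = max(abs(a - b) for a, b in zip(ideal, out))
        print(f"sigma={sigma}: max abs output error {err:.4f}")
```

Running the script prints the largest deviation of the layer output from the ideal result at each noise level. Passing a different `seed` draws a different fault pattern, which is useful when averaging over several simulated devices.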