Integrating Artificial Intelligence into Operating Systems: A Survey on Techniques, Applications, and Future Directions

Yifan Zhang, Xinkui Zhao, Ziying Li, Guanjie Cheng, Jianwei Yin, Lufei Zhang, Zuoning Chen

TL;DR

The paper surveys the intersection of artificial intelligence and operating systems, framing AI for OS and OS for AI as two complementary directions. It catalogues how traditional ML, LLMs, and agent-based intelligence are integrated across core OS modules and the wider ecosystem, and details how OS design can accelerate AI workloads through kernel bypass, modular architectures, and datacenter-ready abstractions. A three-stage maturity roadmap (AI-powered, AI-refactored, and AI-driven OSs) maps the practical progression from prototypes to production-grade systems, addressing challenges in scalability, reliability, and governance. The work also highlights methodological, engineering, and governance pitfalls and proposes future directions, toolchains, and hybrid frameworks to enable trustworthy, scalable, and explainable AI-driven system software. Overall, the survey offers a unified view of leveraging AI to build adaptive OSs and of evolving OS architectures to support increasingly demanding AI workloads.

Abstract

Heterogeneous hardware and dynamic workloads worsen long-standing OS bottlenecks in scalability, adaptability, and manageability. At the same time, advances in machine learning (ML), large language models (LLMs), and agent-based methods enable automation and self-optimization, but current efforts lack a unifying view. This survey reviews techniques, architectures, applications, challenges, and future directions at the AI-OS intersection. We chart the shift from heuristic- and rule-based designs to AI-enhanced systems, outlining the strengths of ML, LLMs, and agents across the OS stack. We summarize progress in AI for OS (core components and the wider ecosystem) and in OS for AI (component- and architecture-level support for short- and long-context inference, distributed training, and edge inference). For practice, we consolidate evaluation dimensions, methodological pipelines, and patterns that balance real-time constraints with predictive accuracy. We identify key challenges, such as complexity, overhead, model drift, limited explainability, and privacy and safety risks, and recommend modular, AI-ready kernel interfaces; unified toolchains and benchmarks; hybrid rules-plus-AI decisions with guardrails; and verifiable in-kernel inference. Finally, we propose a three-stage roadmap, comprising AI-powered, AI-refactored, and AI-driven OSs, to bridge prototypes and production and to enable scalable, reliable AI deployment.

Paper Structure (116 sections, 9 figures, 4 tables)

Figures (9)

  • Figure 1: Overall structure of the survey. Section \ref{sec:background} introduces the background; Section \ref{sec:related} reviews related work; Sections \ref{sec:module}, \ref{sec:tool}, and \ref{sec:structure} discuss AI for OS, the taxonomy of AI tools, and OS for AI, respectively. Section \ref{sec:stages} presents the developmental roadmap; Section \ref{sec:pitfalls} analyzes common pitfalls; and Section \ref{sec:future} outlines future research directions.
  • Figure 2: Fundamental structure of an operating system. The diagram spans from hardware and device drivers at the base, through core kernel subsystems (process, memory, I/O, networking, inter-process communication, and security), to system libraries, user interfaces, and applications. It illustrates how OSs abstract heterogeneous hardware into uniform services for diverse applications.
  • Figure 3: A categorization of integration of AI and OS.
  • Figure 4: General ML for Block Layer Enhancement Workflow
  • Figure 5: Workflow for the software-defined far memory system for warehouse-scale computers
  • ...and 4 more figures