Table of Contents
Fetching ...

CSLib: The Lean Computer Science Library

Clark Barrett, Swarat Chaudhuri, Fabrizio Montesi, Jim Grundy, Pushmeet Kohli, Leonardo de Moura, Alexandre Rademaker, Sorrachai Yingchareonthawornchai

TL;DR

CSLib targets a Mathlib-like foundation for computer science within Lean by combining two pillars: formalizing CS concepts (models, algorithms, and logics) and building Boole, an intermediate language that enables verification of everyday code through Lean-based verification conditions. It envisions a scalable ecosystem where verified CSLib components enable reliable software and where AI-assisted formalization and proof are trained on CSLib data, all supported by a collaborative governance model and a concrete 2026–2027 roadmap. The work outlines a path to unify CS knowledge in Lean, provide an end-to-end verification pipeline from mainstream languages to Lean proofs, and catalyze both practical reliability and long-term research in formal methods. The projected impact includes reduced verification costs, safer AI-generated code, and the discovery of new CS insights through AI-assisted formal reasoning.

Abstract

We introduce CSLib, an open-source framework for proving computer-science-related theorems and writing formally verified code in the Lean proof assistant. CSLib aims to be for computer science what Lean's Mathlib is for mathematics. Mathlib has been tremendously impactful: it is a key reason for Lean's popularity within the mathematics research community, and it has also played a critical role in the training of AI systems for mathematical reasoning. However, the base of computer science knowledge in Lean is currently quite limited. CSLib will vastly enhance this knowledge base and provide infrastructure for using this knowledge in real-world verification projects. By doing so, CSLib will (1) enable the broad use of Lean in computer science education and research, and (2) facilitate the manual and AI-aided engineering of large-scale formally verified systems.

CSLib: The Lean Computer Science Library

TL;DR

CSLib targets a Mathlib-like foundation for computer science within Lean by combining two pillars: formalizing CS concepts (models, algorithms, and logics) and building Boole, an intermediate language that enables verification of everyday code through Lean-based verification conditions. It envisions a scalable ecosystem where verified CSLib components enable reliable software and where AI-assisted formalization and proof are trained on CSLib data, all supported by a collaborative governance model and a concrete 2026–2027 roadmap. The work outlines a path to unify CS knowledge in Lean, provide an end-to-end verification pipeline from mainstream languages to Lean proofs, and catalyze both practical reliability and long-term research in formal methods. The projected impact includes reduced verification costs, safer AI-generated code, and the discovery of new CS insights through AI-assisted formal reasoning.

Abstract

We introduce CSLib, an open-source framework for proving computer-science-related theorems and writing formally verified code in the Lean proof assistant. CSLib aims to be for computer science what Lean's Mathlib is for mathematics. Mathlib has been tremendously impactful: it is a key reason for Lean's popularity within the mathematics research community, and it has also played a critical role in the training of AI systems for mathematical reasoning. However, the base of computer science knowledge in Lean is currently quite limited. CSLib will vastly enhance this knowledge base and provide infrastructure for using this knowledge in real-world verification projects. By doing so, CSLib will (1) enable the broad use of Lean in computer science education and research, and (2) facilitate the manual and AI-aided engineering of large-scale formally verified systems.
Paper Structure (13 sections, 7 figures)

This paper contains 13 sections, 7 figures.

Figures (7)

  • Figure 1: CSLib's Pillar 1: A possible organization of covered topics.
  • Figure 2: (a) Definition of a Labeled Transition System ("LTS"), which models the observable behavior of the possible states of a discrete computational system. (b) Definition of a bisimulation between two states. (c) Theorem establishing that the inverse of a bisimulation is a bisimulation.
  • Figure 3: A monadic API for complexity analysis of elementary algorithms. An object of type $\mathtt{TimeM}~\alpha$ comprises $\mathtt{ret}$, which is a value of type $\alpha$, and the $\mathtt{time}$ cost incurred by the computation of this value. The $\mathtt{bind}$ operation chains together two computations and defines the time cost of the composition as the sum of the time costs of the individual operations. The helper function $\mathtt{tick}$ assigns the time cost. The notation $\checkmark$ is syntactic sugar for the invocation of $\mathtt{tick}$. The final section asserts some basic lemmas that help simplify expressions involving $\mathtt{TimeM}$.
  • Figure 4: An implementation of Mergesort using the API in \ref{['fig:complexity-api']}. Note how each invocation of the sorting routine returns a sorted list along with a computational cost. With these definitions in place, one can formally state and prove theorems establishing the correctness of Mergesort and its worst-case number of comparisons.
  • Figure 5: CSLib Code Reasoning Vision.
  • ...and 2 more figures