Unix Tools and the FITO Category Mistake: Crash Consistency and the Protocol Nature of Persistence
Paul Borrill
TL;DR
FITO analysis demonstrates that the assumption of instantaneous atomic state transitions constitutes a category mistake at every layer of the computing stack -- from ext4 journaling and delayed allocation, through fsync failure semantics, NVMe Flush/FUA device behavior, and Linux restartable sequences.
Abstract
Unix tools such as ls, cp, mv, and rename expose a filesystem abstraction that appears to present a single, authoritative state evolving through atomic transitions. This abstraction is false. We present a systematic Forward-In-Time-Only (FITO) analysis demonstrating that the assumption of instantaneous atomic state transitions constitutes a category mistake at every layer of the computing stack -- from ext4 journaling and delayed allocation, through fsync failure semantics, NVMe Flush/FUA device behavior, and Linux restartable sequences, down to the x86-64 CPU's own inability to guarantee atomic supervisor entry under Non-Maskable Interrupts. We prove a formal impossibility result: no syscall-based persistence primitive can define a commit boundary under failure, because the syscall return value is consistent with multiple materially different persistence states across Linux filesystems. We identify cross-layer temporal assumption leakage as the structural mechanism by which the category mistake propagates, and show that the entire storage stack forms a recursive chain of non-atomic dependencies whose apparent atomicity reflects mathematical impossibility (Herlihy, 1991), not merely engineering deficiency. An appendix documents the real-world consequences: cascading cloud outages at Google, AWS, Meta, and Cloudflare driven by retry amplification; database corruption from fsync failures in PostgreSQL, etcd, and MySQL; silent data corruption at CERN, NetApp, and Meta; AI training waste consuming 12--43% of compute budgets at scale; and financial system failures totaling billions of dollars annually. These consequences trace to a single structural cause: systems designed around the FITO assumption, compensating for its failure with retry-and-recover protocols that amplify the very failures they attempt to mask.
