Table of Contents
Fetching ...

Custom Representations of Inductive Families

Constantine Theocharis, Edwin Brady

TL;DR

The paper tackles performance inefficiencies of inductive families by introducing custom, correctness-preserving data representations that erase at compile time. It develops ${ extsc{datatt}}$, a dependent-type language with a representation modality, and a translation to ${ extsc{mltt}}$ that replaces data by their inductive algebras, enabling zero-cost reindexing and data reuse. The approach is formalised in Agda and prototyped in Superfluid, and it encompasses broad techniques such as the Nat-hack, vector refinements, general reindexing, and zero-copy deserialisation, all backed by an algebraic sigatures framework and coherence guarantees. This framework decouples logical data structure from runtime representation, enabling compiler-driven optimisations while preserving dependent-type reasoning and enabling computationally irrelevent isomorphisms between original and represented data, with practical implications for memory layout and performance.

Abstract

Inductive families provide a convenient way of programming with dependent types. Yet, when it comes to compilation, their default linked-tree runtime representations, as well as the need to convert between different indexed views of the same data, can lead to unsatisfactory runtime performance. In this paper, we introduce a language with dependent types, and inductive families with customisable representations. Representations are a version of Wadler's views, refined to inductive families like in Epigram, but with compilation guarantees: a represented inductive family will not leave any runtime traces behind, without relying on heuristics such as deforestation. This way, we can build a library of convenient inductive families based on a minimal set of primitives, whose re-indexing and conversion functions are erased during compilation. We show how we can express optimisation techniques such as representing Nat-like types as GMP-style big integers, without special casing in the compiler. With dependent types, reasoning about data representations is also possible through a provided modality. This yields computationally irrelevant isomorphisms between the original and represented data.

Custom Representations of Inductive Families

TL;DR

The paper tackles performance inefficiencies of inductive families by introducing custom, correctness-preserving data representations that erase at compile time. It develops , a dependent-type language with a representation modality, and a translation to that replaces data by their inductive algebras, enabling zero-cost reindexing and data reuse. The approach is formalised in Agda and prototyped in Superfluid, and it encompasses broad techniques such as the Nat-hack, vector refinements, general reindexing, and zero-copy deserialisation, all backed by an algebraic sigatures framework and coherence guarantees. This framework decouples logical data structure from runtime representation, enabling compiler-driven optimisations while preserving dependent-type reasoning and enabling computationally irrelevent isomorphisms between original and represented data, with practical implications for memory layout and performance.

Abstract

Inductive families provide a convenient way of programming with dependent types. Yet, when it comes to compilation, their default linked-tree runtime representations, as well as the need to convert between different indexed views of the same data, can lead to unsatisfactory runtime performance. In this paper, we introduce a language with dependent types, and inductive families with customisable representations. Representations are a version of Wadler's views, refined to inductive families like in Epigram, but with compilation guarantees: a represented inductive family will not leave any runtime traces behind, without relying on heuristics such as deforestation. This way, we can build a library of convenient inductive families based on a minimal set of primitives, whose re-indexing and conversion functions are erased during compilation. We show how we can express optimisation techniques such as representing Nat-like types as GMP-style big integers, without special casing in the compiler. With dependent types, reasoning about data representations is also possible through a provided modality. This yields computationally irrelevant isomorphisms between the original and represented data.

Paper Structure

This paper contains 9 sections, 21 equations, 2 figures.

Figures (2)

  • Figure 1: Rules for forming telescopes and spines.
  • Figure 2: Rules for forming signatures and operations.