Repr Types: One Abstraction to Rule Them All

Viktor Palmkvist, Anders Ågren Thuné, Elias Castegren, David Broman

TL;DR

This paper presents a new approach to representation-flexible data types that avoids the extensibility restrictions of prior approaches while still finding efficient optimizations. The approach centers around a single built-in type $\texttt{repr}$ and function overloading with cost annotations for operation implementations.

Abstract

The choice of how to represent an abstract type can have a major impact on the performance of a program, yet mainstream compilers cannot perform optimizations at such a high level. When optimizing data type representations, an important feature is extensible representation-flexible data types: the ability for a programmer to add new abstract types and operations, as well as concrete implementations of these, without modifying the compiler or a previously defined library. Many research projects support high-level optimizations through static analysis, instrumentation, or benchmarking, but they are all restricted in at least one aspect of extensibility. This paper presents a new approach to representation-flexible data types that avoids such restrictions while still finding efficient optimizations. Our approach centers around a single built-in type $\texttt{repr}$ and function overloading with cost annotations for operation implementations. We evaluate our approach (i) by defining a universal collection type as a library (a single type for all conventional collections), and (ii) by designing and implementing a representation-flexible graph library. Programs using $\texttt{repr}$ types are typically faster than programs with idiomatic representation choices -- sometimes dramatically so -- as long as the compiler finds good implementations for all operations. Our compiler performs the analysis efficiently by finding optimized solutions quickly and by reusing previous results to avoid recomputations.

Paper Structure

This paper contains 32 sections, 1 equation, 11 figures, and 2 tables.

Figures (11)

  • Figure 1: Overview of our approach split into three perspectives. A user writes a program that references abstract types and operations. An interface designer writes the library that defines these abstract types and operations. Finally, an implementer provides multiple concrete representations and implementations for each corresponding abstraction. Filled arrows denote "uses", while hollow arrows denote "implements".
  • Figure 2: A function show_seq for obtaining a pretty-printed version of a sequence, using the sequence library that we use as a running example for this section. The library introduces one type (seq) and several operations (in this case foldl, split_first, and concat). Note that the compiler will automatically pick representations based on operation usage. If we consider this function in isolation (ignoring places it might be used), the compiler will choose a cons-list for the input seq (which uses split_first and foldl) and a rope for the output seq (which uses concat).
  • Figure 3: The interface of our example library. The repr type is built-in and is used for all abstract types whose representation is to be decided by the compiler. Different abstract types are distinguished by the type parameter passed to repr, which in this case is achieved by the newly defined type seq_t. Operations are introduced by letop and have normal type signatures but no bodies; these are supplied later.
  • Figure 4: A subset of the representations and implementations provided in our example library. A representation (introduced by repr) specifies a possible way to replace a repr type with a concrete type, while an implementation (introduced by letimpl) does the same for operations. Each implementation has a cost, an optional type signature, and a body. The body can contain any expression, including references to other operations.
  • Figure 5: An overview of the relevant parts of our implementation. First, we extend a unification-based type inference algorithm (in our case the already extended version of Algorithm W used in FreezeML [Emrich et al. 2020]) with representation variables to track which repr types must have the same representation. Second, we collect all operation uses across the program, solve a constrained optimization problem to determine concrete implementations for all of them, then update the program to use the chosen implementations. Note that the last step removes our constructs from the program; they are all replaced with normal let-bindings and variables.
  • ...and 6 more figures
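
The selection step described in the captions for Figures 2, 4, and 5 can be illustrated with a small sketch. This is not the paper's algorithm or syntax: it is a toy Python model that assumes each operation use is tied to one representation variable, that each implementation has a fixed numeric cost, and that brute-force enumeration stands in for the constrained optimization the compiler solves. The cost values, the `IMPLS` table, and the function name `choose_representations` are all hypothetical.

```python
from itertools import product

# Hypothetical cost table: IMPLS[operation][representation] = cost.
# The numbers are invented for illustration; they are not from the paper.
IMPLS = {
    "split_first": {"cons_list": 1.0, "rope": 10.0},
    "foldl": {"cons_list": 5.0, "rope": 5.0},
    "concat": {"cons_list": 10.0, "rope": 1.0},
}


def choose_representations(uses):
    """Pick one representation per representation variable.

    `uses` is a list of (operation, representation_variable) pairs.
    Returns (assignment, total_cost) with minimal summed cost, or None
    if no assignment has an implementation for every use.
    Brute force is a stand-in for a real constrained-optimization solver.
    """
    variables = sorted({var for _, var in uses})
    reprs = sorted({r for impls in IMPLS.values() for r in impls})
    best = None
    for combo in product(reprs, repeat=len(variables)):
        assignment = dict(zip(variables, combo))
        try:
            cost = sum(IMPLS[op][assignment[var]] for op, var in uses)
        except KeyError:
            # Some operation has no implementation for the chosen representation.
            continue
        if best is None or cost < best[1]:
            best = (assignment, cost)
    return best


# Operation uses loosely modeled on show_seq from Figure 2: the input
# sequence is consumed with split_first and foldl, the output is built
# with concat.
uses = [("split_first", "input"), ("foldl", "input"), ("concat", "output")]
print(choose_representations(uses))
# prints ({'input': 'cons_list', 'output': 'rope'}, 7.0)
```

Under these made-up costs, the sketch reproduces the outcome the Figure 2 caption describes: a cons-list for the input sequence and a rope for the output sequence. A real implementation would also have to handle operations that relate several representation variables at once, which this toy model leaves out.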