The Dynamics of Innovation in Open Source Software Ecosystems

Gábor Mészáros, Johannes Wachs

TL;DR

This work studies library imports across 12 programming language ecosystems in millions of Stack Overflow posts over a 15-year period, revealing how ecosystems grow and highlighting implications for their sustainability.

Abstract

Software libraries are the elementary building blocks of open source software (OSS) ecosystems, extending the capabilities of programming languages beyond their standard libraries. Although ecosystem health is often quantified using data on libraries and their interdependencies, we know little about the rate at which new libraries are developed and used. Here we study library imports in 12 programming language ecosystems within millions of Stack Overflow posts over a 15-year period. Within each ecosystem, new libraries emerge at a remarkably predictable sub-linear rate relative to the number of posts. As a consequence, the distribution of library usage in all ecosystems is highly concentrated: the most widely used libraries are used many times more often than the average library. Although new libraries appear more slowly over time, novel combinations of libraries appear at an approximately linear rate, suggesting that recombination is a key innovation process in software. Newer users are more likely to use new libraries and new combinations, and we find significant variation in rates of innovation between countries. Our work links the evolution of OSS ecosystems to the literature on the dynamics of innovation, revealing how ecosystems grow and highlighting implications for sustainability.
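A sub-linear discovery rate can be made concrete by tracking D(n), the number of distinct libraries seen after n posts, and fitting a power law D(n) ≈ a·n^β on log-log axes; β < 1 means sub-linear growth (the pattern often called Heaps' law). The Python sketch below is illustrative only: the names `discovery_curve` and `fit_growth_exponent` are hypothetical, the plain least-squares fit is an assumption rather than the authors' estimation procedure, and the input is a synthetic curve, not Stack Overflow data.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def discovery_curve(posts: Iterable[Sequence[str]]) -> List[int]:
    """Cumulative number of distinct libraries after each post (hypothetical helper)."""
    seen: set = set()
    curve: List[int] = []
    for imports in posts:
        seen.update(imports)
        curve.append(len(seen))
    return curve


def fit_growth_exponent(curve: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of log D(n) = log a + beta * log n.

    Returns (a, beta); beta < 1 indicates sub-linear growth.
    Assumes at least two points with D(n) > 0 (points with D(n) = 0 are skipped).
    """
    pts = [(math.log(n), math.log(d)) for n, d in enumerate(curve, start=1) if d > 0]
    k = len(pts)
    mean_x = sum(x for x, _ in pts) / k
    mean_y = sum(y for _, y in pts) / k
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    beta = sxy / sxx
    a = math.exp(mean_y - beta * mean_x)
    return a, beta


if __name__ == "__main__":
    # Synthetic curve with a known exponent of 0.6 (not data from the paper).
    synthetic = [round(3 * n ** 0.6) for n in range(1, 10_001)]
    a, beta = fit_growth_exponent(synthetic)
    print(f"a = {a:.2f}, beta = {beta:.3f}")
```

Applying the same fit to the cumulative count of distinct library pairs would yield β close to 1 if combinations grow approximately linearly, as the abstract reports.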

Paper Structure

This paper contains 9 sections, 13 figures, and 1 table.

Figures (13)

  • Figure 1: Stack Overflow post tagged with the Python programming language and containing a code snippet with a library import.
  • Figure 2: Toy example of three sequential posts with import statements in the Python programming language. The first post contains two novel imports, the second contains one, and the third post contains none. The first post contains one novel combination of imports (os, sys), the second contains one novel combination (random, sys), and the third also contains one novel combination (os, random).
  • Figure 3: Rates of simple and combinatorial novelties of library imports in Stack Overflow posts across programming languages.
  • Figure 4: The fraction of all imports in each programming language accounted for by a given fraction of the most commonly imported libraries. For example, the 7% most frequently imported Python libraries account for 90% of all Python imports.
  • Figure 5: The likelihood that a post contains a novel library or pair of libraries, respectively, as a function of the user's previous posting experience at the time of the post. Note that we define user experience and novelty of posts at the programming language level, reporting pooled estimates here.
  • ...and 8 more figures
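The novelty counts in Figure 2 and the concentration measure in Figure 4 can be reproduced on toy data with a few lines of Python. This is a minimal sketch under stated assumptions: `novelty_counts` and `top_share` are hypothetical names, a "combination" is taken to be an unordered pair of distinct libraries imported in the same post (matching the pairs listed in Figure 2), and the three posts are assumed to import exactly {os, sys}, {random, sys} and {os, random}, which is consistent with the caption but not stated in it. The library names and counts passed to `top_share` are invented for illustration.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Mapping, Sequence, Tuple


def novelty_counts(posts: Iterable[Sequence[str]]) -> List[Tuple[int, int]]:
    """Per post, count (novel libraries, novel unordered library pairs)."""
    seen_libs: set = set()
    seen_pairs: set = set()
    result: List[Tuple[int, int]] = []
    for imports in posts:
        libs = set(imports)  # a library imported twice in one post counts once
        pairs = set(combinations(sorted(libs), 2))  # sorting gives a canonical pair order
        result.append((len(libs - seen_libs), len(pairs - seen_pairs)))
        seen_libs |= libs
        seen_pairs |= pairs
    return result


def top_share(import_counts: Mapping[str, int], top_fraction: float) -> float:
    """Share of all imports made by the `top_fraction` most-imported libraries."""
    ranked = sorted(import_counts.values(), reverse=True)
    k = max(1, math.ceil(top_fraction * len(ranked)))  # number of top libraries kept
    return sum(ranked[:k]) / sum(ranked)


if __name__ == "__main__":
    toy_posts = [["os", "sys"], ["random", "sys"], ["os", "random"]]
    print(novelty_counts(toy_posts))  # [(2, 1), (1, 1), (0, 1)], as in Figure 2

    # Invented counts for five hypothetical libraries (not data from the paper).
    toy_counts = Counter({"lib_a": 90, "lib_b": 4, "lib_c": 3, "lib_d": 2, "lib_e": 1})
    print(top_share(toy_counts, 0.2))  # top 1 of 5 libraries -> 0.9
```

Evaluating `top_share` over a grid of `top_fraction` values traces a cumulative concentration curve of the kind plotted in Figure 4.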