The building blocks of software work explain coding careers and language popularity
Xiangnan Feng, Johannes Wachs, Simone Daniotti, Frank Neffke
TL;DR
This paper builds a fine-grained taxonomy of software development tasks from Stack Overflow, linking micro-level problem solving to macro labor-market outcomes. Using a bipartite stochastic block model, PMI-based task relatedness, and UMAP/HDBSCAN for visualization, the authors identify 237 canonical software tasks and map them to real-world job ads and salaries. They show that task value predicts advertised salaries and that individuals learn and diversify across tasks, with Python enabling entry into higher-value tasks and broader career flexibility. The study demonstrates the utility of large-scale task taxonomies for understanding labor-market dynamics, technology diffusion, and language-driven career trajectories, while acknowledging limitations and offering avenues for application in education and workforce development.
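To illustrate the PMI-based relatedness mentioned above, here is a minimal sketch rather than the authors' implementation. It assumes relatedness is computed from how often two tasks co-occur in the activity of the same users. The function name `pmi_relatedness` and the toy user-by-task matrix are hypothetical. For tasks i and j, PMI(i, j) = log(p(i, j) / (p(i) p(j))), so positive values mean the two tasks are practiced by the same users more often than chance would predict.

```python
import numpy as np


def pmi_relatedness(user_task):
    """Pairwise PMI between tasks from a binary user-by-task matrix.

    Assumption: rows are users, columns are tasks, and a nonzero entry
    means the user has been active in that task.
    """
    X = (np.asarray(user_task) > 0).astype(float)
    n_users = X.shape[0]

    co = X.T @ X                         # co[i, j] = users active in both i and j
    p_joint = co / n_users               # empirical joint probability
    p_task = np.diag(co) / n_users       # empirical marginal probability per task
    expected = np.outer(p_task, p_task)  # joint probability under independence

    # Pairs that never co-occur get -inf; all other pairs get the log ratio.
    pmi = np.full(co.shape, -np.inf)
    mask = p_joint > 0
    pmi[mask] = np.log(p_joint[mask] / expected[mask])
    return pmi


# Hypothetical toy data: 4 users (rows) by 3 tasks (columns).
toy = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
]
pmi = pmi_relatedness(toy)
```

In this toy matrix, tasks 0 and 1 share two of four users and receive a positive PMI, while tasks 1 and 2 share no users and receive negative infinity. In practice the diagonal (a task paired with itself) would be ignored.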
Abstract
Recent waves of technological transformation have fueled debates about the changing nature of work. Yet to understand the future of work, we need to know more about what people actually do in their jobs, going beyond educational credentials or job descriptions. Here we analyze work in the global software industry using tens of millions of question-and-answer posts on Stack Overflow to create a fine-grained taxonomy of software tasks, the elementary building blocks of software development work. These tasks predict salaries and job requirements in real-world job ads. We also observe how individuals learn within tasks and diversify into new tasks. Tasks that people acquire tend to be related to their old ones, but of lower value, suggesting that they are easier. An exception is users of Python, an increasingly popular programming language known for its versatility. Python users enter tasks that tend to be higher-value, providing an explanation for the language's growing popularity based on the tasks Python enables its users to perform. In general, these insights demonstrate the value of task taxonomies extracted at scale from large datasets: they offer high-resolution, near-real-time descriptions of changing labor markets. In the case of software tasks, they map such changes for jobs at the forefront of a digitizing global economy.
