The Values Encoded in Machine Learning Research

Abeba Birhane, Pratyusha Kalluri, Dallas Card, William Agnew, Ravit Dotan, Michelle Bao

TL;DR

The paper challenges the view that ML research is value-neutral by introducing a fine-grained annotation scheme and applying it to 100 influential ICML/NeurIPS papers. It uncovers 59 uplifted values, with Performance, Generalization, Quantitative evidence, Efficiency, Building on past work, and Novelty dominating, while societal needs and negative impacts receive little attention. Quantitative findings reveal a sharp rise in corporate and big-tech affiliations, signaling power centralization in the field. The study provides a publicly available annotated corpus and argues for reflexive, value-aware ML research to broaden governance and ethical considerations.

Abstract

Machine learning currently exerts an outsized influence on the world, increasingly affecting institutional practices and impacted communities. It is therefore critical that we question vague conceptions of the field as value-neutral or universally beneficial, and investigate what specific values the field is advancing. In this paper, we first introduce a method and annotation scheme for studying the values encoded in documents such as research papers. Applying the scheme, we analyze 100 highly cited machine learning papers published at premier machine learning conferences, ICML and NeurIPS. We annotate key features of papers which reveal their values: their justification for their choice of project, which attributes of their project they uplift, their consideration of potential negative consequences, and their institutional affiliations and funding sources. We find that few of the papers justify how their project connects to a societal need (15%) and far fewer discuss negative potential (1%). Through line-by-line content analysis, we identify 59 values that are uplifted in ML research, and, of these, we find that the papers most frequently justify and assess themselves based on Performance, Generalization, Quantitative evidence, Efficiency, Building on past work, and Novelty. We present extensive textual evidence and identify key themes in the definitions and operationalization of these values. Notably, we find systematic textual evidence that these top values are being defined and applied with assumptions and implications generally supporting the centralization of power. Finally, we find increasingly close ties between these highly cited papers and tech companies and elite universities.
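
The abstract describes annotating papers line by line and then reporting how often each value is uplifted. As a rough illustration only, the sketch below shows one way such a tally could be computed, assuming the annotations are stored as (paper, value) pairs with one pair per annotated sentence. The function name, the data layout, and the toy data are hypothetical and are not taken from the authors' tooling or corpus.

```python
from collections import defaultdict


def value_proportions(annotations, n_papers):
    """Return the share of papers that uplift each value at least once.

    annotations: iterable of (paper_id, value) pairs, one per annotated
                 sentence (a hypothetical layout, assumed for this sketch).
    n_papers:    total number of annotated papers.
    """
    papers_by_value = defaultdict(set)
    for paper_id, value in annotations:
        # A set ensures a paper counts once per value,
        # however many of its sentences express that value.
        papers_by_value[value].add(paper_id)
    return {value: len(papers) / n_papers
            for value, papers in papers_by_value.items()}


# Toy annotations, invented purely for illustration.
annotations = [
    ("paper_a", "Performance"),
    ("paper_a", "Performance"),
    ("paper_a", "Novelty"),
    ("paper_b", "Performance"),
    ("paper_c", "Efficiency"),
    ("paper_d", "Performance"),
]

proportions = value_proportions(annotations, n_papers=4)
for value, share in sorted(proportions.items(), key=lambda kv: -kv[1]):
    print(f"{value}: {share:.0%}")
```

Counting papers rather than sentences matches the per-paper proportions that Figure 1 reports; a per-sentence count would instead weight long or repetitive papers more heavily.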

Paper Structure

This paper contains 26 sections, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Proportion of annotated papers that uplift each value.
  • Figure 2: Corporate and Big Tech author affiliations. The percent of papers with Big Tech author affiliations increased from 13% in 2008/09 to 47% in 2018/19.
  • Figure 3: Affiliations and funding ties. From 2008/09 to 2018/19, the percent of papers tied to nonprofits, research institutes, and tech companies increased substantially. Most significantly, ties to Big Tech increased threefold and overall ties to tech companies increased to 79%. Non-N.A. Universities are those outside the U.S. and Canada.
  • Figure C.1: Proportion of annotated papers that uplifted each value, prior to combining.
  • Figure E.2: Proportion of papers from 2008-2020 (combining NeurIPS and ICML) predicted to have at least one sentence expressing each value (left), and estimated performance (F1) of the corresponding classifiers (right). Note that the overall performance of most classifiers is generally poor, indicating that the estimates on the left should be treated as unreliable in most cases. Grey bars represent the clustered values. Classifiers were not trained for values with fewer than 20 representative sentences.
  • ...and 1 more figures