AI for Scientific Discovery is a Social Problem

Georgia Channing, Avijit Ghosh

TL;DR

The paper argues that AI for scientific discovery is constrained more by social and institutional factors than by technical limits, identifying four interlinked barriers: community dysfunction, misaligned research priorities, data fragmentation, and infrastructure inequities. It critiques the AI-scientist myth, emphasizing mechanistic understanding and experimental grounding over predictive performance, and proposes a multi-pronged agenda—cross-disciplinary education, upstream benchmarking, standardized data practices, and community-owned infrastructure—to align incentives and broaden participation. Through case studies (e.g., CASP, The Materials Project, Schmidt Fellowship) it illustrates how sustained, community-governed efforts can magnify downstream impact beyond isolated advances. The work argues for reframing AI for science as a collective social project where durable collaboration and equitable participation are prerequisites for technical progress and real scientific discovery.

Abstract

Artificial intelligence (AI) is increasingly applied to scientific research, but its benefits remain unevenly distributed across communities and disciplines. While technical challenges such as limited data, fragmented standards, and unequal access to computational resources exist, social and institutional factors are often the primary constraints. Narratives emphasizing autonomous "AI scientists," under-recognition of data and infrastructure work, misaligned incentives, and gaps between domain experts and machine learning researchers all limit the impact of AI on scientific discovery. This paper highlights four interconnected challenges: community coordination, misalignment of research priorities with upstream needs, data fragmentation, and infrastructure inequities. We argue that addressing these challenges requires not only technical innovation but also intentional efforts in community-building, cross-disciplinary education, shared benchmarks, and accessible infrastructure. We call for reframing AI for science as a collective social project, where sustainable collaboration and equitable participation are treated as prerequisites for technical progress.

Paper Structure

This paper contains 25 sections, 2 figures.

Figures (2)

  • Figure 1: Reproduced from [vafa2025foundationmodelfoundusing]. The figure contrasts the true Newtonian forces (left) with the forces predicted (right) by a transformer-based foundation model that predicts planetary trajectories with high accuracy. Although the model performs well on the task it was fine-tuned for, it has not learned an inductive bias toward true Newtonian mechanics. General-purpose models do not necessarily aid specific scientific understanding.
  • Figure 2: Scientific data tokenization challenges. Current AI architectures excel at predicting text tokens (top) but struggle with complex scientific datasets like multi-omics data (bottom), which lack clear tokenization strategies and exhibit low predictive capacity despite their richness and scale.