
Attention is All They Need: Exploring the Media Archaeology of the Computer Vision Research Paper

Samuel Goree, Gabriel Appleby, David Crandall, Norman Su

TL;DR

This paper investigates how computer vision research papers have evolved into designed media artifacts within an expanding attention economy. It combines a media archaeology perspective, interviews with veteran researchers, and computational analysis of CVPR papers from 2013–2021 to reveal three core patterns: teaser images and acronyms advertising the contribution, dense results tables serving as measurable benchmarks, and a shift from print to color-rich, screen-based PDFs facilitated by digital proceedings and arXiv. The findings show that papers increasingly commodify attention, with the design of figures, tables, and promotional materials shaping which research gets noticed and cited, while peer-review labor and reading practices adapt to faster dissemination. The study discusses broader implications for publishing design, proposing slow-design-inspired approaches and tools to balance speed with careful scholarship, and calls for systemic solutions to the attention-driven pressures that permeate scholarly communication. Overall, it argues that treating attention as labor helps explain the visual evolution of CV papers and suggests pathways to more equitable and sustainable publishing practices.

Abstract

Research papers, in addition to being textual documents, are a designed interface through which researchers communicate. Recently, rapid growth has transformed that interface in many fields of computing. In this work, we examine the effects of this growth from a media archaeology perspective, through the changes to figures and tables in research papers. Specifically, we study these changes in computer vision over the past decade, as the deep learning revolution has driven unprecedented growth in the discipline. We ground our investigation in interviews with veteran researchers spanning computer vision, graphics, and visualization. Our analysis focuses on the research attention economy: how research paper elements contribute towards advertising, measuring, and disseminating an increasingly commodified "contribution." Through this work, we seek to motivate future discussion surrounding the design of both the research paper itself as well as the larger sociotechnical research publishing system, including tools for finding, reading, and writing research papers.

Paper Structure (20 sections, 5 figures, 1 table)

This paper contains 20 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: Five teaser images from papers in different venues, and a still image from a television advertisement for paper towels. Figures look best zoomed in.
  • Figure 2: Left: fraction of CVPR papers with a teaser image. Right: fraction of CVPR paper titles with colons, unique acronyms, acronyms followed by a colon, and unique acronyms followed by colons. Differing timescales are due to differing availability of full PDFs vs. title data.
  • Figure 3: Left: The fraction of CVPR papers with figures and tables over time. Right: The average number of figures and tables per CVPR paper over time.
  • Figure 4: Six results tables with numbers in bold. (a) is the earliest example of this style we found; (b) is the earliest example from computer vision; (c) is from the highly influential 2012 AlexNet ImageNet classification paper NIPS2012_c399862d; (d) is a 2021 state-of-the-art result on ImageNet dai2021coatnet; (e) is a trendier table from 2021 that uses a grayscale background, colored numbers, and subscript arrows showing improvement li2021exploring; and (f) is a table from a 2022 CVPR paper hruby2022learning on the geometric side of computer vision, which is historically more mathematical and usually has fewer such tables.
  • Figure 5: Top: three screenshots at different sizes of a figure from gao2021visualvoice that is too small to reproduce using a standard office printer. Bottom: three example pages from CVPR papers with highly information-dense figures in small page regions, shown at 20% of original size.
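The title statistics in Figure 2 are simple per-title fractions. The sketch below shows one way such fractions might be computed. It is a minimal illustration under stated assumptions, not the authors' pipeline. The acronym rule (any word with at least two uppercase letters, such as "GAN" or "NeRF") and the names `title_stats`, `ACRONYM`, and `ACRONYM_COLON` are hypothetical. The paper's separate "unique acronym" category is omitted because its definition is not given here.

```python
import re
from typing import Iterable

# Hypothetical heuristic (not the paper's definition): a word counts as an
# acronym if it contains at least two uppercase letters, e.g. "GAN", "NeRF".
ACRONYM = re.compile(r"\b\w*[A-Z]\w*[A-Z]\w*\b")
# Hypothetical: a title that opens with such a word followed by a colon, e.g. "NeRF: ..."
ACRONYM_COLON = re.compile(r"^\s*\w*[A-Z]\w*[A-Z]\w*\s*:")


def title_stats(titles: Iterable[str]) -> dict[str, float]:
    """Return the fraction of titles that match each surface pattern."""
    titles = list(titles)
    n = len(titles)
    if n == 0:
        return {"colon": 0.0, "acronym": 0.0, "acronym_colon": 0.0}
    return {
        # Any colon anywhere in the title.
        "colon": sum(":" in t for t in titles) / n,
        # At least one acronym-like word anywhere in the title.
        "acronym": sum(bool(ACRONYM.search(t)) for t in titles) / n,
        # Title starts with an acronym-like word followed by a colon.
        "acronym_colon": sum(bool(ACRONYM_COLON.match(t)) for t in titles) / n,
    }


# Toy, made-up titles (not CVPR data), just to exercise the function.
toy_titles = [
    "FooNet: Learning Toy Features",
    "A Study of Toy Features",
    "Toy Detection: A Survey",
    "Toy Features via BAR",
]
print(title_stats(toy_titles))
# {'colon': 0.5, 'acronym': 0.5, 'acronym_colon': 0.25}
```

Applying the same function to one list of titles per conference year would yield a per-year trend line like the right panel of Figure 2. A regex heuristic like this one will misclassify some titles (for example, "ImageNet" counts as an acronym), so any real analysis would need a more carefully validated definition.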