A Data-driven Investigation of Euphemistic Language: Comparing the usage of "slave" and "servant" in 19th century US newspapers

Jaihyun Park, Ryan Cordell

TL;DR

This study analyzes how euphemistic language around enslaved Black Americans was constructed in 19th-century US newspapers by contrasting the usage of 'slave' and 'servant' in Chronicling America. It combines OCR-aware data collection, FastText-based OCR error screening, deduplication of reprinted texts, Word2vec semantic neighborhoods, and prevalence analyses to reveal region-specific discourses. The results show that 'slave' discourse centers on macro socio-economic, legal, and administrative themes, while 'servant' discourse is framed in religious terms in Northern papers and in domestic and familial terms in Southern papers; 'slave' discourse words from Southern papers are more prevalent in Northern papers, whereas 'servant' discourse words from each region are prevalent in their own region. The work demonstrates a data-driven approach to historical discourse analysis and sheds light on how newspapers contributed to white supremacist framing of enslaved Black Americans.

Abstract

This study investigates the usage of "slave" and "servant" in 19th-century US newspapers using computational methods. While both terms were used to refer to enslaved African Americans, they were used in distinct ways. In the Chronicling America corpus, we used FastText embeddings to include possible OCR errors and excluded reprinted texts to account for the reprint culture of the 19th century. We used Word2vec embeddings to find words semantically close to "slave" and "servant" and calculated log-odds ratios to identify discourse words over-represented in Southern and Northern newspapers. We found that "slave" is associated with socio-economic, legal, and administrative words, whereas "servant" is linked to religious words in Northern newspapers and to domestic and familial words in Southern newspapers. We further found that slave discourse words from Southern newspapers are more prevalent in Northern newspapers, while servant discourse words from each side are prevalent in their own region. This study contributes to the understanding of how newspapers created different discourses around enslaved African Americans in the 19th-century US.
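
To make the log-odds step concrete, the following is a minimal sketch of one common formulation: a log-odds ratio smoothed with a uniform Dirichlet prior and scaled to a Z-score. The abstract does not specify the exact variant used, so the prior, the function name `log_odds_z_scores`, the `prior_strength` parameter, and the toy counts are all assumptions made for illustration, not the authors' implementation or data.

```python
import math


def log_odds_z_scores(counts_a, counts_b, prior_strength=0.01):
    """Return a Z-scored log-odds ratio for every word in two count tables.

    Positive scores mark words over-represented in corpus A, negative
    scores mark words over-represented in corpus B. The uniform Dirichlet
    prior (prior_strength per word) is an assumption of this sketch.
    The combined vocabulary must contain at least two words.
    """
    vocab = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    prior_total = prior_strength * len(vocab)

    scores = {}
    for word in vocab:
        # Smoothed counts, so words unseen in one corpus do not produce log(0).
        y_a = counts_a.get(word, 0) + prior_strength
        y_b = counts_b.get(word, 0) + prior_strength
        # Log-odds of the word within each corpus.
        log_odds_a = math.log(y_a / (total_a + prior_total - y_a))
        log_odds_b = math.log(y_b / (total_b + prior_total - y_b))
        # Approximate variance of the difference, used to scale to a Z-score.
        variance = 1.0 / y_a + 1.0 / y_b
        scores[word] = (log_odds_a - log_odds_b) / math.sqrt(variance)
    return scores


# Hypothetical toy counts, not taken from the paper.
southern_counts = {"plantation": 8, "master": 6, "church": 1}
northern_counts = {"plantation": 1, "master": 2, "church": 9}

scores = log_odds_z_scores(southern_counts, northern_counts)
for word, z in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{word}: {z:+.2f}")
```

In this toy setup, words with large positive scores lean toward the first (Southern) table and words with large negative scores lean toward the second (Northern) table. Scaling by the variance keeps rare words from dominating the ranking simply because their raw ratios are extreme.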

Paper Structure

This paper contains 17 sections, 1 equation, 2 figures, 3 tables, 1 algorithm.

Figures (2)

  • Figure 1: Each data point represents a slave discourse word. Words from Southern newspapers are shown as red crosses, words from Northern newspapers as blue circles, and words that appeared in both Southern and Northern newspapers as green diamonds. The x-axis shows the frequency of each word in the entire corpus, and the y-axis shows its Z-score in the corpus.
  • Figure 2: Each data point represents a servant discourse word. Words from Southern newspapers are shown as red crosses, words from Northern newspapers as blue circles, and words that appeared in both Southern and Northern newspapers as green diamonds. The x-axis shows the frequency of each word in the entire corpus, and the y-axis shows its Z-score in the corpus.