A Unified Model of Text and Citations for Topic-Specific Citation Networks

ByungKoo Kim, Saki Kuzushima, Yuki Shiraito

TL;DR

This paper develops the paragraph-citation topic model (PCTM), which jointly analyzes citation networks and document texts. The model extends conventional topic models by assigning topics to paragraphs of citing documents, so that each citation shares a topic with the paragraph in which it is embedded.

Abstract

Social scientists analyze citation networks to study how documents influence subsequent work across various domains such as judicial politics and international relations. However, conventional approaches that summarize document attributes in citation networks often overlook the diverse semantic contexts in which citations occur. This paper develops the paragraph-citation topic model (PCTM), which analyzes citation networks and document texts jointly. The PCTM extends conventional topic models by assigning topics to paragraphs of citing documents, allowing citations to share topics with their embedding paragraphs. Our empirical analysis of U.S. Supreme Court opinions in the privacy issue domain, which includes cases on reproductive rights, demonstrates that citations within individual documents frequently span multiple substantive areas, and citations to individual documents show considerable topical diversity.

Paper Structure

This paper contains 28 sections, 49 equations, 17 figures, 6 tables.

Figures (17)

  • Figure 1: Results of three topic models, (a) LDA, (b) RTM, and (c) PCTM, on the US Supreme Court opinions in the privacy issue area. A node represents an opinion, and an edge represents a citation between opinions. The color composition of a node follows the topic proportions of words (LDA, RTM) or paragraphs (PCTM) in the given opinion. The color of an edge is based on the estimated topic of the paragraph in which the citation is made. Note that the topic spaces of the three models are not exactly the same; the same color is assigned to topics that share the top 5 most frequent words across the three models. (a) LDA estimates the topic structure of documents without reference to the citation network. (b) RTM takes the linkage between documents into account when estimating topics, but assumes that edges are undirected and remains agnostic about the topics of citations. (c) PCTM recognizes the directions of edges and estimates the topic structure of both documents and citations. By identifying the topic of the paragraph in which a citation is made, PCTM provides semantic context for how documents are connected.
  • Figure 2: The citation network of 11 selected opinions on reproductive rights. The opinions are part of the SCOTUS subset on the privacy issue area. The left panel highlights the paragraphs and citations of the Constitutional Rights to Abortion topic. The right panel highlights the paragraphs and citations of the Regulation of Abortion Procedures topic. The y-axis represents chronological order: opinions placed lower in the figure are older, and opinions placed higher are more recent.
  • Figure 3: Subnetworks specific to each topic. The subnetworks are created by extracting opinions that either send or receive citations of the given topic. The topic-specific subnetworks can be useful in revealing whether, and to what extent, topological features of the network vary by topic. For each subnetwork, paragraphs of other topics are colored gray for better visualization.
  • Figure 4: Predicted Probability of Topics for the Paragraphs of Dobbs v. Jackson. Each vertical bar represents a paragraph. Each paragraph is colored according to the predicted probability of topics. We focus on two topics related to abortion: Constitutional Rights to Abortion and Regulation of Abortion Procedures. The cases are Gonzales v. Carhart, Stenberg v. Carhart, and Dobbs v. Jackson Women's Health Organization, from top to bottom. Dobbs v. Jackson Women's Health Organization has more paragraphs on the Constitutional Rights to Abortion topic than on the Regulation of Abortion Procedures topic, while the two most recent precedents in our corpus, Gonzales v. Carhart and Stenberg v. Carhart, show the opposite pattern. This shows that Dobbs v. Jackson Women's Health Organization goes against the recent trend in the abortion cases in our corpus, as seen in Gonzales v. Carhart and Stenberg v. Carhart, where stronger emphasis is placed on how states can regulate abortion than on whether abortion is a constitutional right.
  • Figure D.1: MCMC convergence of $\pmb\tau$ posterior samples in simulation. Horizontal red line indicates the true values of $\pmb\tau$.
  • ...and 12 more figures
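
The extraction step described in the Figure 3 caption (keeping the opinions that send or receive citations of a given topic) can be illustrated with a minimal sketch. It assumes each citation has already been labeled with the estimated topic of its embedding paragraph. The function name `topic_subnetworks`, the triple layout, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def topic_subnetworks(citations):
    """Group directed citations by topic and collect the opinions they touch.

    `citations` is an iterable of (citing, cited, topic) triples, where
    `topic` is assumed to be the estimated topic of the paragraph in which
    the citation appears. This input format is an assumption.
    """
    edges_by_topic = defaultdict(list)
    for citing, cited, topic in citations:
        edges_by_topic[topic].append((citing, cited))

    subnetworks = {}
    for topic, edges in edges_by_topic.items():
        # An opinion belongs to the subnetwork if it sends or receives
        # at least one citation of this topic.
        nodes = {opinion for edge in edges for opinion in edge}
        subnetworks[topic] = {"nodes": nodes, "edges": edges}
    return subnetworks


# Hypothetical toy data: four opinions (A-D) and two topics.
citations = [
    ("C", "A", "rights"),
    ("C", "B", "regulation"),
    ("D", "C", "regulation"),
    ("D", "A", "rights"),
]
subnets = topic_subnetworks(citations)
```

In this toy example, opinion C appears in both subnetworks because it sends citations under both topics, which mirrors the caption's point that a single opinion can participate in several topic-specific networks.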