Exploitation and exploration in text evolution. Quantifying planning and translation flows during writing

Donald Ruggiero Lo Sardo, Pietro Gravino, Christine Cuskley, Vittorio Loreto

Abstract

Writing is a complex process at the center of much of modern human activity. Although it appears to be linear, writing conceals many highly non-linear processes. Previous research has focused on three phases of writing: planning, translation and transcription, and revision. While research has shown that these phases are non-linear, they are often treated as linear when measured. Here, we introduce measures to detect and quantify subcycles of planning (exploration) and translation (exploitation) during the writing process. We apply these measures to a novel dataset that records the creation of a text in all its phases, from early attempts to the finishing touches on a final version. This dataset comes from a series of writing workshops in which innovative versioning software allowed us to record every step in the construction of a text. More than 60 junior researchers in science each wrote a scientific essay intended for a general readership. We recorded each essay as a writing cloud, defined as a complex topological structure capturing the history of the essay itself. Through this unique dataset of writing clouds, we expose a representation of the writing process that quantifies its complexity and the writer's efforts throughout the draft and through time. This representation highlights phases of "translation flow", where authors improve existing ideas, and phases of exploration, where creative deviations appear as the writer returns to the planning phase. The turning points between translation and exploration become rarer as the writing process progresses and the author approaches the final version. Our results and the new measures have the potential to foster discussion of the non-linear nature of writing and to support the development of tools that enable more creative and impactful writing processes.

Paper Structure (14 sections, 4 figures, 1 table)

Figures (4)

  • Figure 1: Complexity of the writing process. (a) The cloud of all versions of the text produced by a generic author. Each sentence written, even those deleted in the ongoing process, is represented as a point in this graph. The position on the horizontal axis represents the ordinal location of a sentence in a given draft version, the Sentence Number. The position on the vertical axis represents the version of the draft a sentence first appeared in, the Time of Edit $t$. A point in $\{x,t\}$ is a sentence first appearing as the $x^{\text{th}}$ sentence of the $t^{\text{th}}$ version of the draft. Focusing on a specific Time of Edit, the image shows all the sentences that have been revised at that time. Focusing on a particular Sentence Number, the image shows all the times at which that sentence has been edited. Each version of the draft is displayed by drawing a semi-transparent line through its ordered component sentences. Sequences of sentences preserved among different draft versions are more opaque. An example of two consecutive versions of the draft is highlighted in black [version 70] and red [version 71]. The $21^{\text{st}}$ sentence of version 70, first appearing in version 20, has been edited in version 71. (b) We report the average number of edits of each sentence across all authors. The horizontal axis displays the position of each sentence in the text, while the vertical axis reports the number of edits. The blue area denotes the $99.5\%$ confidence interval over the distribution using the bootstrapping procedure [$N=1000$]. (c) The distribution of complexity values. The pink histogram displays the values of the complexity of the writing processes recorded over all the workshops. Shaded areas are kernel density estimates for the individual workshops. The cyan line displays the probability density function computed over versions generated by randomly permuting the sequence of edits.
  • Figure 2: Exploration patterns in writing. (a) Distance from the shortest path during the editing phase, from beginning to end, for two authors [in cyan and green] and the average behavior of all the authors [dark blue line]. The shortest path would be achieved by someone who never makes revisions to the text. The deviation from the shortest path varies strongly throughout the writing process and is largest away from the beginning and the end. The pale blue area outlines the $99.5\%$ confidence interval of the mean curve, estimated using the bootstrap technique. (b) Exploration Coefficient vs. the number of versions of each document. The Exploration Coefficient shows weak to no correlation with the number of versions (Pearson $R=0.10$, $p=0.44$). The values for the two versions presented on the left are outlined with circles of the same color. (c) Distribution of the Exploration Coefficient, defined as the area under the curves of the left panel, aggregated over all workshops [pink] and separately for each workshop.
  • Figure 3: Trajectory of a draft. An illustration of the t-SNE embedding, in two dimensions, of the draft versions for a single author. Distances in this space are the number of characters edited between one draft and another. The set of arrows depicts the trajectory of the work on a single draft. The top right depiction shows an example section of the trajectory to display the vectors used in computing the Twist Ratio and the thresholds between exploration and translation flow.
  • Figure 4: Twist Ratio. (a) The distribution of the angle between consecutive velocities in the evolution of the draft. The area we have defined as exploratory is highlighted in gray. The dark blue line depicts the average distribution of the angle, while the pale blue area outlines the $99.5\%$ confidence interval estimated using the bootstrap technique at each point. The light green and cyan lines show two example versions. (b) The distribution of the Twist Ratio over all workshops [pink] and the kernel density estimation for each workshop. (c) The distribution of the exploratory steps per draft compared to the number of edits for each workshop. The examples presented in the left panel are outlined in the corresponding color.
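
The captions of Figures 3 and 4 describe angles between consecutive "velocities" along a draft's trajectory, with some range of angles labeled exploratory. The following is a minimal sketch of that idea, not the authors' implementation. It assumes each draft version has already been mapped to a 2D point, treats the displacement between consecutive versions as the velocity, and counts the share of turning angles above a cutoff. The function names, the `threshold` parameter, and its default of 90 degrees are hypothetical; the excerpt does not give the paper's exact threshold or its definition of the Twist Ratio.

```python
import math

def turning_angles(points):
    """Angles (radians) between consecutive displacement vectors of a 2D trajectory."""
    # Displacement between consecutive draft versions, used here as the "velocity".
    steps = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(points, points[1:])]
    # Drop zero-length steps, for which the angle is undefined.
    steps = [s for s in steps if math.hypot(s[0], s[1]) > 0]
    angles = []
    for (ax, ay), (bx, by) in zip(steps, steps[1:]):
        cos = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
        # Clamp to [-1, 1] to guard against floating-point rounding.
        angles.append(math.acos(max(-1.0, min(1.0, cos))))
    return angles

def exploratory_fraction(points, threshold=math.pi / 2):
    """Share of turning angles above `threshold` (an assumed cutoff, not the paper's)."""
    angles = turning_angles(points)
    if not angles:
        return 0.0
    return sum(a > threshold for a in angles) / len(angles)

# Toy trajectory: two steps forward, then a full reversal, then one more step back.
trajectory = [(0, 0), (1, 0), (2, 0), (1, 0), (0, 0)]
print(turning_angles(trajectory))
print(exploratory_fraction(trajectory))
```

In this toy trajectory only the reversal produces a large angle, so one of the three turning angles counts as exploratory under the assumed cutoff. Because t-SNE coordinates are only an approximate rendering of edit distances, angles measured in such an embedding should be read as illustrative.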