Story2Proposal: A Scaffold for Structured Scientific Paper Writing

Zhuoyang Qian, Wei Shi, Xu Lin, Li Ling, Meng Luo, Ziming Wang, Zhiwei Zhang, Tengyue Xu, Gaoge Liu, Zhentao Zhang, Shuo Zhang, Ziqi Wang, Zheng Feng, Yan Luo, Shu Xu, Yongjin Chen, Zhibo Feng, Zhuo Chen, Bruce Yuan, Biao Wu, Harry Wang, Kris Chen

Abstract

Generating scientific manuscripts requires maintaining alignment between narrative reasoning, experimental evidence, and visual artifacts across the document lifecycle. Existing language-model generation pipelines rely on unconstrained text synthesis with validation applied only after generation, often producing structural drift, missing figures or tables, and cross-section inconsistencies. We introduce Story2Proposal, a contract-governed multi-agent framework that converts a research story into a structured manuscript through coordinated agents operating under a persistent shared visual contract. The system organizes architect, writer, refiner, and renderer agents around a contract state that tracks section structure and registered visual elements, while evaluation agents supply feedback in a generate-evaluate-adapt loop that updates the contract during generation. Experiments on tasks derived from the Jericho research corpus show that Story2Proposal achieved an expert evaluation score of 6.145 versus 3.963 for DirectChat (+2.182) across GPT, Claude, Gemini, and Qwen backbones. Compared with the structured generation baseline Fars, Story2Proposal obtained an average score of 5.705 versus 5.197, indicating improved structural consistency and visual alignment.

Paper Structure

This paper contains 20 sections, 9 equations, 4 figures, 3 tables, and 1 algorithm.

Figures (4)

  • Figure 1: System overview of Story2Proposal. Given a research story as input, the framework coordinates architect, writer, refiner, and renderer agents through a persistent shared visual contract that records section structure, registered visual artifacts, and validation rules. Evaluation agents inspect intermediate outputs and feed corrective signals back to the contract state, enabling provenance-aware planning, structurally constrained drafting, global refinement, and deterministic rendering of the final manuscript.
  • Figure 2: Schema of the shared visual contract used by Story2Proposal. The contract combines a global registry of figures, tables, and citation slots with section-level obligations and validation rules, so that all agents operate on the same structural state and the renderer can deterministically verify label uniqueness, reference resolution, and narrative-visual alignment before compilation.
  • Figure 3: Expert evaluation protocol used in our experiments. Each condition combines one of two generation methods, one of four LLM backbones (GPT, Claude, Gemini, and Qwen), and ten independent expert reviewers. Reviewers assess complete manuscripts on structural integrity, writing clarity, methodological rigor, experimental substance, citation hygiene, reproducibility, formatting stability, and visual communication, and the resulting scores are aggregated into the overall expert evaluation metric.
  • Figure 4: Average performance deltas between Story2Proposal and baseline generation methods across experiments. Positive values indicate that the contract-governed multi-agent framework outperforms the comparison system under the same backbone or benchmark setting. The figure summarizes both the large gains over DirectChat in the cross-model study and the smaller but consistent improvements over the structured Fars baseline, highlighting the benefit of stronger structural control and narrative-visual alignment.
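To make the contract described in Figure 2 concrete, the following is a minimal, purely illustrative sketch of such a shared visual contract in Python. The class and method names (`VisualContract`, `register`, `validate`) and the field layout are assumptions for illustration; the paper does not specify an implementation. The sketch shows only the two checks the caption names: label uniqueness at registration time, and reference resolution against section-level obligations at validation time.

```python
from dataclasses import dataclass, field

@dataclass
class VisualArtifact:
    """A registered figure or table. Names are hypothetical, not from the paper."""
    label: str            # e.g. "fig:overview"
    kind: str             # "figure" or "table"
    caption: str
    owning_section: str

@dataclass
class VisualContract:
    """Shared state: global artifact registry plus per-section obligations."""
    registry: dict = field(default_factory=dict)     # label -> VisualArtifact
    obligations: dict = field(default_factory=dict)  # section -> required labels
    references: list = field(default_factory=list)   # (section, label) pairs

    def register(self, artifact: VisualArtifact) -> None:
        # Label uniqueness is enforced at registration time.
        if artifact.label in self.registry:
            raise ValueError(f"duplicate label: {artifact.label}")
        self.registry[artifact.label] = artifact

    def validate(self) -> list:
        """Return violations: unresolved references and unmet obligations."""
        errors = []
        for section, label in self.references:
            if label not in self.registry:
                errors.append(f"{section}: unresolved reference {label}")
        for section, required in self.obligations.items():
            for label in required:
                if label not in self.registry:
                    errors.append(f"{section}: missing required artifact {label}")
        return errors
```

In a sketch like this, the renderer would call `validate()` before compilation and refuse to render while the returned list is non-empty, which is one way to realize the deterministic verification step the caption describes.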