PostDoc: Generating Poster from a Long Multimodal Document Using Deep Submodular Optimization

Vijay Jaisankar; Sambaran Bandyopadhyay; Kalp Vyas; Varre Chaitanya; Shwetha Somasundaram

TL;DR

PostDoc addresses automatic poster generation from long multimodal documents by jointly optimizing content selection and design. It formulates a novel deep submodular function to capture coverage, diversity, and cross-modal alignment among text and images, and trains its weights via a hinge loss with alternating optimization. The pipeline paraphrases the selected content using GPT-3.5-turbo and generates a poster template with font, color, and layout decisions (including a heuristic layout) tuned to the content. Automated and human evaluations on MSMO and NJU-Fudan datasets show PostDoc outperforms baselines in textual coverage and poster aesthetics while offering faster inference and cost efficiency. Limitations include handling non-natural images and structured elements, with future work proposing fine-tuned vision-language models on such content.

Abstract

A poster for a long input document can be viewed as a one-page, easy-to-read multimodal (text and images) summary presented on a well-designed template. Automatically transforming a long document into a poster is a challenging task that has received little study. It involves content summarization of the input document followed by template generation and harmonization. In this work, we propose a novel deep submodular function that can be trained on ground-truth summaries to extract multimodal content from the document and that explicitly ensures good coverage, diversity, and alignment of text and images. We then use an LLM-based paraphraser and propose to generate a template with various design aspects conditioned on the input content. We show the merits of our approach through extensive automated and human evaluations.
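
To make the idea of submodular content selection concrete, here is a minimal sketch of budgeted greedy selection over a toy concave-over-modular score. It is not the paper's deep submodular function or its training procedure: the function names `coverage_score` and `greedy_select`, the square-root form, the topic features, and the use of a greedy maximizer are all assumptions chosen for illustration.

```python
import math


def coverage_score(selected, features, weights):
    """Toy monotone submodular score (an illustrative assumption).

    Computes sum_j weights[j] * sqrt(sum of features[i][j] over selected i).
    The square root gives diminishing returns for repeated topics.
    """
    total = 0.0
    for j in range(len(weights)):
        mass = sum(features[i][j] for i in selected)
        total += weights[j] * math.sqrt(mass)
    return total


def greedy_select(features, weights, budget):
    """Pick up to `budget` items, each time adding the largest marginal gain."""
    selected = []
    remaining = set(range(len(features)))
    while remaining and len(selected) < budget:
        current = coverage_score(selected, features, weights)
        best_item, best_gain = None, 0.0
        # Iterate in index order so that ties resolve deterministically.
        for i in sorted(remaining):
            gain = coverage_score(selected + [i], features, weights) - current
            if gain > best_gain:
                best_item, best_gain = i, gain
        if best_item is None:  # no item adds value
            break
        selected.append(best_item)
        remaining.remove(best_item)
    return selected


# Hypothetical data: rows are content items (sentences or images),
# columns are topics. Item 1 is nearly redundant with item 0.
features = [
    [1.0, 0.0, 0.0],
    [0.9, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
weights = [1.0, 1.0, 1.0]
summary = greedy_select(features, weights, budget=3)
print(summary)
```

With a budget of three, the sketch skips the near-duplicate item because its marginal gain shrinks once item 0 is selected, which is the diminishing-returns behaviour that makes submodular functions a natural fit for coverage and diversity.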

Paper Structure (34 sections, 1 theorem, 33 equations, 3 figures, 8 tables)

Related Work and Background
Multimodal Summarization
Layout Generation
Document Transformation
Background on Submodular Functions
Problem Statement and Solution Approach
Multimodal Extractive Summarization
Training and Optimization
Maximization w.r.t. $A$
Minimization w.r.t. $w$
Content Paraphrasing
Template Generation
Font Selection
Color Selection
Layout Generation
...and 19 more sections

Key Result

Theorem 2.1

The set function $f$ defined by the paper's deep submodular function equation (label eq:dsf) is a monotone submodular function.
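
For reference, the two properties named in the theorem have the following standard definitions for a set function $f : 2^V \to \mathbb{R}$ over a ground set $V$. The symbols $V$, $A$, $B$, and $v$ are generic notation assumed here and may differ from the paper's own.

```latex
% Monotonicity: adding elements never decreases the value.
f(A) \le f(B)
  \quad \text{for all } A \subseteq B \subseteq V.

% Submodularity (diminishing returns): an element helps a smaller set
% at least as much as it helps a larger one.
f(A \cup \{v\}) - f(A) \;\ge\; f(B \cup \{v\}) - f(B)
  \quad \text{for all } A \subseteq B \subseteq V,\ v \in V \setminus B.
```

Together, these properties are what allow simple greedy selection under a cardinality budget to carry the classical $(1 - 1/e)$ approximation guarantee.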

Figures (3)

Figure 1: Block Diagram of PostDoc
Figure 2: A sample poster generated by PostDoc for a research paper
Figure 3: A sample layout generated by this method ($N_I = 4$, $N_T = 5$)

Theorems & Definitions (2)

Theorem 2.1
proof
