Due to the recent boom in artificial intelligence (AI) research, including computer vision (CV), it has become impossible for researchers in these fields to keep up with the exponentially increasing number of manuscripts. In response, this paper proposes the paper summary generation (PSG) task, together with a simple but effective method that automatically generates an academic paper summary from raw PDF data. We realize PSG by combining a vision-based supervised component detector with a language-based unsupervised important-sentence extractor; the approach is applicable to manuscripts in a format on which the detector has been trained. We present a quantitative evaluation of the vision-based component extraction and a qualitative evaluation showing that our system extracts both visual items and sentences that are helpful for understanding a paper. Summaries of the 979 manuscripts accepted by the Conference on Computer Vision and Pattern Recognition (CVPR) 2018, generated by our PSG, are available. We believe that the proposed method will give researchers a better way to stay caught up with important academic papers.



We divided the task into vision-based paper component detection and language-based important sentence extraction. In the vision task, we detect academic paper components {title, authors, abstract, figures, and tables} with YOLOv2, and simultaneously select the most important figure (MIF) in the paper. In the language task, we summarize the sentences of the academic paper with Luhn's method. Finally, we generate a single-page summary that combines the components collected by the vision- and language-based methods.
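
To make the language step concrete, the following is a minimal Python sketch of Luhn-style sentence scoring: frequent non-stop words are treated as significant, and each sentence is scored by its densest cluster of significant words. The stop-word list, the thresholds `min_count` and `max_gap`, and all function names are assumptions chosen for illustration; they are not the settings or the implementation used in our system.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real system would use a fuller one).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
              "we", "for", "with", "that", "this", "on", "it", "by", "as"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def significant_words(sentences, min_count=2):
    """Return non-stop words that occur at least min_count times in the document."""
    counts = Counter(w for s in sentences for w in tokenize(s)
                     if w not in STOP_WORDS)
    return {w for w, c in counts.items() if c >= min_count}


def luhn_score(sentence, significant, max_gap=4):
    """Score a sentence by its best cluster of significant words.

    A cluster is a run of significant words separated by at most max_gap
    other words; its score is (significant words)^2 / (cluster length).
    """
    words = tokenize(sentence)
    positions = [i for i, w in enumerate(words) if w in significant]
    if not positions:
        return 0.0
    best = 0.0
    start = prev = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - prev - 1 <= max_gap:
            count += 1  # still inside the current cluster
        else:
            best = max(best, count ** 2 / (prev - start + 1))
            start, count = pos, 1  # open a new cluster
        prev = pos
    return max(best, count ** 2 / (prev - start + 1))


def summarize(sentences, top_n=2):
    """Keep the top_n highest-scoring sentences, in their original order."""
    sig = significant_words(sentences)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: luhn_score(sentences[i], sig),
                    reverse=True)
    return [sentences[i] for i in sorted(ranked[:top_n])]
```

Calling `summarize(sentences, top_n=k)` on a list of sentences extracted from the PDF text returns the `k` sentences with the densest clusters of frequent words. Returning them in document order keeps the extracted summary readable.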

Example Results