Semi-Automatic Systematic Literature Reviews and Information Extraction of COVID-19 Scientific Evidence: Description and Preliminary Results of the COKE Project

Golinelli, Davide; Nuzzolese, Andrea Giovanni; Sanmarchi, Francesco; Bulla, Luana; Mongiovì, Misael; Gangemi, Aldo; Rucci, Paola

doi:10.3390/info13030117

The COVID-19 pandemic highlighted the importance of validated and up-to-date scientific information to help policy makers, healthcare professionals, and the public. Rapid dissemination of reliable information, followed by prompt implementation of guidelines and policies, is also essential to save as many lives as possible. Trustworthy guidelines should be based on a systematic evidence review that uses reproducible analytical methods to collect and analyze secondary data. However, drafting guidelines is time-consuming and requires a great deal of resources. This paper aims to highlight the importance of accelerating and streamlining the extraction and synthesis of scientific evidence, specifically within the systematic review process. To do so, it describes the COKE (COVID-19 Knowledge Extraction framework for next generation discovery science) Project, which uses machine reading and deep learning to design and implement a semi-automated system that supports and enhances the systematic literature review and guideline drafting processes. Specifically, we propose a framework to aid literature selection and navigation that employs natural language processing and clustering techniques to select and organize the literature for human consultation according to PICO (Population/Problem, Intervention, Comparison, and Outcome) elements. We present preliminary results of the automatic classification of sentences on a dataset of abstracts related to COVID-19.
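To make sentence-level PICO classification concrete, the following is a minimal sketch of the task's input and output: an abstract is split into sentences, each sentence receives one PICO label (or "Other"), and the sentences are grouped by label for a human reviewer. The COKE project relies on deep learning; this toy version instead uses a simple keyword-cue scorer so that it runs with the standard library alone. The cue lists, the function names (`split_sentences`, `classify_sentence`, `group_by_pico`), and the example abstract are hypothetical illustrations, not the project's code or data.

```python
import re
from collections import Counter

# Hypothetical cue words per PICO element (illustrative only, not from COKE).
PICO_CUES = {
    "Population": {"patients", "adults", "children", "participants", "hospitalized"},
    "Intervention": {"treated", "received", "administered", "therapy", "vaccine"},
    "Comparison": {"placebo", "versus", "compared", "control", "standard"},
    "Outcome": {"mortality", "recovery", "reduced", "risk", "outcome"},
}


def split_sentences(abstract: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", abstract) if s.strip()]


def classify_sentence(sentence: str) -> str:
    # Count how many cue words of each PICO element occur in the sentence.
    tokens = re.findall(r"[a-z]+", sentence.lower())
    scores = Counter({label: sum(tok in cues for tok in tokens)
                      for label, cues in PICO_CUES.items()})
    # most_common keeps insertion order on ties, so ties favour P, I, C, O in turn.
    label, best = scores.most_common(1)[0]
    return label if best > 0 else "Other"


def group_by_pico(abstract: str) -> dict[str, list[str]]:
    # Organize sentences by predicted PICO element for human consultation.
    groups: dict[str, list[str]] = {label: [] for label in [*PICO_CUES, "Other"]}
    for sentence in split_sentences(abstract):
        groups[classify_sentence(sentence)].append(sentence)
    return groups


# Invented toy abstract, used only to exercise the functions above.
EXAMPLE = (
    "We enrolled 120 hospitalized adults with COVID-19. "
    "Patients received remdesivir therapy. "
    "The control group was given placebo. "
    "Mortality was reduced at day 28. "
    "The study ran in 2021."
)

if __name__ == "__main__":
    for label, sentences in group_by_pico(EXAMPLE).items():
        print(label, sentences)
```

In a learned system, `classify_sentence` would be replaced by a trained sentence classifier, while the grouping step that organizes sentences by PICO element for reviewers could stay essentially the same.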