Machine learning approach in inferring main population-level COVID-19 risk factors

Sofija Marković1*, Anđela Rodić1, Ognjen Milićević2, Igor Salom3, Magdalena Đorđević3, and Marko Đorđević1

1Faculty of Biology, University of Belgrade, Studentski trg 6, 11000 Belgrade, Serbia

2School of Medicine, University of Belgrade, dr Subotića starijeg 8, 11000 Belgrade, Serbia

3Institute of Physics Belgrade, National Institute of the Republic of Serbia, University of Belgrade, Pregrevica 118, 11000 Belgrade, Serbia

sofija.markovic [at] bio.bg.ac.rs

Abstract

Machine-learning methods have become indispensable in scientific research as the amount of available data has grown exponentially in recent years. We therefore employ unsupervised and supervised machine-learning methods to uncover the main determinants of COVID-19 transmissibility and severity in the population. After introducing appropriate measures of disease transmissibility and severity, and gathering relevant socio-demographic, environmental, and health-related data for the countries for which these measures were obtained, we implement several machine-learning-based approaches to select the most prominent drivers of disease transmissibility and severity. These approaches include regularization-based linear regression models and more advanced Random Forest and Gradient Boosting methods, which are not limited to linear relationships between the features and the response. Because numerous features were considered for a relatively small number of observations (i.e., countries/states), principal component analysis was used for preselection to avoid overfitting. As a result, a broad range of potential COVID-19 risk factors was reduced to several prominent features that were selected robustly by different methods. We further untangle how these features contribute, directly or indirectly, to the transmissibility and severity of the disease. Our results underscore the evolving nature of COVID-19, from the severity experienced during the first wave to the emergence of new, highly transmissible variants such as Omicron. These insights can guide public health interventions, vaccine strategies, and policies aimed at reducing the burden of COVID-19 and effectively managing future waves and emerging variants.
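To illustrate how a regularization-based linear model can reduce many candidate features to a few, the sketch below fits an L1-penalized (lasso) regression by coordinate descent. It is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not name the specific regularized model, the data are synthetic, and all names (`N_OBS`, `N_FEAT`, `lasso`, the penalty value `lam = 0.3`) are hypothetical. The PCA preselection step and the Random Forest and Gradient Boosting models are not shown.

```python
import random

random.seed(0)
N_OBS, N_FEAT = 60, 8  # hypothetical: 60 "countries", 8 candidate features

# Synthetic data (assumption): only features 0 and 3 drive the response.
X = [[random.gauss(0, 1) for _ in range(N_OBS)] for _ in range(N_FEAT)]
y = [2.0 * X[0][i] - 1.5 * X[3][i] + random.gauss(0, 0.3) for i in range(N_OBS)]


def standardize(columns):
    """Scale each feature column to zero mean and unit (population) variance."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        std = (sum((v - mean) ** 2 for v in col) / len(col)) ** 0.5
        out.append([(v - mean) / std for v in col])
    return out


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso(columns, target, lam, n_iter=200):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1.

    Assumes standardized columns, so each coordinate update reduces to a
    soft-thresholded correlation with the partial residual.
    """
    n = len(target)
    beta = [0.0] * len(columns)
    residual = list(target)  # equals y - X @ beta while beta is all zeros
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual)) / n
            new = soft_threshold(rho, lam)
            if new != beta[j]:
                delta = new - beta[j]
                residual = [r - c * delta for c, r in zip(col, residual)]
                beta[j] = new
    return beta


Xs = standardize(X)
y_mean = sum(y) / N_OBS
yc = [v - y_mean for v in y]  # center the response so no intercept is needed

beta = lasso(Xs, yc, lam=0.3)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected features:", selected)
print("coefficients:", [round(b, 3) for b in beta])
```

The L1 penalty sets the coefficients of weakly related features exactly to zero, which is what makes this kind of model usable for feature selection. In a real analysis the penalty strength would typically be chosen by cross-validation, and the selected set would be compared against the features ranked highly by other methods.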

Keywords: COVID-19, machine learning, ecological regression analysis, epidemiological modeling, outburst risk factors

Acknowledgment: This work is supported by the Ministry of Science, Technological Development, and Innovation of the Republic of Serbia.
