Analysis of Long COVID Phenotypes and their Impact on Mental Health and Daily Functioning: Insights from Twitter

Marko Markovikj1, Jovana Dobreva1*, Mary Lucas2, Irena Vodenska3, Lou Chitkushev2, Dimitar Trajanov1,2

1Faculty of Computer Science & Engineering, Ss. Cyril and Methodius University, Rudzer Boshkovikj 16, 1000 Skopje, Macedonia

2Computer Sciences Department, Metropolitan College, Boston University – Boston, MA, US

3Administrative Sciences Department, Metropolitan College, Boston University – Boston, MA, US

jovana.dobreva [at] finki.ukim.mk

Abstract

In this study, we investigated Long COVID from a user perspective using Twitter data. Before analysis, the data were preprocessed to obtain the raw text of each tweet. Our analysis began with basic statistics and then expanded to identify characteristic periods for the phenotypes based on dynamic timelines. We also explored the relationships between phenotypes, as well as the interdependence between phenotypes and geolocation.

We analyzed a collection of approximately 1.9 million tweets posted between March 2020 and March 2022. To focus on word phrases, extraneous elements such as mentions, emoticons, links, and hashtags were removed, and the remaining text was lemmatized. To reduce the number of distinct phenotypes under investigation and simplify the presentation of results, the collected data were categorized into five overarching groups: Cardiovascular, Respiratory, Daily Living, Neurological and Mental Health, and Other.
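The cleaning step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the regular expressions, the function name `clean_tweet`, and the optional `lemmatize` hook are all assumptions introduced here, and the abstract does not say which lemmatizer was used.

```python
import re

# Illustrative patterns for the elements the abstract says were removed.
# These are assumptions, not the rules used in the study.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
# Rough emoji and pictograph ranges; real emoji coverage is broader.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
# Simple ASCII emoticons such as :) ;-( =D
EMOTICON_RE = re.compile(r"(?<!\w)[:;=][\-']?[)(DPp](?!\w)")


def clean_tweet(text, lemmatize=None):
    """Strip links, mentions, hashtags, and emoticons, then lowercase.

    `lemmatize` is a hypothetical hook: any callable mapping a token to
    its lemma. When it is None, tokens are left unchanged.
    """
    # URLs are removed first so that "://" is not mistaken for an emoticon.
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE, EMOJI_RE, EMOTICON_RE):
        text = pattern.sub(" ", text)
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if lemmatize is not None:
        tokens = [lemmatize(token) for token in tokens]
    return " ".join(tokens)


# Toy lemma table standing in for a real lemmatizer.
toy_lemmas = {"suffering": "suffer", "headaches": "headache"}
print(clean_tweet("Suffering from headaches", lambda t: toy_lemmas.get(t, t)))
```

In practice the `lemmatize` argument would wrap a proper lemmatizer from an NLP library; the toy dictionary is used only to keep the sketch self-contained.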

The most frequently used words in tweets describing experiences during the Long COVID period were as follows: “Ampicillin” was tweeted 125,295 times, “Death” was tweeted 121,156 times, “Suffer” was tweeted 125,113 times, and “Vaccine” was tweeted 108,968 times. We observe distinct patterns in the emergence of certain phenotypes over this period, particularly those related to quality of life. On August 1, 2020, the term “quality of life” was mentioned in only 223 tweets, whereas one year later, during the same month, this phenotype garnered 1,663 tweets.
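A timeline comparison of this kind can be sketched as a count of tweets mentioning a phrase, binned by calendar month. The function name `monthly_mentions`, the `(date, text)` input shape, and the monthly bin size are assumptions made for illustration; the abstract does not state how the study binned its timelines, and the sample tweets below are invented toy data, not study data.

```python
from collections import Counter
from datetime import date


def monthly_mentions(tweets, phrase):
    """Count tweets containing `phrase`, keyed by (year, month).

    `tweets` is an iterable of (date, text) pairs. Matching is a
    case-insensitive substring test, which is a simplification.
    """
    phrase = phrase.lower()
    counts = Counter()
    for day, text in tweets:
        if phrase in text.lower():
            counts[(day.year, day.month)] += 1
    return counts


# Invented toy data, used only to show the call shape.
toy_tweets = [
    (date(2020, 8, 1), "My quality of life is gone"),
    (date(2020, 8, 15), "Quality of Life matters"),
    (date(2021, 8, 3), "still tired today"),
    (date(2021, 8, 9), "quality of life still poor"),
]
counts = monthly_mentions(toy_tweets, "quality of life")
print(counts[(2020, 8)], counts[(2021, 8)])
```

Comparing the same month across years, as the abstract does for August, then amounts to looking up two keys in the returned counter.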

Our findings reveal that the occurrence of Long COVID phenotypes is influenced by both temporal and geographical factors. Neurological symptoms, along with symptoms that impede individuals’ daily functioning, are the most prevalent, particularly during the latter half of the analyzed period. This corresponds to a time when an increasing number of individuals had recovered from COVID-19 and were reporting their experiences with Long COVID. Notably, fatigue, depression, stress, and anxiety emerge as the most prevalent phenotypes.

This investigation of the complex interactions between Long COVID phenotypes, mental health, and the manifestation of diverse symptoms offers insight into the profound consequences for individuals’ lives. The findings shed light on the significant burden posed by Long COVID and its cascading effects on individuals’ well-being and on society at large.

Keywords: Long COVID, data mining, computer science, NLP