An analysis of Yemenis’ responses and sentiments on social media towards the emergence of the COVID-19 pandemic

Recently, many studies have dealt with data mining and text classification, including sentiment analysis. Sentiment analysis (SA) is an application of Natural Language Processing (NLP) used to understand the public's attitudes. The recent proliferation of social media has helped gauge the public's mood. The current study aims to explore the influence of the COVID-19 pandemic on the Yemeni community and to generate indices assessing public sentiments and attitudes using a lexicon and rule-based approach (VADER: Valence Aware Dictionary and Sentiment Reasoner) together with qualitative and quantitative analysis methods. A total of 8,830 Facebook and YouTube comments were analyzed before and after the declaration of the first COVID-19 case in Yemen on 10th April 2020. The results revealed that sentiment polarity with and without contextual reference differed significantly. Without contextual reference, neutrality was prevalent before 10th April and reached 55%, negativity scored 24%, and positivity reached 21%; after this date, negativity was dominant and reached 57%, neutrality scored 28%, and positivity scored 15%. With contextual reference, positivity was prevalent and scored 72% before 10th April; after this date, negativity dominated the public's mood and reached 78.23%, positivity decreased sharply to 18.65%, and neutrality scored 3.12%. The study demonstrated the superiority of SA based on the contextual reference of words.


Introduction
COVID-19 emerged as an infectious disease and created a global crisis that dramatically affected sectors such as education [1,2], health, and the economy. Recently, social media data has become an important resource and an influential platform for open information sharing, sentiment expression, reviews, and opinion polling. It has emerged as an available and easily accessible data source for researchers, especially for survey and classification studies. Researchers face many difficulties with traditional surveys, whether on paper or online, because these rely on participants' collaboration and place extra burdens on them [3,4]. In contrast, social media surveillance can systematically monitor public emotions and reactions to epidemic events in real time [5,6]. Furthermore, it is relatively less likely to be affected by recall or reporting bias [7,8].
Moreover, it is difficult to ask people to recall their previous mental states, so social media data become the best choice for recovering previously documented mental states, as the platforms track users' sentiments, reactions, concerns, and risk perceptions as they emerge and evolve in real time. Research on people's feelings is essential for monitoring and maintaining mental health [9]. Social media has been analyzed using content analysis [10][11][12]. Knowing and assessing the public's attitudes is integral to providing baseline information about the disease and the situation in general. That assists in relieving the public's panic and concerns, which could otherwise further complicate the situation. In the beginning, when the first confirmed COVID-19 case was recorded on 31st December 2019 in Wuhan, the epicenter of this epidemic, people across the world showed no panic because they were unaware of this novel virus and its catastrophic consequences. Later, a wave of fear and worry was provoked around the globe by the increase and severity of cases worldwide and by the coverage of casualties in social and mass media, which have played a significant role in disseminating health information and expressing people's feelings and attitudes towards COVID-19 [13,14]. People in Yemen started dealing with COVID-19 news late in February. Many studies have been conducted to explore public sentiments and attitudes toward the emergence of COVID-19: some are based on traditional surveys [15], some applied lexicon-based approaches [16], and others used machine-learning-based methods [9][17][18][19][20][21][22]. Lexicon-based sentiment analysis tries to computationally determine the polarity of the public's attitudes based on the semantic indication of words and phrases.
In contrast, machine-learning-based sentiment analysis tries to determine the public's orientation by developing models from an annotated training dataset (structured data) and sometimes from unstructured data. Both methods can show reasonable accuracy, but when they fail, the main reason is that words or phrases take different semantic orientations depending on their contextual reference. This investigation tackles the problematic issue of semantic ambiguity by considering the linguistic contextual reference and its emotional indications using a qualitative and quantitative data analysis approach. This study aims to computationally explore and categorize the Yemeni community's reactions, attitudes, and sentiments towards COVID-19 expressed on Facebook and YouTube, platforms that Yemenis widely use. This exploration comprises two phases: before and after the emergence of the 1st confirmed case of COVID-19 in the southeastern Yemeni province of Hadramawt on 10th April 2020. It applies a lexicon and rule-based approach using VADER, which has been reported to achieve an F1 of 0.96 in classifying sentiment, higher than many machine learning and deep learning models and even higher than individual human raters (F1 = 0.84). It also applies statistical (qualitative and quantitative) methods using QDA software, which provides comment annotation with sentiment-assessing metrics and consequently allows for a systematic qualitative and quantitative content analysis.
Indeed, this study gives a clear picture of Yemenis' reactions during this epidemic phase, which can help decision-makers provide accurate information to ease the public's concerns and panic. Moreover, it helps the government better prepare to address future health crises involving infectious diseases. The rest of this paper is organized as follows. Section 2 introduces the related work, and Section 3 discusses the data collection and methodology. Section 4 presents the results. Section 5 discusses the results, and Section 6 concludes the study and outlines future work.

Related Works
Many studies have worked on sentiment analysis of social media. [20] detected public sentiment towards the pandemic using coronavirus-specific tweets and the R statistical software with its sentiment analysis packages. They monitored the progress of fear sentiment over time in the United States, applying descriptive textual analytics supported by textual data visualizations. The impact of using social media during this pandemic was also studied in [23], which proposed a causal inference approach to identify and measure the causal relationships between pandemic properties, such as infections and deaths, and public sentiment on Twitter. They used different machine learning classifiers, such as Random Forests, Decision Tree, MaxEntropy, Naïve Bayes, LogitBoost, and SVM, to investigate the influence of the coronavirus around the world from tweets. Their experiment revealed that the LogitBoost classifier achieved the highest accuracy. [24] conducted a sentiment analysis to recognize the general attitudes of the public on Twitter. They introduced a statistical analysis of the Twitter messages on COVID-19 posted since January 2020, modeling unigram, bigram, and trigram frequencies with a power-law distribution. They validated the results with the Root Mean Square Error (RMSE), Sum of Squared Errors (SSE), and R2, which establish the goodness of fit of this model. [25] extracted Twitter data in Python using the Tweepy library and performed the sentiment analysis using the TextBlob library. The public's emotions towards the COVID-19 pandemic on social media were discussed in [26,27], which explained the impact of the pandemic on emotional health. They used social media and the ecosystem as a source of data to explore the impact of the pandemic on the public's sentiments, such as sadness, anxiety, and happiness.

Dr. Jeehaan Algaraady

Materials and Methods
The first step in this study is the construction of the dataset. The obtained data include N = 8,830 comments: n = 4,958 public comments on posts from official Facebook pages and n = 3,872 comments on videos from official YouTube channels, collected before and after the declaration of the 1st confirmed COVID-19 case in Yemen on 10th April 2020. The data contain n = 5,427 comments from 1st February 2020 to 9th April 2020, before the first case of COVID-19 was recorded in Yemen, and n = 3,403 comments after the first case was recorded on 10th April 2020. The absolute numbers in table one represent the examined datasets. The datasets were collected in the form of texts.

Before processing the collected data, the author cleaned them by removing the words on the so-called "stop" list and all irrelevant, superfluous items that have no value for the analysis, such as users' names, posters, digits, abbreviations, profiles, and timing. Then, each Arabic word type was given its English synonym so that it could be assigned its sentiment score in VADER (Valence Aware Dictionary and Sentiment Reasoner). This predefined lexicon comprises 7,517 words and emoticons. The model analyzes the comments' sentiments and is sensitive to both the polarity (positive and negative) and the intensity of emotions. VADER sentiment analysis applies a human-based approach that combines qualitative analysis and empirical validation by human raters [29][30][31]. The author attempts to find the overall sentiment polarity with and without the word's linguistic and contextual reference (five words before and five words after), together with its emotional indications, to know in which contexts these [...] Afterward, the elicited key phrases were searched to conveniently assign the categories to the comments, which were revised manually. Consequently, the assigned categories were used as codes in the QDA (Qualitative Data Analysis) software. Qualitative Data Analysis is a lexicon approach that offers data annotation with assessing metrics for opinion mining and allows a systematic qualitative and quantitative content analysis [32], as shown in figure one.
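The cleaning and lexicon-scoring steps described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the study's pipeline: the small lexicon reuses a few valences reported in the word-frequency section, the stop list and the function names `clean`, `compound`, and `polarity` are hypothetical, and the negation and intensifier rules of the full VADER model are omitted. The normalization constant of 15 and the ±0.05 cut-offs follow the conventions commonly documented for VADER's compound score.

```python
import math

# Toy lexicon: valences for a few English glosses reported in this paper.
# A real run would load the full VADER lexicon instead (assumption).
LEXICON = {
    "allah": 1.1, "praise": 2.6, "trust": 2.3, "mercy": 1.5,
    "illness": -2.2, "disaster": -3.1, "bad": -2.5,
}

# Hypothetical stop list, for illustration only.
STOP_WORDS = {"the", "a", "is", "in", "of", "and", "to"}


def clean(comment):
    """Lowercase, strip punctuation, and drop digits and stop words."""
    tokens = [t.strip(".,!?\"'()").lower() for t in comment.split()]
    return [t for t in tokens if t.isalpha() and t not in STOP_WORDS]


def compound(tokens, alpha=15.0):
    """Sum word valences and squash the total into the range [-1, 1]."""
    total = sum(LEXICON.get(t, 0.0) for t in tokens)
    return total / math.sqrt(total * total + alpha)


def polarity(comment, threshold=0.05):
    """Map the compound score to a positive, negative, or neutral label."""
    score = compound(clean(comment))
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"
```

A comment whose words are all absent from the lexicon receives a compound score of zero and is therefore labelled neutral, which is one way a word-level analysis can end up with a large neutral share.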

Word Frequency and Sentiment Analysis Polarity
Table two below presents the most frequent positive and negative words in the corpus related to the word 'Corona' in isolation before and after the declaration of coronavirus in Yemen. Before the declaration, the words الله "Allah," الحمد "praise," ثقة "trust," and رحمة "mercy" were the most frequent positive words, occurring 6904, 1310, 989, and 652 times, with average weights according to the VADER dictionary of 1.1, 2.6, 2.3, and 1.5, respectively. The most frequent related negative words were مرض "illness," كارثة "disaster," and سيء "bad," with average weights of -2.2, -3.1, and -2.5, respectively. In contrast, table three below presents the most frequent positive and negative words in the corpus related to the word 'Corona' after the emergence of COVID-19 in Yemen. The positive words decreased remarkably. For example, the word الله "Allah" appeared 2109 times with a VADER weight of 1.1. The words glossed as "pray" and رحمة "mercy" occurred 832 and 756 times, and their average weights, according to VADER, are 1.3 and 1.5. The most frequent related negative words are مرض "illness," which appeared 3482 times, كارثة "disaster," which occurred 2105 times, and يقتل "kill," which occurred 1454 times, with VADER weights = -2. [...]

Figure two displays the computational evaluation of the frequency of the public's emotional traits before the declaration of COVID-19 in Yemen (February, March, and 1st-10th April 2020). The researcher observed a notable increase in the positive tone and, consequently, a decrease in the negative tone in this period. Of the total number of comments (n = 2,046), contentment scored the highest frequency among the annotated comments on Facebook and YouTube during the three months, with 36.91%, 46.30%, and 35.94%, respectively, while happiness scored the lowest values with 14.06%, 11.60%, and 7.81%, respectively. In contrast, the negative sentiment "anxiety" increased gradually and achieved the highest values of 19.72%, 11.07%, and 31.25%, respectively, while carelessness showed the lowest scores with 3.50%, 1.34%, and [...]

Public sentiments after the declaration of the 1st case of COVID-19
Figure three presents the overall proportion of the sentiment categories after the first confirmed case of COVID-19 was recorded in Yemen. The researcher observed a marked rise in the negative tone of comments. Anxiety was the most frequent sentiment and accounted for 36.97%, followed by indignation with 27.62%, while sadness scored lower with 6.37%. Conversely, the percentage of positive comments decreased significantly. Contentment reached its lowest rate of 9.30%, and optimism and happiness decreased to 8.26% and 1.09% of the comments, respectively.

Based on the words without contextual reference, before COVID-19, neutral was the most frequent sentiment and scored 55%, followed by negative sentiment with 24%, while positive sentiment showed the lowest share of 21%. In contrast, after COVID-19, negative sentiment scored the highest value and accounted for 57% of the comments, followed by neutral sentiment at 28% and positive sentiment, which scored the lowest value at 15%.

Based on the word contextual reference, positive sentiment was the most frequent attribute before COVID-19 and reached 72%, followed by negative with a relative frequency of 27%, while neutral was the least frequent attribute at 1.0%. After the declaration of COVID-19, 78.23% of the public showed negative sentiment, 18.65% showed a positive tone, and 3.12% were neutral.
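As a rough illustration of why polarity with and without contextual reference can diverge, the following Python sketch scores a keyword alone and then scores the words found up to five positions before and after it. It is one possible reading of the procedure, not the study's implementation: the keyword 'corona', the whitespace tokenization, the sign-based labelling, and the function names are assumptions, and the lexicon reuses a few valences reported in the word-frequency section.

```python
# Toy valences for a few English glosses reported in this paper (assumption:
# a real analysis would use the full VADER lexicon).
LEXICON = {
    "allah": 1.1, "praise": 2.6, "trust": 2.3, "mercy": 1.5,
    "illness": -2.2, "disaster": -3.1, "bad": -2.5,
}


def context_window(tokens, keyword, size=5):
    """Collect up to `size` tokens before and after each keyword occurrence."""
    context = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            context.extend(tokens[max(0, i - size):i])
            context.extend(tokens[i + 1:i + 1 + size])
    return context


def label(score):
    """Turn a summed valence into a polarity label by its sign."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def polarity_in_isolation(tokens, keyword):
    """Score only the keyword itself, ignoring its neighbours."""
    return label(sum(LEXICON.get(t, 0.0) for t in tokens if t == keyword))


def polarity_with_context(tokens, keyword, size=5):
    """Score the words surrounding the keyword within the window."""
    window = context_window(tokens, keyword, size)
    return label(sum(LEXICON.get(t, 0.0) for t in window))
```

Under these assumptions, a keyword with no valence of its own is neutral in isolation but inherits the tone of nearby words once the window is considered, which matches the direction of the shift from mostly neutral to mostly polarized results reported above.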

A comparison of detailed comment category coverage between Facebook and YouTube
Table four below presents the comments' documents in the columns and their assigned emotional traits in the rows. The percentages in the cells indicate the share of comments coded with the assigned traits. The color highlighting in the rows indicates the magnitude of the values: dark green indicates the highest values, and white the lowest. The closer a value is to the highest one, the darker its green highlight. Based on the row highlighting, before the emergence of COVID-19 in Yemen, the positive emotions "contentment" and "optimism" were the most frequent sentiments on both Facebook (FB) and YouTube (YT) during February, March, and April and reached their highest scores of 49.5% and 40.1%, respectively, during March. However, there was a notable decrease in these two sentiments during 1st-10th April, and the anxiety score rose to 41.5%. After the declaration of COVID-19 on 10th April, indignation and anxiety were the most frequent public emotions: anxiety was the most frequent sentiment on FB and scored 32.0%, and indignation was the most frequent attribute on YT and reached 35.9%. The positive tone decreased sharply during this phase, as the highest scores for "contentment" and "optimism" were 14.4% and 8.0% on YT, respectively.
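A coverage table of this kind can be computed from coded comments along the following lines. This Python sketch assumes each coded comment is available as a (document, trait) pair and that percentages are taken within each document column; the function names `coverage` and `most_frequent` and the sample labels in the usage note are hypothetical, and the QDA software used in the study may compute coverage differently.

```python
from collections import Counter, defaultdict


def coverage(coded_comments):
    """Return {document: {trait: percent of that document's coded comments}}."""
    counts = defaultdict(Counter)
    for document, trait in coded_comments:
        counts[document][trait] += 1
    table = {}
    for document, traits in counts.items():
        total = sum(traits.values())
        table[document] = {
            trait: round(100 * n / total, 1) for trait, n in traits.items()
        }
    return table


def most_frequent(table, document):
    """Return the trait with the highest coverage in one document column."""
    column = table[document]
    return max(column, key=column.get)
```

For instance, calling `coverage` on three ("FB", "anxiety") pairs and one ("FB", "contentment") pair yields 75.0 and 25.0 for the FB column, and `most_frequent` then returns "anxiety" for that column.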

Discussion
The author attempts to identify the attitudes and reactions of the Yemeni community towards the emergence of the novel coronavirus after 1st February 2020 because people did not show notable responses before this period. The public's reactions give us an idea of how their sentiments changed due to the spread of COVID-19. The most striking result to emerge from the analysis was the disparity in the overall sentiment polarity depending on whether it was based on word frequency with or without contextual reference. When we ignored the word context, most of the Yemeni community showed neutral sentiment and no reaction towards COVID-19 during (February - 10th April), while after the affirmation of COVID-19 in Yemen, negative sentiment about the pandemic was the dominant polarity in comments. In contrast, when we considered the context of the words, positivity was the dominant tone during (February - 9th April), while negativity was the prevailing tone after this period, and very few comments were neutral. These results stress the important role of contextual reference in identifying a word's actual sentiment, rather than investigating words in isolation.

Based on the public's emotional indicators, the findings revealed that during the period from 1st February to 9th April 2020, most Yemenis expressed a high degree of contentment and optimism. This is expected because, during this period, Yemen had not detected any cases of COVID-19 and was considered nearly the only Arab country not reached by the virus after more than five months of this unprecedented virus worldwide. Indeed, in emergencies such as stress or death, people tend to respond religiously to find comfort in tense moods and bring about positive emotions [33]. Faith plays a crucial role in believers' perceptions of any affliction and crisis [34]. Yemen is considered a religious society, which explains why most comments contain abundant supplications and, thus, why a positive tone is observed, especially in the first phase. In general, though there is a clear tone of negativity, the positive sentiment stands out, and this is inconsistent with [35]. Anxiety and indignation gradually increased during the ten days before the declaration of the first case of COVID-19. They reached their highest value after the declaration of COVID-19, which indicates that people felt the risk of the transmission of COVID-19 to Yemen, which was declared later on 10th April 2020. Contentment, optimism, and happiness decreased as people widely voiced their concerns about the tragic consequences of COVID-19 and the isolation requirements, especially as the country is still struggling with the war and its effects and is not ready for more crises. Moreover, people were worried about the psychological consequences of quarantine, sad about the isolation, and concerned with how to combat the feeling of exclusion. This analysis gives us insight into how people think and react differently during the ongoing crisis. Experimental treatment of the obtained data revealed a state of overwhelmingly negative sentiment and an overview of the apprehension about the pandemic in Yemen after the affirmation of the first case of COVID-19 on 10th April. It disclosed how the public's concerns and reactions evolved with the pandemic.

Conclusions
Intuitively, the emergence of infectious diseases may arouse public attitudes constructively or disruptively, and people mainly post their feelings on social media. This study explores Yemenis' real-time reactions and sentiments on social media towards the emergence of COVID-19 before and after the declaration of the 1st COVID-19 case on 10th April 2020 in the southeastern Yemeni province of Hadramawt. This study is considered the first to investigate Yemenis' attitudes towards this pandemic, and it uses a mixed method of statistics and lexicon-based sentiment analysis. It applies a lexicon and rule-based approach using VADER to assess the sentiment of the comments and a qualitative and quantitative data analysis approach to investigate the fine-grained sentiment indices. This analysis tackles the problematic issue of semantic ambiguity by taking the words' linguistic context (five words before and five words after) and their emotional indications into account when computing sentiment polarity. The sentiment polarity without contextual reference differed from that based on the word's contextual reference. In the former case, neutral sentiment was dominant before 10th April, but negative was the dominant tone after this date. In the latter case, the analysis revealed an overall positive sentiment before 10th April, when contentment and optimism dominated, while an overwhelmingly negative sentiment stood out, with a high degree of anxiety and indignation, after this date. Indeed, knowing and assessing the public's attitudes plays an integral role in providing baseline information about the disease and the situation in general. That assists in relieving the public's panic and concerns, which could otherwise further complicate the situation, and helps the government to be better prepared to address future health crises involving infectious diseases.

Data Availability Statement:
The data is available on request.

[28] applied different machine learning classifiers, namely K-nearest neighbors (KNN), SVM-based Radial Basis Function (RBF), and Bernoulli NB models, to analyze the sentiments of 5986 Arabic YouTube comments and classify them into positive and negative sentiments. The SVM-RBF model scored the highest accuracy of 88.8% when applied to normalized data. In conclusion, sentiment analysis and data mining methods are valuable in analyzing big data related to public attitudes and reactions.

Journal of Educational Sciences and Human Studies (مجلة العلوم التربوية والدراسات الإنسانية)

Figure (3). Overall emotional indicator of Yemenis' comments after the declaration of COVID-19 in Yemen

4.2.3 Overall sentiment
Figure four shows the overall sentiment analysis of the comments before and after COVID-19 based on the words without contextual reference.

Figure (4). Sentiment analysis polarity without contextual reference

Figure five below displays the overall sentiment analysis of the comments before and after COVID-19 based on the word contextual reference (five words before and five words after) and the computation of their sentiment attributes. Overall, the positive sentiment was the most [...]

[...] COVID-19 during (February - 10th April). While after the affirmation of COVID-19 in Yemen, the negative sentiment about the COVID-19 pandemic was the dominant polarity in comments.
[...] first case of COVID-19 on 10th April. It disclosed how the pandemic evolved the public's concerns and reactions.

Table (1). The absolute number of Facebook and YouTube comments

Table (2). Top frequent polarity words with their VADER-assigned weights

Table (3). Top frequent polarity words with their VADER-assigned weights

1 Public sentiments before the declaration of the 1st case of COVID-19 for three months "February, March, and (1st-10th) April." Figure two below

Table (4). A detailed comparison of comments on emotional traits coverage between Facebook and YouTube before and after the emergence of COVID-19 in Yemen.