JITE (Journal of Informatics and Telecommunication Engineering)

The World Health Organization (WHO) declared COVID-19 a global pandemic due to its rapid spread and infection of people worldwide. The emergence of COVID-19 vaccines has garnered both support and rejection from the public. Some people support the vaccines, while others remain cautious, even though the government provides them for free. The procurement of coronavirus vaccines has generated diverse opinions in society. COVID-19 vaccines have become a trending topic on social media, particularly on Twitter. This research aims to explore public opinions on the COVID-19 vaccine. The methods used in this study include data collection, text preprocessing, TF-IDF, multilayer perceptron algorithm


INTRODUCTION
Sentiment analysis is a method designed to determine whether the polarity of textual data, such as letters, sentences, or paragraphs, is positive, negative, or neutral (Fauzi, 2018) (Brahimi et al., 2021) (Bisher et al., 2022). Sentiment analysis can be used to determine public opinion on an issue based on textual data (Kamil & Hananto, 2023) (Al-Hashedi et al., 2022). Public discussions on Twitter about healthcare facilities and government procedures can be used to gauge sentiment, especially on current issues such as COVID-19 (Hall et al., 2022) (Lavalle Garrido et al., 2022). Twitter is a social media platform (Swapnarekha et al., 2023) that allows users to interact with each other by sending messages called tweets, which have a character limit of 280 (Yeasmin et al., 2022).
COVID-19, also known as Corona Virus Disease 2019, spread rapidly and infected people all over the world, resulting in the World Health Organization (WHO) declaring it a global pandemic in March 2020 (Sarirete, 2021) (Su et al., 2020) (Singh et al., 2022). After approximately 11 months, a COVID-19 vaccine with an efficacy rate above 90% was finally developed and ready for use (ÇILGIN et al., 2023). According to WHO, by November 2020, more than 200 coronavirus vaccines had been discovered or developed, with around 40 of them having the potential to enter clinical trials (Qorib et al., 2023) (Niu et al., 2022). At least 7 vaccines are currently circulating after passing the final stage, or phase III, of clinical trials, namely Moderna, Pfizer/BioNTech, Sinovac, Sputnik V, Sinopharm, Covaxin, and Oxford/AstraZeneca (Khalid et al., 2022) (Park & Suh, 2023). The emergence of the COVID-19 vaccine has sparked both support and opposition in society. Some people support the vaccine while others are cautious, and even though the government provides the vaccine for free, there are still people who reject it (Hidayat et al., 2022) (Ingkafi et al., 2023).
The purpose of this study is to examine public sentiment towards the COVID-19 vaccine by combining Term Frequency-Inverse Document Frequency (TF-IDF) weighting with the Multilayer Perceptron (MLP) algorithm. The MLP algorithm is a component of the Artificial Neural Network (ANN) architecture with low complexity yet capable of producing accurate results (Khan et al., 2022). The study focuses on analyzing Twitter data obtained from kaggle.com. The researcher will conduct several text preprocessing steps, namely tokenization, normalization, removal of emoticons and stopwords, and lemmatization, followed by TF-IDF weighting. The classification algorithm used in this study is the Multilayer Perceptron (MLP). The research gap addressed by this study is the analysis of Indonesian society's sentiment towards the COVID-19 vaccine using a combination of TF-IDF and a Multilayer Perceptron (MLP) consisting specifically of 1 hidden layer with 61 neurons, using the sigmoid activation function and a learning rate of 0.2. The research questions addressed in this study are as follows: 1) What percentage of Indonesian society holds a positive, negative, or neutral opinion towards the COVID-19 vaccine? 2) What is the accuracy of the sentiment analysis of Twitter users' opinions regarding the COVID-19 vaccine in Indonesia? 3) What are the key factors influencing society's sentiment towards the COVID-19 vaccine (word cloud)? The main findings of this research stem primarily from the pros and cons within society regarding the COVID-19 vaccine during the pandemic, particularly the requirement of vaccination for individuals who intend to travel within the country or abroad using specific means of transportation.
Previous studies related to this research include the following. Zahra Bokaee Nezhad et al. conducted a study on the sentiment analysis of the Iranian community towards two types of vaccines: the home-grown vaccine (COVIran) and imported vaccines (Pfizer/BioNTech, AstraZeneca/Oxford, Moderna, and Sinopharm). The study employed a deep learning algorithm with a CNN-LSTM architecture. The findings indicated that there was no significant difference between positive and negative opinions among the Iranian community. For the home-grown vaccine, the percentages of positive, negative, and neutral opinions were 40%, 40%, and 20%, respectively, while for the imported vaccines, the percentages were 43%, 45%, and 12% in that order (Bokaee Nezhad & Deihimi, 2022). Angga Pratama et al. researched the sentiment analysis of the Indonesian community towards the Booster Vaccine (Phase 3) related to the annual Eid al-Fitr holiday in 2022. The study utilized several classification algorithms, including Naïve Bayes, Support Vector Machine (SVM), Decision Tree, Logistic Regression, Random Forest, K-Nearest Neighbor, AdaBoost, and XGBoost. The findings indicated that the best accuracy was achieved by the SVM algorithm, at 88%. The research also revealed that the community's sentiment towards the Booster Vaccine was 37.63% negative and 62.37% positive (Pratama et al., 2023).
Charlyn Villavicencio et al. researched the sentiment analysis of the Filipino community towards the COVID-19 vaccine using the Naïve Bayes method. The findings indicated that the Naïve Bayes algorithm achieved an accuracy level of 81.77%, with 83% of opinions classified as positive, 8% as negative, and 9% as neutral (Villavicencio et al., 2021).
Praveen SV et al. researched the sentiment analysis of the Indian community towards the mandatory COVID-19 vaccination for the entire population of India. The study utilized data analysis and text analysis on information from social media. The findings indicated that the community had varying opinions, with 47% expressing a neutral opinion, 17% expressing a negative opinion, and 35% expressing a positive opinion (Praveen et al., 2021).
Pristiyono et al. researched the sentiment analysis of the Indonesian community towards the COVID-19 vaccine using the Naïve Bayes algorithm. The findings indicated that 56% of the community expressed a negative opinion, 39% expressed a positive opinion, and 1% expressed a neutral opinion (Pristiyono et al., 2021).
Deden Ade Nurdeni et al. researched the sentiment analysis of the Indonesian community towards the COVID-19 vaccine, specifically Sinovac and Pfizer, using the SVM method. The findings indicated that for the Sinovac vaccine, 77% of the community expressed a positive opinion, 19% expressed a negative opinion, and 4% expressed a neutral opinion. For the Pfizer vaccine, 81% of the community expressed a positive opinion, 17% expressed a negative opinion, and 3% expressed a neutral opinion. The accuracy level of the SVM algorithm was 85% for the Sinovac vaccine and 78% for the Pfizer vaccine (Nurdeni et al., 2021).

A. Term Frequency-Inverse Document Frequency (TF-IDF)
Term Frequency-Inverse Document Frequency (TF-IDF) weighting is a feature extraction process that assigns a weight value to each word contained in a document; in other words, it converts words into numbers. The benefit of this step is that it estimates the importance of a term in a document. IDF is formulated as follows (ER & Yılmaz, 2023) (Widodo et al., 2022):
TF-IDF is then estimated by combining the TF calculation with IDF as follows (Widodo et al., 2022): (2)
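The weighting described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the common textbook form idf(t) = log(N / df(t)) and tf-idf(t, d) = tf(t, d) × idf(t); the cited works may use a smoothed or normalized variant. The function name `tf_idf` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed textbook definitions (not taken from the cited sources):
      tf(t, d) = raw count of term t in document d
      idf(t)   = log(N / df(t)), where N is the number of documents
                 and df(t) is the number of documents containing t
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Hypothetical toy tweets, already tokenized and lowercased.
docs = [
    ["vaccine", "free", "dose"],
    ["vaccine", "side", "effect"],
    ["vaccine", "dose", "slot"],
]
weights = tf_idf(docs)
```

Under these assumptions, a word that occurs in every document (here "vaccine") receives a weight of zero, while a word confined to one document (here "free") receives the largest weight.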

B. Multilayer Perceptron
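The Multilayer Perceptron is a part of ANN (Artificial Neural Network) that is derived from the Perceptron (Asian et al., 2022). It is a feedforward ANN with one or more hidden layers. Usually, the network consists of an input layer, at least one hidden layer of computation neurons in the middle, and an output layer of computation neurons. The training algorithm for the Multilayer Perceptron is as follows (Nuanmeesri et al., 2022), where $x_i$ are the inputs, $z_j$ the hidden-unit outputs, $y_k$ the network outputs, $t_k$ the targets, $v_{ji}$ and $w_{kj}$ the weights into the hidden and output units, and $f$ the activation function:
1. Initialize weights with small random numbers.
2. While the termination condition is not met, perform steps 3 to 9.
3. For each pair of training data, carry out steps 4 to 9.
4. Each input unit receives a signal and forwards it to the hidden units above it.
5. Compute all the outputs in the hidden units $z_j$ ($j = 1, 2, \dots, p$):
$$z\_net_j = v_{j0} + \sum_{i=1}^{n} x_i v_{ji}, \qquad z_j = f(z\_net_j)$$
6. Calculate all the network outputs at the output units $y_k$ ($k = 1, 2, \dots, m$):
$$y\_net_k = w_{k0} + \sum_{j=1}^{p} z_j w_{kj}, \qquad y_k = f(y\_net_k)$$
7. Calculate factor $\delta$ in the output units based on the error in each output unit $y_k$ ($k = 1, 2, \dots, m$):
$$\delta_k = (t_k - y_k)\, f'(y\_net_k)$$
$\delta_k$ represents the error term that will be used to correct the weights of the layer below it. Calculate the weight change of $w_{kj}$ using the learning rate $\alpha$:
$$\Delta w_{kj} = \alpha\, \delta_k\, z_j$$
8. Calculate factor $\delta$ in the hidden layer based on the errors in each hidden unit $z_j$ ($j = 1, 2, \dots, p$):
$$\delta\_net_j = \sum_{k=1}^{m} \delta_k w_{kj}$$
Factor $\delta$ in the hidden unit:
$$\delta_j = \delta\_net_j\, f'(z\_net_j)$$
Calculate the weight change of $v_{ji}$:
$$\Delta v_{ji} = \alpha\, \delta_j\, x_i$$
9. Calculate all weight changes. The weight modification leading to the output units is:
$$w_{kj}(\text{new}) = w_{kj}(\text{old}) + \Delta w_{kj}$$
The weight modification leading to the hidden units is:
$$v_{ji}(\text{new}) = v_{ji}(\text{old}) + \Delta v_{ji}$$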
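The training procedure above can be illustrated with a short pure-Python sketch of one backpropagation update for a network with a single hidden layer. It is an illustration under stated assumptions, not the implementation used in this study: the sigmoid activation, the bias terms stored at index 0, the weight range of ±0.5, the seed, and the toy input are all assumptions, and the function names are hypothetical. The 61 hidden neurons and learning rate of 0.2 follow the configuration described in this paper.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_weights(n_in, n_hidden, n_out, seed=0):
    """Small random weights; index 0 of each row holds the bias (assumed layout)."""
    rng = random.Random(seed)
    v = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    return v, w

def forward(x, v, w):
    """Forward pass: hidden outputs z_j, then network outputs y_k."""
    z = [sigmoid(row[0] + sum(xi * vji for xi, vji in zip(x, row[1:]))) for row in v]
    y = [sigmoid(row[0] + sum(zj * wkj for zj, wkj in zip(z, row[1:]))) for row in w]
    return z, y

def train_step(x, t, v, w, alpha=0.2):
    """One in-place backpropagation update for a single (input, target) pair.

    Returns the squared error measured before the update.
    """
    z, y = forward(x, v, w)
    # Output error terms; for the sigmoid, f'(net) = y * (1 - y).
    delta_k = [(tk - yk) * yk * (1.0 - yk) for tk, yk in zip(t, y)]
    # Hidden error terms, computed with the output weights before they change.
    delta_j = [
        sum(dk * w[k][j + 1] for k, dk in enumerate(delta_k)) * zj * (1.0 - zj)
        for j, zj in enumerate(z)
    ]
    # Update weights into the output units: w_kj += alpha * delta_k * z_j.
    for k, dk in enumerate(delta_k):
        w[k][0] += alpha * dk
        for j, zj in enumerate(z):
            w[k][j + 1] += alpha * dk * zj
    # Update weights into the hidden units: v_ji += alpha * delta_j * x_i.
    for j, dj in enumerate(delta_j):
        v[j][0] += alpha * dj
        for i, xi in enumerate(x):
            v[j][i + 1] += alpha * dj * xi
    return sum((tk - yk) ** 2 for tk, yk in zip(t, y))

# Hypothetical toy example: 4 TF-IDF features, 3 sentiment classes (one-hot target).
v, w = init_weights(n_in=4, n_hidden=61, n_out=3)
x, t = [0.0, 1.1, 0.4, 0.0], [1.0, 0.0, 0.0]
errors = [train_step(x, t, v, w) for _ in range(200)]
```

Repeating the update on the same pattern should drive the squared error down, which is a quick sanity check that the signs of the weight changes are right.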

III. RESEARCH METHODOLOGY
At the end of 2020, the WHO and several countries successfully developed COVID-19 vaccines that had passed clinical trials. However, only a few vaccines such as Moderna, Pfizer, Sinovac, and AstraZeneca are approved for use in Indonesia. The mandatory vaccination policy for Phase 1, Phase 2, and Booster COVID-19 vaccines often triggers controversy among the public. This controversy frequently arises when people need to travel to other cities or countries using various transportation methods such as buses, trains, and aeroplanes.
The data required for analyzing user sentiment in tweets is obtained from https://www.kaggle.com/gpreda/all-covid19-vaccines-tweets, with a total of 228,208 records. The data covers December 2020 to November 2021. Opinions are classified into three categories: positive, negative, and neutral. The evaluation uses accuracy, precision, and recall. Testing is conducted with three data proportions: 70% training data and 30% testing data, 80% training data and 20% testing data, and 90% training data and 10% testing data.
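To make the three proportions concrete, the sketch below computes the resulting set sizes and shows one simple way to perform a shuffled split. It assumes that all 228,208 records are kept after preprocessing, which the text does not state; the function names and the seed are hypothetical.

```python
import random

TOTAL = 228_208  # number of records reported for the dataset

def split_sizes(total, train_fraction):
    """Return (n_train, n_test) for a given training fraction."""
    n_train = round(total * train_fraction)
    return n_train, total - n_train

def train_test_split(items, train_fraction, seed=42):
    """Shuffle a copy of items and cut it at the training fraction.

    The seed is an arbitrary choice made here for reproducibility.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train, _ = split_sizes(len(shuffled), train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

sizes = {frac: split_sizes(TOTAL, frac) for frac in (0.7, 0.8, 0.9)}
for frac, (n_train, n_test) in sizes.items():
    print(f"{frac:.0%} train: {n_train} training, {n_test} testing")
```

Under that assumption, the testing set shrinks from 68,462 records at the 70/30 proportion to 22,821 at 90/10.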
In this study, the researcher opted to use a single hidden layer because previous research experiments indicated that the use of three hidden layers resulted in a decrease in accuracy. As for the number of neurons, the researcher selected 61 neurons as a departure from the previous study, which used 32, 64, 128, and 256 neurons (all multiples of 32). By exploring a number of neurons that is not a multiple of 32, the researcher aimed to conduct further testing. The learning rate employed was 0.2, a relatively small alpha value chosen in the hope of enhancing accuracy. The activation function employed was the binary sigmoid activation function. The research stages carried out are shown in Figure 1.

A. Research Result
The testing involved three Multilayer Perceptron (MLP) models with the same parameters, namely 61 hidden-layer neurons, the sigmoid activation function, and an alpha value of 0.2. The only difference was in the proportion of training and testing data. Table 1 shows that the training data set was larger than the testing data set in every case. The training set is used to fit the machine learning model: the more data the model is trained on, the better it can recognize patterns, which is reflected in higher accuracy, precision, and recall. The testing data set, on the other hand, is used to evaluate the model's performance after the training process is completed. This study aimed to find the best performance in accurately classifying the sentiment of tweets related to the COVID-19 vaccine using three models with different data proportions. Performance is measured using accuracy, precision, and recall (Braig et al., 2023) (Ingkafi et al., 2023). Since this study focused only on the data proportions, the results obtained are shown in Table 2 and Figure 2.
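The three measures named above can be sketched as follows. This is a minimal illustration, assuming macro-averaging over the three classes for precision and recall, since the text does not state the averaging scheme; the function name `evaluate` and the toy labels are hypothetical.

```python
def evaluate(y_true, y_pred, labels=("positive", "negative", "neutral")):
    """Return (accuracy, macro precision, macro recall).

    Macro-averaging is an assumption; a class with no predictions
    (or no true examples) contributes 0.0 to the respective average.
    """
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(a == b for a, b in pairs) / len(pairs)
    precisions, recalls = [], []
    for label in labels:
        tp = sum(a == label and b == label for a, b in pairs)
        predicted = sum(b == label for _, b in pairs)
        actual = sum(a == label for a, _ in pairs)
        precisions.append(tp / predicted if predicted else 0.0)
        recalls.append(tp / actual if actual else 0.0)
    return accuracy, sum(precisions) / len(labels), sum(recalls) / len(labels)

# Hypothetical toy labels, not data from this study.
y_true = ["positive", "positive", "negative", "neutral"]
y_pred = ["positive", "negative", "negative", "neutral"]
accuracy, precision, recall = evaluate(y_true, y_pred)
```

With a different averaging scheme (for example, weighting each class by its frequency), precision and recall would generally differ from the values computed here, while accuracy would not.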

B. Analysis of Test Results
Based on the process carried out in this study, which involved three rounds of testing with different training and testing data proportions, the MLP1 model achieved a training performance of 80.5% accuracy, 84.8% precision, and 69.3% recall in the first test. The testing performance was 80% accuracy, 84.3% precision, and 68.7% recall.
In the second round of testing, the MLP2 model achieved a training performance of 80.7% accuracy, 84.6% precision, and 69.4% recall. The testing performance was 80.2% accuracy, 84.3% precision, and 69% recall.
In the third test, the MLP3 model achieved a training performance of 81.6% accuracy, 84.2% precision, and 71.6% recall. The testing performance was 81.2% accuracy, 83.8% precision, and 71.2% recall. In conclusion, the MLP3 model has the highest accuracy compared to the MLP1 and MLP2 models. Additionally, MLP3 showed no sign of overfitting, which would appear as training performance significantly higher than testing performance.
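The overfitting remark can be checked directly from the reported figures by subtracting each testing score from the corresponding training score. The sketch below only restates the percentages given in the text above; the variable names are hypothetical.

```python
# Reported (training, testing) scores in percent, copied from the text above.
results = {
    "MLP1": {"accuracy": (80.5, 80.0), "precision": (84.8, 84.3), "recall": (69.3, 68.7)},
    "MLP2": {"accuracy": (80.7, 80.2), "precision": (84.6, 84.3), "recall": (69.4, 69.0)},
    "MLP3": {"accuracy": (81.6, 81.2), "precision": (84.2, 83.8), "recall": (71.6, 71.2)},
}

# Training-minus-testing gap per model and metric, in percentage points.
gaps = {
    model: {metric: round(train - test, 1) for metric, (train, test) in scores.items()}
    for model, scores in results.items()
}

# Model with the highest testing accuracy, and the largest gap overall.
best = max(results, key=lambda m: results[m]["accuracy"][1])
largest_gap = max(g for model_gaps in gaps.values() for g in model_gaps.values())
```

Every gap is at most 0.6 percentage points, and MLP3 has both the highest testing accuracy and a uniform gap of 0.4 points across the three measures.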

C. Discussion
In Indonesia, COVID-19 vaccinations are mandatory in certain situations such as weddings, large religious celebrations, intercity or international travel, and performing Hajj or other religious rituals. The findings of this research indicate that public opinion regarding COVID-19 vaccination varies, with 35% holding a positive opinion, 16.3% holding a negative opinion, and 48.7% holding a neutral opinion. That is, people tend to respond more positively than negatively to the government's COVID-19 vaccination discourse in Indonesia. The large neutral share reflects the fact that the opinions collected do not consist only of expressions of support for or opposition to vaccination, but also include many other responses such as knowledge, expectations, or general opinions. The percentages of positive, negative, and neutral opinions towards COVID-19 vaccination among the public can be seen in Figure 3. The outcomes of the word cloud analysis (Sugumaran & Uma, 2022) in this research are illustrated in Figure 4.
Word cloud (a) displays the frequently used words in the positive sentiment classification, such as "free," "dose," "availability," and "paid". Word cloud (b) presents the frequently used words in the negative sentiment classification, such as "vaccine," "year," "emergency," and "effect". Word cloud (c) exhibits the frequently used words in the neutral sentiment classification, such as "age," "dose," "slot," and "date". The positive, negative, and neutral word associations from the word clouds in Figure 4 can be observed in Table 3.
The limitation of this study is the use of a single type of artificial neural network algorithm, specifically the Multilayer Perceptron, with a binary sigmoid activation function. The evaluation of sentiment analysis was conducted solely using the confusion matrix method and word clouds. The research data covered only one year and was sourced exclusively from the social media platform Twitter.
The findings of this study indicate that a significant portion of the Indonesian population responded positively to the government's policy regarding mandatory COVID-19 vaccination. Similar results were observed in a study conducted by Angga Pratama et al., which revealed a positive response from the Indonesian community towards COVID-19 vaccines, particularly during the Eid al-Fitr holiday, demonstrating increased concern for their health (Pratama et al., 2023).
The contribution of this research lies in the use of three MLP models, referred to as MLP 1, MLP 2, and MLP 3, each with a different proportion of training and testing data. As Table 2 and Figure 2 show, the differences in accuracy, precision, and recall between MLP 1, MLP 2, and MLP 3 are small. This implies that varying the proportions of training and testing data does not heavily impact accuracy, precision, and recall.
The implications of this research highlight the importance of analyzing public sentiment towards the government's COVID-19 vaccine policy and the perceived side effects of the vaccine, as evident from the word cloud results in Table 3, where negative sentiment predominantly relates to perceived adverse effects of the COVID-19 vaccine.

V. CONCLUSION
The conclusion drawn from this study is that the Multilayer Perceptron algorithm, applied to classifying user sentiment in tweets about the COVID-19 vaccine, obtained its best result with the MLP3 model: an accuracy of 81.2%, a precision of 83.8%, and a recall of 71.2%. The visualization of the most frequent words in positive sentiment reveals 3 topics related to "availability", "paid", and "dose". Negative sentiment consists of 2 main issues, namely "vaccine side effects" and "death". Lastly, neutral sentiment covers 4 topics, namely "dose", "availability", "age", and "expiration date".
Figure 1. Research Stages

Figure 3. The Percentage of Positive, Negative, and Neutral Opinions among the Public Regarding the COVID-19 Vaccine

Table 1. Testing and Parameters

Table 2. Performance Results

Table 3. Association of Positive, Negative, and Neutral Words