Sentiment Analysis of Emoji and Latinized Arabic in Indonesian Youtube Comments: A LABERT-LSTM Model
DOI:
https://doi.org/10.37385/jaets.v6i2.7000
Keywords:
Sentiment Analysis, Latinized Arabic, Emoji, BERT, Bi-LSTM, LABERT-LSTM
Abstract
This study addresses the challenges of sentiment analysis on Indonesian-language YouTube comments, which are difficult to classify because they mix dialects, slang, emojis, and Latinized Arabic text. The proposed LABERT-LSTM model combines BERT for deep feature extraction with a Bi-LSTM that captures word-sequence context. The dataset comprises 24,593 YouTube comments on videos in which five renowned Islamic preachers discuss the topic of “tahlilan”. After preprocessing, the model was evaluated using accuracy, precision, recall, and F1-score. LABERT-LSTM achieved an accuracy of 0.95756, a precision of 0.94014, a recall of 0.91815, and an F1-score of 0.92868, outperforming standalone BERT and Bi-LSTM models with fewer misclassifications across the negative, positive, and neutral sentiment classes. Recommendations for future research include extending the dataset to other social media platforms, adopting more advanced NLP techniques, conducting studies in other languages, and optimizing the model for both performance and computational efficiency.
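As a rough illustration of the four evaluation metrics named above, the following Python sketch computes accuracy together with precision, recall, and F1-score for a three-class sentiment task. It is not the authors' evaluation code. The abstract does not state how the per-class scores were averaged, so the macro (unweighted) averaging used here is an assumption, and the function name `macro_metrics`, the label strings, and the toy data are hypothetical.

```python
from collections import Counter

# Hypothetical label names for the three sentiment classes in the abstract.
LABELS = ["negative", "neutral", "positive"]


def macro_metrics(y_true, y_pred, labels=LABELS):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging is an assumption; the abstract does not specify
    the averaging scheme behind its reported scores.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        p = tp[label] / p_den if p_den else 0.0
        r = tp[label] / r_den if r_den else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the study.
    y_true = ["positive", "positive", "negative", "neutral", "neutral", "negative"]
    y_pred = ["positive", "neutral", "negative", "neutral", "neutral", "negative"]
    for name, value in macro_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.5f}")
```

With macro averaging, each class contributes equally regardless of how many comments it contains, which matters when the negative, positive, and neutral classes are imbalanced. A weighted or micro average would give different numbers on the same predictions.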