Sentiment Analysis of Emoji and Latinized Arabic in Indonesian Youtube Comments: A LABERT-LSTM Model

M. Noer Fadli Hidayat; Didik Dwi Prasetya; Triyanna Widiyaningtyas

doi:10.37385/jaets.v6i2.7000

Authors

M. Noer Fadli Hidayat, Universitas Negeri Malang
Didik Dwi Prasetya, Universitas Negeri Malang
Triyanna Widiyaningtyas, Universitas Negeri Malang

DOI:

https://doi.org/10.37385/jaets.v6i2.7000

Keywords:

Sentiment Analysis, Latinized Arabic, Emoji, BERT, Bi-LSTM, LABERT-LSTM

Abstract

This study addresses sentiment analysis of Indonesian-language YouTube comments, which is difficult because the comments mix dialects, slang, emojis, and Latinized Arabic text. The proposed LABERT-LSTM model combines BERT for deep feature extraction with a Bi-LSTM that captures word-sequence context. The dataset comprises 24,593 YouTube comments from five renowned Islamic preachers discussing the topic of “tahlilan”. After preprocessing, the model was evaluated using accuracy, precision, recall, and F1-score. LABERT-LSTM achieved an accuracy of 0.95756, a precision of 0.94014, a recall of 0.91815, and an F1-score of 0.92868, outperforming standalone BERT and Bi-LSTM models by reducing misclassification across the negative, positive, and neutral sentiment classes. Recommendations for future research include extending the dataset to other social media platforms, adopting more advanced NLP techniques, studying other languages, and optimizing the model for performance and computational efficiency.
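The four evaluation metrics named in the abstract can be illustrated for a three-class setting with a short sketch. The abstract does not state how the per-class scores were averaged, so the macro averaging used here is an assumption, and the function name `macro_metrics` and the toy labels are hypothetical, not taken from the study.

```python
from collections import Counter

# The three sentiment classes named in the abstract.
LABELS = ("negative", "neutral", "positive")


def macro_metrics(y_true, y_pred, labels=LABELS):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging is an assumption; the abstract does not say which
    averaging scheme the authors used.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def safe_div(num, den):
        return num / den if den else 0.0

    precisions, recalls, f1s = [], [], []
    for label in labels:
        prec = safe_div(tp[label], tp[label] + fp[label])
        rec = safe_div(tp[label], tp[label] + fn[label])
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(safe_div(2 * prec * rec, prec + rec))

    n = len(labels)
    return {
        "accuracy": safe_div(sum(tp.values()), len(y_true)),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels invented purely for illustration (not from the study's dataset).
y_true = ["positive", "positive", "neutral", "negative", "negative", "neutral"]
y_pred = ["positive", "neutral", "neutral", "negative", "positive", "neutral"]
scores = macro_metrics(y_true, y_pred)
```

Under macro averaging, the F1-score is the mean of the per-class F1 values, which generally differs from the harmonic mean of the averaged precision and recall.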
