Manuscript Title:

SENTIMENT ANALYSIS OF BANGLA-ENGLISH CODE-MIXED AND TRANSLITERATED SOCIAL MEDIA COMMENTS USING MACHINE LEARNING

Authors:

FAHIMA HOSSAIN, NUSRAT JAHAN, JOYREYA ABADIN

DOI Number:

DOI:10.5281/zenodo.10152990

Published : 2023-11-10

About the author(s)

1. FAHIMA HOSSAIN - Lecturer, Department of Computer Science & Engineering, Hamdard University Bangladesh, Bangladesh.
2. NUSRAT JAHAN - BSc in CSE, Hamdard University Bangladesh, Bangladesh.
3. JOYREYA ABADIN - BSc in CSE, Hamdard University Bangladesh, Bangladesh.

Abstract

Online communication now spans many language combinations and expressive styles. Code-mixing, the combining of languages such as Bengali and English within a single message, is common in multilingual environments. Emojis, which are small graphical icons, add further complexity by allowing for more emotional expression. This work addresses sentiment analysis of social media comments written in code-mixed Bangla-English and accompanied by emojis. It aims to understand and classify the sentiments expressed in such comments, including the nuanced role of emojis. The dataset contained 2055 comments from Facebook pages, including Bangla-English comments with and without emojis and comments that contained only emojis. Preprocessing removed irrelevant data and converted emojis into Unicode short names. Features were then extracted using the TF-IDF Vectorizer and CountVectorizer, and nine different machine learning algorithms were evaluated. The Support Vector Classifier was the most successful model, with an accuracy of 85.7% and an F1 score of 85.0%. This highlights the utility of including emoji-based features for complex sentiment analysis, particularly when dealing with code-mixed data.
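
The emoji preprocessing step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which tool the authors used, so this example is an assumption: it uses the character names from Python's standard `unicodedata` module as a stand-in for Unicode short names, and the helper name `emoji_to_names` and the sample comment are hypothetical. The exact short names produced by other tools may differ.

```python
import unicodedata


def emoji_to_names(text: str) -> str:
    """Replace emoji-like symbols with lowercase Unicode-name tokens.

    Assumption: Unicode category "So" (Symbol, other) is used as a rough
    heuristic for emojis; it is not a complete emoji detector.
    """
    out = []
    for ch in text:
        if unicodedata.category(ch) == "So":
            name = unicodedata.name(ch, "")
            if name:
                # e.g. "FACE WITH TEARS OF JOY" -> "face_with_tears_of_joy"
                token = name.lower().replace(" ", "_").replace("-", "_")
                out.append(" " + token + " ")
            # Symbols without a known name are dropped.
            continue
        # Skip variation selectors and zero-width joiners from emoji sequences.
        if ch in ("\ufe0f", "\u200d"):
            continue
        out.append(ch)
    # Collapse repeated whitespace introduced around the tokens.
    return " ".join("".join(out).split())


# Hypothetical code-mixed comment, not taken from the paper's dataset.
print(emoji_to_names("khub bhalo 😂"))
```

Turning each emoji into a word-like token lets a text vectorizer treat it as an ordinary feature, so comments that contain only emojis still produce a non-empty feature vector.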


Keywords

Sentiment Analysis, Emoji Analysis, Natural Language Processing, Machine Learning, Code-mixing, Emoji, Unicode.