Manuscript Title:

EXTREMISM CLASSIFICATION BASED ON TWITTER TEXT USING ENSEMBLE LEARNING

Author:

MUHAMMAD SABIR, DOST MUHAMMAD KHAN, FAISAL SHAHZAD, MUHAMMAD NAUMAN, ASAD ALI

DOI Number:

DOI:10.17605/OSF.IO/7FW5Y

Published : 2022-09-23

About the author(s)

1. MUHAMMAD SABIR - Department of Information Technology, Islamia University of Bahawalpur, Pakistan.
2. DOST MUHAMMAD KHAN - Department of Information Technology, Islamia University of Bahawalpur, Pakistan.
3. FAISAL SHAHZAD - Department of Information Technology, Islamia University of Bahawalpur, Pakistan.
4. MUHAMMAD NAUMAN - Department of Artificial Intelligence, Islamia University of Bahawalpur, Pakistan.
5. ASAD ALI - Department of Computer Science and Information Technology, National College of Business Administration and Economics, Bahawalpur, Pakistan.

Abstract

The internet, especially social media networks, has changed the way criminal and militant groups influence and radicalize individuals. According to a recent report, these groups begin by exposing a large online audience to extreme content and then move to a more open online platform to cultivate further interest. It is therefore important to identify extreme content online in order to limit its spread. The purpose of this research is to categorize ways of detecting extreme content on social media automatically. It identifies numerous signals in text, psychology, and behavior that allow extremist content to be categorized, particularly from Twitter text messages. The proposed approach uses machine learning methods to detect extreme content in textual data. The dataset used in this research was extracted from Twitter and covers threats, terrorism, and cyberbullying, which are classified as extreme content, in order to analyze people's psychological behavior and to help prevent crime. The research analyzes machine learning models including naive Bayes, decision tree, random forest, support vector classifier, logistic regression, and an ensemble model built from logistic regression, support vector classifier, and decision tree. The ensemble forms a state-of-the-art classification system that uses feature fusion and hyperparameter optimization techniques, namely grid search and FLAML (A Fast and Lightweight AutoML Library). The major contribution of the study is the Extremism Classification based on Ensemble Optimized Feature Fusion (ECEOFF) system, built using K-fold cross-validation. The evaluation metrics are accuracy, precision, recall, and F1 score. The proposed approach achieves 99.8% accuracy, 99% precision, 97.5% recall, and a 99% F1 score, outperforming the other machine learning models used in the comparative analysis. Further, a comparative analysis with three different feature techniques and two hyperparameter tuning techniques is carried out to evaluate the proposed research rigorously.
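
The pipeline summarized in the abstract can be illustrated with a minimal Python sketch using scikit-learn. It is not the authors' ECEOFF implementation. The abstract does not state which features are fused, how the ensemble votes, or which hyperparameters are searched, so the word-level TF-IDF plus character n-gram fusion, the hard voting, the parameter grid, and the toy tweets below are all assumptions made for illustration. FLAML tuning is omitted, and only grid search with stratified K-fold cross-validation is shown.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy corpus (1 = extreme, 0 = benign); NOT the paper's dataset.
texts = [
    "we will attack them tonight",
    "join us and destroy the enemy",
    "they deserve to be hurt badly",
    "i will find you and hurt you",
    "burn it all down and attack",
    "destroy them all without mercy",
    "lovely weather for a walk today",
    "great match, well played everyone",
    "trying a new pasta recipe tonight",
    "the library opens at nine tomorrow",
    "happy birthday to my best friend",
    "watching a movie with my family",
]
labels = [1] * 6 + [0] * 6


def build_search(n_splits=3):
    """Grid search over a feature-fusion + voting-ensemble pipeline."""
    # Feature fusion (assumed form): concatenate word-level TF-IDF
    # with character n-gram counts into one feature matrix.
    fusion = FeatureUnion([
        ("word_tfidf", TfidfVectorizer(ngram_range=(1, 2))),
        ("char_counts", CountVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
    ])
    # Ensemble of the three base learners named in the abstract;
    # hard (majority) voting is an assumption.
    ensemble = VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("svc", SVC(kernel="linear")),
            ("dt", DecisionTreeClassifier(random_state=0)),
        ],
        voting="hard",
    )
    pipeline = Pipeline([("features", fusion), ("clf", ensemble)])
    # Illustrative grid; the values searched in the paper are not given.
    param_grid = {
        "clf__lr__C": [0.1, 1.0],
        "clf__svc__C": [0.1, 1.0],
        "clf__dt__max_depth": [3, None],
    }
    # Stratified K-fold keeps the class ratio the same in every fold.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    return GridSearchCV(pipeline, param_grid, cv=cv, scoring="f1")


search = build_search()
search.fit(texts, labels)
predictions = search.predict(
    ["attack and destroy them", "a quiet walk in the park"]
)
```

After fitting, `search.best_params_` holds the selected hyperparameters and `search.best_score_` the mean cross-validated F1 score. On a toy corpus this small the scores carry no meaning and say nothing about the results reported above.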


Keywords

Extremism, Social profiling, Classification, Supervised Learning, Machine Learning.