Manuscript Title:

TOPIC MODELING USING DOCUMENT PIVOT APPROACH WITH FIXED WINDOW

Author:

MUSHTAQ AHMED, RAFAQAT KAZMI, NADEEM SARWAR, MUHAMMAD MURAD KHAN, ALI SAMAD TAUNI, SUNNIA IKRAM, AMNA IKRAM

DOI Number:

DOI:10.17605/OSF.IO/S6N58

Published: 2023-02-10

About the author(s)

1. MUSHTAQ AHMED - Department of Computer Science, The Islamia University of Bahawalpur, Pakistan.
2. RAFAQAT KAZMI - Department of Software Engineering, The Islamia University of Bahawalpur, Pakistan.
3. NADEEM SARWAR - Department of Computer Science, Bahria University Lahore Campus, Pakistan.
4. MUHAMMAD MURAD KHAN - Government College University Faisalabad, Pakistan.
5. ALI SAMAD TAUNI - Department of Data Science, The Islamia University of Bahawalpur, Pakistan.
6. SUNNIA IKRAM - Department of Software Engineering, The Islamia University of Bahawalpur, Pakistan.
7. AMNA IKRAM - Department of Computer Science & IT, Government Sadiq College Women University, Bahawalpur, Pakistan.

Abstract

Twitter is a very popular social media and microblogging platform. Tweets are short (originally limited to 140 characters), and people use the platform to obtain information and keep up to date with the latest news and events. Topic modeling means finding topic headings in a textual dataset. This research performed topic modeling on Twitter data using the document pivot approach, a method previously considered unsuitable for generating topics from Twitter stream data. The document pivot approach was implemented with a fixed window and the Term Frequency-Inverse Document Frequency (TF-IDF) method so that, on Twitter stream data, it produces topics with high accuracy and addresses the problems of noise, rapidly changing content, and very short documents. The results were cross-verified with a probabilistic topic modeling algorithm, Latent Dirichlet Allocation (LDA), and the outputs of the LDA and TF-IDF approaches were compared. The comparative analysis shows that topic modeling using the document pivot approach with a fixed window achieves higher accuracy than LDA and performs well on short microblog text, which was a gap in previous research.
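The abstract describes the method only at a high level. As a rough illustration, the Python sketch below shows one way a document pivot pipeline with a fixed window could be organized: the tweet list is cut into consecutive windows of a fixed number of tweets, each tweet in a window becomes a TF-IDF vector, each tweet joins the cluster of its most similar earlier tweet when cosine similarity reaches a threshold, and each surviving cluster is summarized by its highest-weighted terms as a topic heading. The count-based window, the similarity threshold, the minimum cluster size, the stopword list, the smoothed IDF formula ln((1 + n)/(1 + df)) + 1, and every function name here are assumptions for this example, not details taken from the paper; the LDA comparison is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical minimal stopword list for illustration; "rt" marks retweets.
STOPWORDS = {"a", "an", "and", "at", "for", "in", "is", "of", "on", "rt", "the", "to"}


def tokenize(text):
    """Lowercase a tweet, strip URLs, and keep word/hashtag tokens that are not stopwords."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return [tok for tok in re.findall(r"#?\w+", text) if tok not in STOPWORDS]


def tfidf_vectors(docs):
    """One sparse TF-IDF dict per tokenized tweet, with IDF computed inside the window.

    Assumed smoothed IDF: ln((1 + n) / (1 + df)) + 1.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_window(vectors, threshold):
    """Document pivot step (assumed variant): each tweet joins the cluster of its
    most similar earlier tweet in the window, or starts a new cluster."""
    labels = []
    for i, vec in enumerate(vectors):
        best_j, best_sim = None, 0.0
        for j in range(i):
            sim = cosine(vec, vectors[j])
            if sim > best_sim:
                best_j, best_sim = j, sim
        if best_j is not None and best_sim >= threshold:
            labels.append(labels[best_j])
        else:
            labels.append(i)  # new cluster, labelled by the index of its first tweet
    return labels


def window_topics(tweets, threshold=0.3, top_k=3, min_size=2):
    """Return (cluster_size, heading_terms) pairs for one window, largest first."""
    vectors = tfidf_vectors([tokenize(t) for t in tweets])
    clusters = {}
    for label, vec in zip(cluster_window(vectors, threshold), vectors):
        clusters.setdefault(label, []).append(vec)
    topics = []
    for members in clusters.values():
        if len(members) < min_size:  # clusters below min_size are treated as noise
            continue
        weight = Counter()
        for vec in members:
            weight.update(vec)  # sum TF-IDF weights over the cluster
        ranked = sorted(weight.items(), key=lambda kv: (-kv[1], kv[0]))
        topics.append((len(members), [term for term, _ in ranked[:top_k]]))
    topics.sort(key=lambda topic: -topic[0])  # stable: ties keep cluster order
    return topics


def fixed_window_topics(stream, window_size, **kwargs):
    """Cut a list of tweets into consecutive fixed-size windows and yield each window's topics."""
    for start in range(0, len(stream), window_size):
        yield window_topics(stream[start:start + window_size], **kwargs)
```

For example, calling `fixed_window_topics(tweets, window_size=4, top_k=2)` on a list of tweets yields one list of topics per window. Dropping singleton clusters is one simple way to filter the isolated chatter typical of tweets; in practice the threshold and window size would need tuning on real data.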


Keywords

Topic Modeling, Document Pivot Approach, Fixed Window, LDA (Latent Dirichlet Allocation), TF-IDF (Term Frequency-Inverse Document Frequency).