Manuscript Title:

A STUDY TO DETERMINE THE RESILIENCE OF THE OPTIMAL ALGORITHM UNDER VARIOUS CONDITIONS

Author:

JIWEI YAN, Dr. MIDHUN CHAKKARAVARTHY, Dr. SANDEEP SHRESTHA

DOI Number:

DOI:10.17605/OSF.IO/W3XS2

Published : 2022-11-10

About the author(s)

1. JIWEI YAN - Research Scholar of Lincoln University College Malaysia.
2. Dr. MIDHUN CHAKKARAVARTHY - Associate Professor, Dean, Faculty of Computer Science and Multimedia, Lincoln University College.
3. Dr. SANDEEP SHRESTHA - Professor of Lincoln University College Malaysia.


Abstract

E-commerce forecasting relies on classifying whether or not a customer's visit to an online store results in a purchase. When a large German clothing retailer classifies its customers and shows gift cards to those who are not buying, nudging them towards a purchase, it increases its turnover. A wide range of prediction models and data sources is available for making such forecasts. To determine how consumers can best be classified as buying or not buying, this study identifies well-suited prediction models and compares their performance across different kinds of data, such as static and dynamic data. The research followed the Cross Industry Standard Process for Data Mining (CRISP-DM). A literature study identified several suitable models: boosted trees, the Random Forest (RF), Support Vector Machines (SVM), the Feedforward Neural Network (FNN), Logistic Regression, and Recurrent Neural Networks (RNN). The algorithms were then trained on three separate datasets (the sequential session data, the static customer data, and a combined dataset) and were assessed and compared using several performance criteria, including prediction latency and comprehensibility. After that, the RNN was trained on datasets that varied in the amount of feature engineering required. All algorithms, evaluations, and comparisons were implemented in Python. According to the results, the RF performed best while exhibiting a tolerable prediction latency. The algorithms did not differ in terms of comprehensibility. The combined dataset yields the best results, although the customer information has only a small impact on its performance. ROC AUC values for the various datasets and algorithms are reported in a table for comparison.
The RNN also showed promise with respect to time-consuming feature engineering: trained on fewer and less heavily engineered features, it produced better results than were obtained with the larger number of more extensively engineered features used for the other algorithms.
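
The two quantitative criteria named in the abstract, ROC AUC and prediction latency, can be illustrated with a minimal sketch. It is not the study's code: the function names `roc_auc` and `mean_latency_seconds`, the toy session data, and the hand-written scoring function `toy_score` are all assumptions introduced for illustration, and only the Python standard library is used. ROC AUC is computed here through the rank-sum (Mann-Whitney) formulation, with tied scores receiving average ranks.

```python
import time
from typing import Callable, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the rank-sum formulation; tied scores get average ranks."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end of the group of equal scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC AUC needs both classes to be present")
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def mean_latency_seconds(
    predict: Callable[[Tuple[float, float]], float],
    rows: List[Tuple[float, float]],
    repeats: int = 100,
) -> float:
    """Average wall-clock time of a single prediction call."""
    start = time.perf_counter()
    for _ in range(repeats):
        for row in rows:
            predict(row)
    return (time.perf_counter() - start) / (repeats * len(rows))


# Hypothetical toy sessions: (pages_viewed, minutes_on_site) with purchase labels.
sessions = [(1, 0.5), (2, 1.0), (8, 6.0), (3, 2.0), (10, 9.0), (7, 4.0)]
purchased = [0, 0, 1, 0, 1, 1]


def toy_score(row: Tuple[float, float]) -> float:
    """Stand-in for a trained model's purchase score (not a real model)."""
    pages, minutes = row
    return 0.1 * pages + 0.05 * minutes


auc = roc_auc(purchased, [toy_score(r) for r in sessions])
latency = mean_latency_seconds(toy_score, sessions)
print(f"ROC AUC: {auc:.3f}")
print(f"mean latency per prediction: {latency:.2e} s")
```

In a comparison like the one described, the same two functions would be applied to each trained model's scoring function on each dataset, so that predictive quality and prediction delay can be read side by side.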


Keywords

RF, evaluations, comparisons, Logistic Regression, Recurrent Neural Networks.