AUTOMATIC PLAGIARISM DETECTION AND EXTRACTION IN A MULTILINGUAL: A CRITICAL STUDY AND COMPARISON

NEHA N. CHAUBEY, NIRBHAY KUMAR CHAUBEY

DOI Number:

DOI:10.17605/OSF.IO/DWUK4

Published : 2022-01-27

About the author(s)

1. NEHA N. CHAUBEY - Student, Electronics and Communication Department, Dharmsinh Desai University, Gujarat, India.
2. NIRBHAY KUMAR CHAUBEY - Dean, Department of Computer Science, Ganpat University, Gujarat, India.

Abstract

Effective plagiarism detection is challenging because of the large quantity of text available on the internet in multiple languages. Plagiarism occurs at various levels of complexity, ranging from reuse of the original source material to condensed text. Detecting plagiarized content across languages is one of the prime concerns. Previous studies present extensive research on detecting plagiarism in monolingual text. However, less reliable research exists on detecting multilingual plagiarism, wherein one writer's work is plagiarized by another writer. Checking the authenticity of a work is therefore a major challenge for researchers, academic institutes, research organizations, and conference organizers, and the problem has gained increasing attention in recent years. This paper extensively reviews state-of-the-art plagiarism detection and extraction techniques for monolingual, bilingual, and multilingual text, and discusses them. Moreover, it reviews the benefits and limitations of various deep learning based multilingual plagiarism detection techniques, together with the languages they support. Finally, the paper highlights some of the better techniques for plagiarism detection, based on machine learning and deep learning solutions.

Keywords

Plagiarism, Cross-language plagiarism detection, Cross-language dataset, Natural Language Processing (NLP) Techniques.