Publication | Closed Access
Disguised plagiarism detection in Arabic text documents
Citations: 14
References: 18
Year: 2018
Unknown Venue
Topics: Engineering; Cross-lingual Representation; Information Forensics; Corpus Linguistics; Text Mining; Word Embeddings; Natural Language Processing; Language Documentation; Information Retrieval; Arabic; Data Mining; Computational Linguistics; Document Classification; Language Studies; Machine Translation; Plagiarism Detection; Content Similarity Detection; Plagiarism Detection Task; Text Processing; Decision Trees; Linguistics; Document Processing; Semantic Similarity
Plagiarism detection is a challenging Natural Language Processing (NLP) task. Many current systems can detect simple verbatim reproduction (copy and paste). However, real plagiarism cases often involve more disguised techniques, such as rewording, synonym substitution, paraphrasing, and text manipulation, which make the detection task much more difficult. In this paper, we propose two approaches to assist users in detecting plagiarism in Arabic natural language texts. The first approach is based on word embeddings, word alignment, and word weighting, and measures the semantic similarity between textual units. The second approach is based on Machine Learning (ML), where the characterisation is performed at the sentence level. We combine lexical, syntactic, and semantic features to assist the detection task. Support Vector Machine (SVM), Decision Tree (DT), and Random Forest (RF) classifiers are investigated. The classifiers are trained and evaluated using the training dataset of the first Arabic Plagiarism Detection (AraPlagDet) shared task (2015). Our experimental results show that the proposed approaches achieve promising results compared to state-of-the-art Arabic plagiarism detection systems.
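To make the first approach more concrete, the following is a minimal sketch of one common way to combine word embeddings, word alignment, and word weighting into a sentence-level similarity score. It is not the authors' implementation: the abstract does not specify the alignment or weighting scheme, so the greedy best-match alignment, the function names (`cosine`, `directed_similarity`, `sentence_similarity`), the three-dimensional toy vectors, the transliterated tokens, and the weights are all assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def directed_similarity(source, target, embeddings, weights):
    """Align each source word to its most similar target word,
    then return the weighted average of those best-match scores."""
    target_vectors = [embeddings[t] for t in target if t in embeddings]
    weighted_sum = 0.0
    total_weight = 0.0
    for word in source:
        if word not in embeddings:
            continue  # out-of-vocabulary words are skipped in this sketch
        best = max((cosine(embeddings[word], tv) for tv in target_vectors), default=0.0)
        w = weights.get(word, 1.0)  # hypothetical IDF-like weight, default 1.0
        weighted_sum += w * best
        total_weight += w
    return weighted_sum / total_weight if total_weight else 0.0


def sentence_similarity(a, b, embeddings, weights):
    """Symmetric score: the mean of the two directed similarities."""
    return 0.5 * (
        directed_similarity(a, b, embeddings, weights)
        + directed_similarity(b, a, embeddings, weights)
    )


# Toy data (assumed, not from the paper): transliterated Arabic tokens
# with hand-made 3-dimensional vectors and made-up weights.
embeddings = {
    "kataba": [1.0, 0.1, 0.0],  # "wrote"
    "allafa": [0.9, 0.2, 0.0],  # "authored"
    "kitab": [0.0, 1.0, 0.1],   # "book"
    "maqal": [0.1, 0.9, 0.2],   # "article"
    "shams": [0.0, 0.0, 1.0],   # "sun"
}
weights = {"kataba": 1.0, "allafa": 1.2, "kitab": 1.5, "maqal": 1.5, "shams": 2.0}

original = ["kataba", "kitab"]
paraphrase = ["allafa", "maqal"]
unrelated = ["shams"]

paraphrase_score = sentence_similarity(original, paraphrase, embeddings, weights)
unrelated_score = sentence_similarity(original, unrelated, embeddings, weights)
print(f"paraphrase: {paraphrase_score:.3f}, unrelated: {unrelated_score:.3f}")
```

In this sketch a synonym-substituted sentence shares no surface tokens with the original yet still scores higher than an unrelated one, which is the property that makes embedding-based similarity useful against disguised plagiarism. A real system would use pretrained Arabic embeddings and corpus-derived weights in place of the toy values.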