Publication | Closed Access
An Improved DE Algorithm to Optimise the Learning Process of a BERT-based Plagiarism Detection Model
Citations: 49 | References: 42 | Year: 2022
Keywords: Improved DE Algorithm, Automatic Plagiarism Detection, Engineering, Machine Learning, Corpus Linguistics, Text Mining, Natural Language Processing, Information Retrieval, Data Science, Data Mining, Computational Linguistics, Attention Mechanism, Machine Translation, Large AI Model, Sequence Modelling, Computer Science, Plagiarism Detection, Deep Learning, Retrieval Augmented Generation, Learning Process, Content Similarity Detection, Text Processing, Language Generation
Plagiarism detection is a challenging task, aiming to identify similar items in two documents. In this paper, we present a novel approach to automatic plagiarism detection that combines BERT (bidirectional encoder representations from transformers) word embedding, attention mechanism-based long short-term memory (LSTM) networks, and an improved differential evolution (DE) algorithm for weight initialisation. BERT is used to pre-train deep bidirectional representations in all layers, while the pre-trained BERT model can be fine-tuned with only one extra output layer without significant changes in architecture. Deep learning algorithms often use the random weighting method for initialisation, followed by gradient-based optimisation algorithms such as back-propagation for training, making them susceptible to getting trapped in local optima. To address this, population-based metaheuristic algorithms such as DE can be used. We propose an improved DE algorithm with a clustering-based mutation operator, where first a winning cluster of candidate solutions is identified and a new updating strategy is then applied to include new candidate solutions in the current population. The proposed DE algorithm is used in LSTM, attention mechanism, and feed-forward neural networks to yield the initial seeds for subsequent gradient-based optimisation. We compare our proposed model with conventional and population-based approaches on three datasets (SNLI, MSRP and SemEval2014) and demonstrate it to give superior plagiarism detection performance.
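The clustering-based mutation idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how clusters are formed, how the winning cluster is chosen, or what the updating strategy is, so everything below is an assumption made for illustration: plain k-means for clustering, the cluster with the lowest mean loss as the winner, its best member as the mutation base, and an update that keeps the best `pop_size` candidates from the union of old and new solutions. The function names (`clustered_de`, `kmeans_labels`) are hypothetical, and the toy `sphere` objective stands in for a network's training loss over its weight vector.

```python
import random


def sphere(x):
    """Toy objective standing in for a network's training loss."""
    return sum(v * v for v in x)


def kmeans_labels(pop, k, rng, iters=10):
    """Plain k-means (an assumed choice); returns one cluster label per candidate."""
    centers = [list(c) for c in rng.sample(pop, k)]
    labels = [0] * len(pop)
    for _ in range(iters):
        for i, x in enumerate(pop):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centers[c])),
            )
        for c in range(k):
            members = [pop[i] for i in range(len(pop)) if labels[i] == c]
            if members:  # leave the centre unchanged if its cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def clustered_de(loss, dim, pop_size=20, k=3, f=0.5, cr=0.9, gens=50, seed=0):
    """DE/rand/1/bin plus an assumed clustering-based mutation step."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    fit = [loss(x) for x in pop]
    for _ in range(gens):
        # Standard DE step: mutate, binomial crossover, greedy selection.
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)
            trial = [
                pop[a][d] + f * (pop[b][d] - pop[c][d])
                if (rng.random() < cr or d == j_rand) else pop[i][d]
                for d in range(dim)
            ]
            t_fit = loss(trial)
            if t_fit <= fit[i]:
                pop[i], fit[i] = trial, t_fit
        # Assumed clustering-based mutation: winner = lowest mean loss.
        clusters = {}
        for i, lab in enumerate(kmeans_labels(pop, k, rng)):
            clusters.setdefault(lab, []).append(i)
        win = min(clusters.values(), key=lambda idx: sum(fit[i] for i in idx) / len(idx))
        base = min(win, key=lambda i: fit[i])
        new = []
        for _ in range(len(win)):
            r1, r2 = rng.sample(range(pop_size), 2)
            cand = [pop[base][d] + f * (pop[r1][d] - pop[r2][d]) for d in range(dim)]
            new.append((loss(cand), cand))
        # Assumed update rule: keep the best pop_size of old and new candidates.
        merged = sorted(list(zip(fit, pop)) + new, key=lambda t: t[0])[:pop_size]
        fit = [m[0] for m in merged]
        pop = [m[1] for m in merged]
    best = min(range(pop_size), key=lambda i: fit[i])
    return pop[best], fit[best]


if __name__ == "__main__":
    weights, final_loss = clustered_de(sphere, dim=5)
    print(final_loss)
```

In the setting the abstract describes, the returned vector would serve as the initial weights handed to a gradient-based optimiser, with `loss` replaced by the model's training loss. Both selection steps in this sketch are elitist, so the best loss in the population never increases from one generation to the next.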