Publication | Closed Access
MatchDetectReveal: Finding Overlapping and Similar Digital Documents
Year: 1999 | Citations: 34 | References: 9
Topics: Engineering; Information Forensics; Corpus Linguistics; Text Mining; Image Analysis; Information Retrieval; Data Science; Data Mining; Pattern Recognition; String-searching Algorithm; String Processing; Document Clustering; Knowledge Discovery; Computer Science; Finding Overlapping; Digital Documents; Content Similarity Detection; Suffix Tree Representation; Combinatorial Pattern Matching; Semi-structured Digital Documents; Similarity Search; Document Processing
The Internet provides easy access to large collections of semi-structured digital documents. WWW browsers, search engines and cut-and-paste make it tempting to substitute simple compilation from existing digital resources for one's own creative work. This paper discusses the problem of detecting plagiarism in large collections of semi-structured electronic texts. The project focuses on overlaps in, and similarity of, digital documents and software code. The conceptual architecture of the MatchDetectReveal system is presented along with possible applications. The main component of the system uses string-matching algorithms and a suffix tree representation. Both sequential and parallel cluster-based processing are addressed, and implementation and performance issues are also discussed.
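As a rough illustration of the kind of overlap detection the abstract describes, the sketch below finds the longest passage shared by two texts. It is not the MatchDetectReveal implementation: it swaps the suffix tree for a naively built suffix array, which gives the same answer for this task but is much slower on large inputs. The function name `longest_shared_passage` and the NUL separator are assumptions made for this example only.

```python
def longest_shared_passage(doc_a: str, doc_b: str) -> str:
    """Return the longest substring that occurs in both documents.

    Illustrative only: uses a naive suffix array, not a suffix tree.
    """
    sep = "\x00"  # assumption: this character appears in neither document
    text = doc_a + sep + doc_b
    boundary = len(doc_a)  # index of the separator in the joined text

    # Naive suffix array: suffix start positions sorted by suffix text.
    suffixes = sorted(range(len(text)), key=lambda i: text[i:])

    best = ""
    # The longest shared substring is a common prefix of two suffixes that
    # are adjacent in sorted order and start in different documents.
    for left, right in zip(suffixes, suffixes[1:]):
        if (left < boundary) == (right < boundary):
            continue  # both suffixes start in the same document
        n = 0
        while (left + n < len(text) and right + n < len(text)
               and text[left + n] == text[right + n]):
            n += 1
        if n > len(best):
            best = text[left:left + n]
    return best


if __name__ == "__main__":
    print(repr(longest_shared_passage("the quick brown fox",
                                      "a quick brown dog")))
```

A production system would build the index in linear time and report every overlap above a length threshold rather than only the longest one; the adjacency argument used here is what lets a suffix-based index avoid comparing every pair of positions.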