Publication | Closed Access
A MapReduce Framework for Mining Maximal Contiguous Frequent Patterns in Large DNA Sequence Datasets
Citations: 18
References: 10
Year: 2012
Keywords: Engineering, Genetics, MapReduce Framework, Pattern Discovery, Pattern Mining, DNA Sequences, Genomics, MapReduce, Sequence Motif, Phylogenetics, Data Science, Data Mining, Molecular Ecology, Data Integration, Data Management, Hadoop Platform, Knowledge Discovery, Computer Science, Bioinformatics, Functional Genomics, Frequent Pattern Mining, Computational Biology, Structure Mining, Medicine, Big Data
Abstract
Current DNA sequence datasets have become extremely large, making it a great challenge for single-processor, main-memory-based computing systems to mine interesting patterns. Such limited hardware resources make most Apriori-like algorithms inefficient. However, recent implementations of the MapReduce framework have overcome these limitations. Furthermore, mining maximal contiguous frequent patterns to express the function and structure of DNA sequences is a useful technique, capable of capturing the common data characteristics among related sequences. In this paper, we propose an efficient approach for mining maximal contiguous frequent patterns in large DNA sequence data using a MapReduce framework, which can handle massive DNA sequence datasets with a large number of nodes on a Hadoop platform. Our extensive experimental results show that the proposed approach can mine the complete set of maximal contiguous frequent patterns very efficiently.
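To make the idea in the abstract concrete, the following is a minimal single-machine sketch of how contiguous frequent patterns could be counted in a map/shuffle/reduce style and then filtered down to the maximal ones. It is not the paper's algorithm, whose details are not given here. The function names (`map_phase`, `shuffle`, `reduce_phase`, `maximal_patterns`, `mine`), the `max_len` cap, and the definition of support as the number of sequences containing a pattern are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(seq_id, sequence, max_len):
    """Emit (pattern, seq_id) once for each distinct contiguous substring
    of the sequence whose length is at most max_len (an assumed cap)."""
    seen = set()
    for start in range(len(sequence)):
        longest = min(max_len, len(sequence) - start)
        for length in range(1, longest + 1):
            pattern = sequence[start:start + length]
            if pattern not in seen:
                seen.add(pattern)
                yield pattern, seq_id


def shuffle(pairs):
    """Group sequence ids by pattern, imitating the shuffle step."""
    groups = defaultdict(set)
    for pattern, seq_id in pairs:
        groups[pattern].add(seq_id)
    return groups


def reduce_phase(groups, min_support):
    """Keep patterns that occur in at least min_support sequences."""
    return {p: len(ids) for p, ids in groups.items() if len(ids) >= min_support}


def maximal_patterns(frequent):
    """Keep frequent patterns not contained in a longer frequent pattern.
    This is a quadratic scan, acceptable only for a small illustration."""
    return {
        p: support
        for p, support in frequent.items()
        if not any(p != q and p in q for q in frequent)
    }


def mine(sequences, min_support, max_len):
    """Run the illustrative pipeline over a list of DNA strings."""
    pairs = chain.from_iterable(
        map_phase(i, seq, max_len) for i, seq in enumerate(sequences)
    )
    return maximal_patterns(reduce_phase(shuffle(pairs), min_support))
```

For example, `mine(["ACGTA", "CGTT", "ACGG"], min_support=2, max_len=4)` returns `{"ACG": 2, "CGT": 2}`: shorter frequent patterns such as `CG` are dropped because they are contained in a longer frequent pattern. On a real Hadoop cluster the map and reduce steps would run as distributed tasks rather than as in-process function calls.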