An iterative MapReduce approach to frequent subgraph mining in biological datasets

Abstract

Frequent subgraph mining has attracted considerable attention in areas such as bioinformatics, web data mining, and social network analysis. Many promising main-memory-based techniques exist, but they do not scale because main memory becomes the bottleneck. For massive datasets, traditional database systems such as relational and object databases are also inefficient, because frequent subgraph mining is computationally intensive. Since Google introduced the MapReduce framework, a few researchers have applied the MapReduce model to mining frequent substructures in a single graph. In this paper, we propose using the MapReduce programming model, which achieves multifold scalability, to mine frequent subgraphs from a set of labeled graphs. We tested our method on both real and synthetic datasets. To the best of our knowledge, this is the first attempt to mine transaction graphs (a set of labeled graphs) using the MapReduce model.
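
To make the transaction-graph setting concrete, the following is a minimal in-memory sketch of one map/reduce round that counts the support of single-edge patterns, where support is the number of graphs in the set that contain the pattern. It is an illustration under stated assumptions, not the method proposed in the paper: the graph encoding and the names `map_graph`, `reduce_counts`, and `frequent_edges` are hypothetical, and a real deployment would run the two steps on a MapReduce cluster and repeat them while growing larger candidate subgraphs.

```python
from collections import defaultdict

def map_graph(graph):
    """Map step (hypothetical): emit each distinct single-edge pattern once per graph."""
    labels, edges = graph["labels"], graph["edges"]
    patterns = set()
    for u, v, edge_label in edges:
        # Sort the endpoint labels so an undirected edge has one canonical form.
        a, b = sorted((labels[u], labels[v]))
        patterns.add((a, edge_label, b))
    # The set ensures a graph contributes at most 1 to a pattern's support.
    return [(pattern, 1) for pattern in patterns]

def reduce_counts(pairs):
    """Reduce step (hypothetical): sum the per-graph counts for each pattern."""
    support = defaultdict(int)
    for pattern, count in pairs:
        support[pattern] += count
    return dict(support)

def frequent_edges(graphs, min_support):
    """Run one map/reduce round and keep patterns meeting the support threshold."""
    pairs = [pair for graph in graphs for pair in map_graph(graph)]
    support = reduce_counts(pairs)
    return {p: s for p, s in support.items() if s >= min_support}

# Toy transaction database of three labeled graphs (assumed encoding:
# vertex id -> label, plus a list of (u, v, edge_label) triples).
graphs = [
    {"labels": {0: "C", 1: "O", 2: "N"},
     "edges": [(0, 1, "double"), (0, 2, "single")]},
    {"labels": {0: "O", 1: "C"},
     "edges": [(0, 1, "double")]},
    {"labels": {0: "C", 1: "N", 2: "C"},
     "edges": [(0, 1, "single"), (2, 1, "single")]},
]

result = frequent_edges(graphs, min_support=2)
```

In an iterative scheme, the patterns that survive the threshold in one round would seed the candidates of the next round, which is why the per-graph deduplication in the map step matters: support is counted per graph, not per occurrence.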
