Concepedia

Publication | Closed Access

PSCAN: A Parallel Structural Clustering Algorithm for Big Networks in MapReduce

34

Citations

18

References

2013

Year

Abstract

Big data such as complex networks with over millions of vertices and edges is infeasible to process using conventional computation. MapReduce is a programming model that empowers us to analyze big data in a cluster of computers. In this paper we propose a Parallel Structural Clustering Algorithm for big Networks (PSCAN) in MapReduce for the detection of clusters or community structures in big networks such as Twitter. PSCAN is based on the structural clustering algorithm of SCAN, which not only finds cluster accurately, but also identifies vertices playing special roles such as hubs and outliers. An empirical evaluation of PSCAN using both real and synthetic networks demonstrated an outstanding performance in terms of accuracy and running time. We analyzed a Twitter network with over 40 million users and 1.4 billion follower/following relationships by using PSCAN on a Hadoop cluster with 15 computers. The result shows that PSCAN successfully detected interesting communities of people who share common interests.

References

YearCitations

Page 1