Publication | Closed Access
PSCAN: A Parallel Structural Clustering Algorithm for Big Networks in MapReduce
Citations: 34
References: 18
Year: 2013
Unknown Venue
Cluster Computing, Engineering, Big Networks, Community Mining, Network Analysis, Map-reduce, Computational Social Science, Data Science, Data Mining, Community Detection, Social Network Analysis, Knowledge Discovery, Computer Science, Twitter Network, Social Network Aggregation, Community Structure, Network Science, Graph Theory, Business, Parallel Programming, Large-scale Network, Massive Data Processing, Big Data
Big data, such as complex networks with millions of vertices and edges, is infeasible to process using conventional computation. MapReduce is a programming model that enables the analysis of big data on a cluster of computers. In this paper we propose a Parallel Structural Clustering Algorithm for big Networks (PSCAN) in MapReduce for the detection of clusters, or community structures, in big networks such as Twitter. PSCAN is based on the structural clustering algorithm SCAN, which not only finds clusters accurately but also identifies vertices playing special roles, such as hubs and outliers. An empirical evaluation of PSCAN using both real and synthetic networks demonstrated outstanding performance in terms of accuracy and running time. We analyzed a Twitter network with over 40 million users and 1.4 billion follower/following relationships by using PSCAN on a Hadoop cluster of 15 computers. The results show that PSCAN successfully detected interesting communities of people who share common interests.
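To make the idea of structural clustering in a map/reduce style concrete, here is a minimal single-machine sketch in Python. It uses the structural similarity defined by SCAN, the size of the intersection of two vertices' closed neighborhoods divided by the geometric mean of their sizes, and drops edges whose similarity falls below a threshold. The function names (`map_adjacency`, `reduce_similarity`, `structural_similarity`, `prune`), the in-memory grouping that stands in for a shuffle phase, and the threshold value are illustrative assumptions; the abstract does not describe how PSCAN actually divides the work among Hadoop jobs.

```python
from collections import defaultdict
from math import sqrt


def map_adjacency(vertex, neighbors):
    """Map step (assumed design): send the closed neighborhood of `vertex`
    to every edge incident to it, keyed by the edge in canonical order."""
    closed = frozenset(neighbors) | {vertex}
    for n in neighbors:
        yield (min(vertex, n), max(vertex, n)), closed


def reduce_similarity(edge, neighborhoods):
    """Reduce step: an undirected edge receives one neighborhood from each
    endpoint; compute the SCAN structural similarity from the pair."""
    a, b = neighborhoods
    return edge, len(a & b) / sqrt(len(a) * len(b))


def structural_similarity(adjacency):
    """Simulate map -> shuffle -> reduce in memory for a small graph."""
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for vertex, neighbors in adjacency.items():
        for key, value in map_adjacency(vertex, neighbors):
            grouped[key].append(value)
    return dict(reduce_similarity(e, vals) for e, vals in grouped.items())


def prune(similarities, epsilon):
    """Keep only edges whose similarity reaches the threshold epsilon."""
    return {edge for edge, s in similarities.items() if s >= epsilon}


# Toy graph: two triangles joined by the bridge edge (2, 3).
adjacency = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
sims = structural_similarity(adjacency)
kept = prune(sims, epsilon=0.7)  # epsilon chosen arbitrarily for the demo
```

On this toy graph the bridge edge (2, 3) has similarity 0.5 and is pruned, while the six triangle edges survive, so the two triangles separate into distinct groups. A fuller SCAN-style implementation would additionally identify core vertices and label hubs and outliers, which this sketch omits.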