Publication | Closed Access
A Fast Clustering Algorithm for Modularization of Large-Scale Software Systems
44
Citations
35
References
2020
Year
Software MaintenanceCluster ComputingEngineeringAlgorithmic LibraryHierarchical AlgorithmsSoftware EngineeringSoftware AnalysisCluster TechnologyParallel SoftwareData ScienceData MiningModule DesignSystems EngineeringParallel ComputingSearch-based Software EngineeringDocument ClusteringKnowledge DiscoveryComputer EngineeringUltra-large SystemApplication AnalysisComputer ScienceSoftware DesignAcceptable ClusteringFast Clustering AlgorithmProgram AnalysisParallel ProgrammingSystem Software
A software system evolves over time in order to meet the needs of users. Understanding a program is the most important step to apply new requirements. Clustering techniques through dividing a program into small and meaningful parts make it possible to understand the program. In general, clustering algorithms are classified into two categories: hierarchical and non-hierarchical algorithms (such as search-based approaches). While clustering problems generally tend to be NP-hard, search-based algorithms produce acceptable clustering and have time and space constraints and hence they are inefficient in large-scale software systems. Most algorithms which currently used in software clustering fields do not scale well when applied to large and very large applications. In this paper, we present a new and fast clustering algorithm, FCA, that can overcome space and time constraints of existing algorithms by performing operations on the dependency matrix and extracting other matrices based on a set of features. The experimental results on ten small-sized applications, ten folders with different functionalities from Mozilla Firefox, a large-sized application (namely ITK), and a very large-sized application (namely Chromium) demonstrate that the proposed algorithm achieves higher quality modularization compared with hierarchical algorithms. It can also compete with search-based algorithms and a clustering algorithm based on subsystem patterns. But the running time of the proposed algorithm is much shorter than that of the hierarchical and non-hierarchical algorithms. The source code of the proposed algorithm can be accessed at <uri xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">https://github.com/SoftwareMaintenanceLab</uri> .
| Year | Citations | |
|---|---|---|
Page 1
Page 1