Concepedia

Publication | Closed Access

Improved Similarity Measures for Software Clustering

41

Citations

26

References

2011

Year

Abstract

Software clustering is a useful technique to recover architecture of a software system. The results of clustering depend upon choice of entities, features, similarity measures and clustering algorithms. Different similarity measures have been used for determining similarity between entities during the clustering process. In software architecture recovery domain the Jaccard and the Unbiased Ellenberg measures have shown better results than other measures for binary and non-binary features respectively. In this paper we analyze the Russell and Rao measure for binary features to show the conditions under which its performance is expected to be better than that of Jaccard. We also show how our proposed Jaccard-NM measure is suitable for software clustering and propose its counterpart for non-binary features. Experimental results indicate that our proposed Jaccard-NM measure and Russell & Rao measure perform better than Jaccard measure for binary features, while for non-binary features, the proposed Unbiased Ellenberg-NM measure produces results which are closer to the decomposition prepared by experts.

References

YearCitations

Page 1