Concepedia

Publication | Closed Access

GRETA: Graph-Based Tag Assignment for GitHub Repositories

Citations: 27 | References: 28 | Year: 2016

Abstract

GitHub is a well-known software community that hosts a large number of software repositories. Because much of the documentation and code in GitHub repositories is poorly organized, users cannot search or understand them efficiently. One solution is a tag system that annotates each repository with several tags, so that GitHub repositories can be accessed and understood more efficiently. However, GitHub does not provide any automated tool for tagging repositories. To tackle this problem, we propose GRETA, a novel graph-based approach to assigning tags to repositories on GitHub. The core ideas of GRETA are (1) to construct an Entity-Tag Graph (ETG) for GitHub using domain knowledge from StackOverflow, and (2) to assign tags to repositories by running a random walk algorithm. We have implemented GRETA and also developed a repository search engine for GitHub that uses GRETA's tag assignment results. We evaluated GRETA against several baseline methods to investigate its effectiveness in tagging GitHub repositories. The results show that GRETA achieves an F-measure of up to 35%, outperforming the baseline methods. In addition, the GRETA-based search engine achieves a higher NDCG value than the search engine provided by GitHub, indicating that the tagged repositories significantly enhance search on GitHub.
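The abstract does not say which random-walk variant GRETA uses or how ETG edges are weighted. As a rough illustration only, the sketch below assumes a random walk with restart (personalized PageRank) over a toy Entity-Tag Graph. Entity nodes stand for terms extracted from a repository and are linked to tag nodes by weighted edges. The walk restarts at the repository's own entities, and tags are ranked by how often the walk visits them. The node names, edge weights, the restart probability of 0.15, and the functions `build_graph`, `random_walk_with_restart`, and `assign_tags` are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected weighted adjacency map from (u, v, weight) triples."""
    graph = defaultdict(dict)
    for u, v, w in edges:
        graph[u][v] = graph[u].get(v, 0.0) + w
        graph[v][u] = graph[v].get(u, 0.0) + w
    return dict(graph)


def random_walk_with_restart(graph, seeds, restart=0.15, max_iter=100, tol=1e-10):
    """Visiting probability of each node for a walk that, at every step, jumps
    back to a uniformly chosen seed node with probability `restart`.
    (Assumed variant: random walk with restart / personalized PageRank.)"""
    seeds = [s for s in seeds if s in graph]
    if not seeds:
        return {}
    nodes = list(graph)
    restart_vec = {n: 0.0 for n in nodes}
    for s in seeds:
        restart_vec[s] += 1.0 / len(seeds)
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    prob = dict(restart_vec)  # start the power iteration at the seeds
    for _ in range(max_iter):
        nxt = {n: restart * restart_vec[n] for n in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                continue
            # Spread u's non-restart mass to neighbours in proportion to edge weight.
            share = (1.0 - restart) * prob[u] / out_weight[u]
            for v, w in graph[u].items():
                nxt[v] += share * w
        delta = sum(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if delta < tol:
            break
    return prob


def assign_tags(graph, repo_entities, k=3, tag_prefix="tag:"):
    """Return the k tag nodes with the highest visiting probability."""
    scores = random_walk_with_restart(graph, repo_entities)
    tags = [n for n in scores if n.startswith(tag_prefix)]
    tags.sort(key=lambda t: scores[t], reverse=True)
    return tags[:k]


# Hypothetical toy ETG: entity-to-tag edges with made-up co-occurrence weights.
TOY_EDGES = [
    ("entity:neural", "tag:machine-learning", 3.0),
    ("entity:tensor", "tag:machine-learning", 2.0),
    ("entity:tensor", "tag:python", 1.0),
    ("entity:flask", "tag:python", 2.0),
    ("entity:flask", "tag:web", 3.0),
    ("entity:http", "tag:web", 2.0),
]

if __name__ == "__main__":
    g = build_graph(TOY_EDGES)
    print(assign_tags(g, ["entity:neural", "entity:tensor"]))
```

With these toy edges, a walk seeded at `entity:neural` and `entity:tensor` ranks `tag:machine-learning` first, then `tag:python`, then `tag:web`. Restarting at the repository's own entities keeps the probability mass near that repository's neighbourhood. A plain (non-restarting) walk would instead favour high-degree tags regardless of the repository.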
