Publication | Closed Access
Addressing problems with external validity of repository mining studies through a smart data platform
Citations: 37
References: 24
Year: 2016
Unknown Venue
Software Maintenance, Engineering, Repository Mining Studies, Data Repository, Software Engineering, Data Publishing, Semantic Web, Novel Data Source, Software Analysis, Empirical Software Engineering Research, Information Retrieval, Data Science, Data Mining, Database Support, Smart Data Platform, Software Repository Mining, Data Integration, Information Discovery, Knowledge Discovery Process, Data Management, Statistics, Software Mining, Knowledge Discovery, Research Data Archiving, Database Technology, Case Problems, Software Design, External Validity, Prototype Smartshark
Research in software repository mining has grown considerably over the last decade. Due to the data-driven nature of this avenue of investigation, we identified several problems within the current state of the art that pose a threat to the external validity of results. The heavy re-use of data sets in many studies may invalidate the results if problems with the data itself are identified. Moreover, for many studies the data and/or the implementations are not available, which hinders replication of the results and thereby decreases the comparability between studies. Even if all information about the studies is available, the diversity of the tooling used can still make their replication very hard. Within this paper, we discuss a potential solution to these problems through a cloud-based platform that integrates data collection and analytics. We created the prototype SmartSHARK, which implements our approach. Using SmartSHARK, we collected data from several projects and created different analytic examples. Within this article, we present SmartSHARK and discuss our experiences regarding the use of SmartSHARK and the mentioned problems.