Publication | Open Access
DataHub: Collaborative Data Science & Dataset Version Management at Scale
Citations: 112 | References: 14 | Year: 2014
Engineering, Data Repository, Data Curation, Software Engineering, Semantic Web, Version Control System, Collaborative Data Management, Database System, Data Science, Data Mining, Database Support, Management, Data Integration, Collaborative Data Science, Big Data, Data Management, Data Collaboration, Research Management, Knowledge Discovery, Distributed Data Management, Computer Science, Research Data Management, Database Technology, Collaborative Data Analysis, Dataset Version Control, Data Modeling
Relational databases have limited support for data collaboration, where teams collaboratively curate and analyze large datasets. Inspired by software version control systems like git, we propose (a) a dataset version control system, giving users the ability to create, branch, merge, difference and search large, divergent collections of datasets, and (b) a platform, DataHub, that gives users the ability to perform collaborative data analysis building on this version control system. We outline the challenges in providing dataset version control at scale.
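The abstract names the core operations (create, branch, merge, difference, search) without showing how they fit together. The following is a rough illustration only, a toy in-memory model in Python. It is not DataHub's implementation or API. Every name in it (`VersionGraph`, `commit`, `merge`, `search`, and so on) is hypothetical, and it makes three simplifying assumptions. First, each version stores a full snapshot of its rows as a set of tuples. Second, versions form a DAG through parent pointers. Third, a merge is a record-level three-way merge against a common ancestor. A system working at the scale the paper targets would need far more compact storage than full copies.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, List, Set, Tuple

Record = Tuple[str, ...]  # one dataset row, e.g. ("1", "alice")


@dataclass(frozen=True)
class Version:
    vid: int
    parents: Tuple[int, ...]
    records: FrozenSet[Record]  # full snapshot of the rows in this version


class VersionGraph:
    """Hypothetical toy model of dataset version control (not DataHub's API)."""

    def __init__(self) -> None:
        # Version 0 is an empty root; "master" initially points at it.
        self.versions: Dict[int, Version] = {0: Version(0, (), frozenset())}
        self.branches: Dict[str, int] = {"master": 0}

    def _add(self, parents: Tuple[int, ...], records: Iterable[Record]) -> int:
        vid = len(self.versions)  # ids grow monotonically, so parents < children
        self.versions[vid] = Version(vid, parents, frozenset(records))
        return vid

    def commit(self, branch: str, records: Iterable[Record]) -> int:
        """Create a new version on `branch` whose contents are `records`."""
        vid = self._add((self.branches[branch],), records)
        self.branches[branch] = vid
        return vid

    def branch(self, new: str, source: str) -> None:
        """Start branch `new` at the current head of `source`."""
        self.branches[new] = self.branches[source]

    def diff(self, a: int, b: int) -> Tuple[FrozenSet[Record], FrozenSet[Record]]:
        """Return (rows added, rows removed) going from version a to version b."""
        ra, rb = self.versions[a].records, self.versions[b].records
        return rb - ra, ra - rb

    def _ancestors(self, vid: int) -> Set[int]:
        seen: Set[int] = set()
        stack = [vid]
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(self.versions[v].parents)
        return seen

    def merge(self, into: str, other: str) -> int:
        """Record-level three-way merge of branch `other` into branch `into`."""
        a, b = self.branches[into], self.branches[other]
        # Because ids grow monotonically, the largest shared ancestor id is a
        # common ancestor that no other common ancestor descends from.
        base_id = max(self._ancestors(a) & self._ancestors(b))
        base = self.versions[base_id].records
        ours, theirs = self.versions[a].records, self.versions[b].records
        # Keep every row present on either side, minus rows either side deleted.
        merged = (ours | theirs) - (base - ours) - (base - theirs)
        vid = self._add((a, b), merged)
        self.branches[into] = vid
        return vid

    def search(self, predicate: Callable[[Record], bool]) -> List[int]:
        """Return ids of all versions containing at least one matching row."""
        return sorted(v.vid for v in self.versions.values()
                      if any(predicate(r) for r in v.records))


g = VersionGraph()
v1 = g.commit("master", {("1", "alice"), ("2", "bob")})
g.branch("clean", "master")
v2 = g.commit("clean", {("1", "alice")})  # drop bob on branch "clean"
v3 = g.commit("master", {("1", "alice"), ("2", "bob"), ("3", "carol")})  # add carol
m = g.merge("master", "clean")
print(sorted(g.versions[m].records))  # [('1', 'alice'), ('3', 'carol')]
print(g.search(lambda r: r[1] == "carol"))  # [3, 4]
```

In this sketch an updated row appears as a delete plus an insert. As a result, concurrent edits to the same row on two branches would both survive the merge. Flagging such edits as conflicts would require a notion of primary key, which this toy model does not include.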