Publication | Open Access
Croissant: A Metadata Format for ML-Ready Datasets
Citations: 15 | References: 7 | Year: 2024
Unknown Venue
Engineering, Machine Learning, Popular ML Frameworks, Machine Learning Tool, Metadata, Semantic Web, Large-scale Datasets, Data Science, Data Mining, Management, Distributed Machine Learning, Data Integration, Data Pre-processing, Data Management, Data Modeling, Knowledge Discovery, Data-centric AI, Meta Data, Metadata Format, Metadata Schema, Automated Machine Learning, Data Treatment, Big Data
Data is a critical resource for Machine Learning (ML), yet working with data remains a key friction point. This paper introduces Croissant, a metadata format for datasets that simplifies how data is used by ML tools and frameworks. Croissant makes datasets more discoverable, portable and interoperable, thereby addressing significant challenges in ML data management and responsible AI. Croissant is already supported by several popular dataset repositories, spanning hundreds of thousands of datasets, ready to be loaded into the most popular ML frameworks.