Publication | Closed Access
Auto-transform
Citations: 23
References: 42
Year: 2020
Engineering, Structured Data, TBP Transformations, Natural Language Processing, Data Science, Microsoft Excel, Computational Linguistics, Management, Data Integration, Data Management, Machine Translation, Knowledge Discovery, Machine-readable Representation, Computer Science, Data Manipulation, Data Engineering, Data Transformation, Data Transformation (Computing), Data Modeling
Data transformation is a long-standing problem in data management. Recent work adopts a "transform-by-example" (TBE) paradigm to infer transformation programs from user-provided input/output examples. This greatly improves usability and has brought such features into mainstream software such as Microsoft Excel, Power BI, and Trifacta. While TBE is a significant advance, the need for users to provide paired input/output examples still limits its applicability. In this work, we study an alternative that transforms data based on input/output data patterns only (without paired examples). We term this new paradigm transform-by-patterns (TBP). Specifically, we demonstrate that there is a rich class of transformations in TBP that can be "learned" from large collections of paired table columns. We show that the proposed method can harvest such transformations across diverse domains and corpora (e.g., in different languages such as English, Chinese, and Spanish). The TBP transformations obtained this way can be used in scenarios such as suggesting data repairs in tables or automating transformations in ETL pipelines. Extensive experiments on real data suggest that TBP outperforms existing methods on tasks such as data repairs, and is a promising direction for future research.
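To make the transform-by-patterns idea more concrete, the following is a minimal illustrative sketch, not the method proposed in the paper. It assumes a toy pattern language (digits become `D`, letters become `A`) and a small hand-written rule table keyed by (source pattern, target pattern). The names `pattern_of`, `RULES`, and `suggest_repairs` are hypothetical. In the paper such transformations are learned from paired table columns rather than written by hand.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple


def pattern_of(value: str) -> str:
    """Generalize a string into a coarse pattern (assumed toy pattern language)."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("D")      # any digit
        elif ch.isalpha():
            out.append("A")      # any letter
        else:
            out.append(ch)       # punctuation and spaces are kept literally
    return "".join(out)


# Hypothetical hand-written rules keyed by (source pattern, target pattern).
# A real TBP system would harvest these from data; they are fixed here for illustration.
RULES: Dict[Tuple[str, str], Callable[[str], str]] = {
    # 2020-05-09 -> 05/09/2020
    ("DDDD-DD-DD", "DD/DD/DDDD"): lambda s: f"{s[5:7]}/{s[8:10]}/{s[0:4]}",
    # (425) 555-0100 -> 425-555-0100
    ("(DDD) DDD-DDDD", "DDD-DDD-DDDD"): lambda s: f"{s[1:4]}-{s[6:9]}-{s[10:14]}",
}


def suggest_repairs(column: List[str]) -> Dict[str, str]:
    """Suggest rewrites for values whose pattern differs from the column's dominant pattern."""
    patterns = [pattern_of(v) for v in column]
    dominant, _ = Counter(patterns).most_common(1)[0]
    repairs: Dict[str, str] = {}
    for value, pat in zip(column, patterns):
        rule = RULES.get((pat, dominant))
        if pat != dominant and rule is not None:
            repairs[value] = rule(value)
    return repairs


dates = ["03/15/2020", "04/01/2020", "2020-05-09", "12/31/2019"]
print(suggest_repairs(dates))
```

No paired input/output example is supplied by the user here: the rule is selected purely from the pattern of the outlier value and the dominant pattern of the column, which is the distinction the abstract draws between TBP and TBE.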