Publication | Closed Access
Web-scale Data Integration: You can only afford to Pay As You Go
Citations: 301
References: 25
Year: 2007
Unknown Venue
The Web is experiencing a surge in heterogeneous structured data from sources such as the Deep Web, Flickr, and Google Base, creating opportunities but also new challenges for large‑scale data management. The paper aims to identify and analyze the challenges of web‑scale data integration in the Deep Web and Google Base contexts. The authors introduce PAYGO, a pay‑as‑you‑go data integration architecture inspired by dataspaces, designed to handle web‑scale heterogeneity. The authors argue that conventional data integration methods are inadequate for the heterogeneity and scale of modern web data.
The World Wide Web is witnessing an increase in the amount of structured content – vast heterogeneous collections of structured data are on the rise due to the Deep Web, annotation schemes like Flickr, and sites like Google Base. While this phenomenon is creating an opportunity for structured data management, dealing with heterogeneity at web scale presents many new challenges. In this paper, we highlight these challenges in two scenarios – the Deep Web and Google Base. We contend that traditional data integration techniques are no longer valid in the face of such heterogeneity and scale. We propose a new data integration architecture, PAYGO, which is inspired by the concept of dataspaces and emphasizes pay-as-you-go data management as a means of achieving web-scale data integration.