Publication | Open Access
Web Ecology: Recycling HTML pages as XML documents using W4F
Citations: 47
References: 8
Year: 1999
In this paper we present the World-Wide Web Wrapper Factory (W4F), a Java toolkit to generate wrappers for Web data sources. Some key features of W4F are an expressive language to extract information from HTML pages in a structured way, a mapping to export it as XML documents, and some visual tools to assist the user during wrapper creation. Moreover, the entire description of wrappers is fully declarative. As an illustration, we demonstrate how to use W4F to create XML gateways that transparently serve HTML pages, on the fly, as XML documents with their DTDs.

1 Introduction

The Web has become a major conduit to information repositories of all kinds. Today, more than 80% of the information published on the Web is generated by underlying databases, and this proportion keeps increasing. But Web data sources also consist of stand-alone HTML pages, hand-coded by individuals, that provide very useful information such as reviews, digests, links, etc. As soon as we want to go beyond the basic m...