
Publication | Closed Access

Wrapping web data into XML

Citations: 56 | References: 7 | Year: 2001

Abstract

The vast majority of online information is part of the World Wide Web. In order to use this information for more than human browsing, web pages in HTML must be converted into a format meaningful to software programs. Wrappers have been a useful technique for converting HTML documents into semantically meaningful XML files. However, developing wrappers is slow and labor-intensive. Further, frequent changes to the HTML documents typically require frequent changes to the wrappers. This paper describes XWRAP Elite, a tool to automatically generate robust wrappers. XWRAP breaks down the conversion process into three steps. First, discover where the data is located in an HTML page and separate the data into individual objects. Second, decompose objects into data elements. Third, mark objects and elements in an output format. XWRAP Elite automates the first two steps and minimizes human involvement in marking output data. Our experience shows that XWRAP is able to create useful wrapper software for a wide variety of real-world HTML documents.
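To make the three-step decomposition concrete, here is a minimal Python sketch of a toy wrapper. It is not XWRAP Elite's algorithm. It assumes a deliberately naive heuristic for step 1: the node with the most children sharing one tag is treated as the data region, and those children are treated as the objects. Step 2 takes each object's non-blank text runs as its data elements. Step 3 marks them using field names supplied by a person, which loosely mirrors the abstract's point that marking output still involves some human input. All names here (`wrap`, `find_objects`, `leaf_texts`, `to_xml`, and the example fields) are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Void elements have no closing tag, so they never become the current parent.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """Minimal element-tree node: tag, parent, child nodes, and text runs."""

    def __init__(self, tag, parent=None):
        self.tag, self.parent = tag, parent
        self.children, self.text = [], []


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML, tolerating unclosed and stray end tags."""

    def __init__(self):
        super().__init__()
        self.root = self.current = Node("#root")

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open ancestor with this tag; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text.append(data.strip())


def find_objects(root):
    """Step 1 (naive heuristic, an assumption): the node with the most children
    sharing one tag is taken as the data region, and those children as objects."""
    best, best_tag, best_count = None, None, 0
    stack = [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        counts = Counter(child.tag for child in node.children)
        if counts:
            tag, count = counts.most_common(1)[0]
            if count > best_count:
                best, best_tag, best_count = node, tag, count
    return [] if best is None else [c for c in best.children if c.tag == best_tag]


def leaf_texts(node):
    """Step 2: decompose an object into data elements (its non-blank text runs).
    Text directly inside a node is listed before its children's text."""
    values = list(node.text)
    for child in node.children:
        values.extend(leaf_texts(child))
    return values


def to_xml(objects, record_tag, field_names):
    """Step 3: mark objects and elements with names supplied by a human."""
    root = ET.Element(record_tag + "s")
    for obj in objects:
        values = leaf_texts(obj)
        if len(values) != len(field_names):
            continue  # skip objects whose shape does not match the field list
        record = ET.SubElement(root, record_tag)
        for name, value in zip(field_names, values):
            ET.SubElement(record, name).text = value
    return root


def wrap(html, record_tag, field_names):
    """Run the three steps on one HTML page and return an XML element."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return to_xml(find_objects(builder.root), record_tag, field_names)


if __name__ == "__main__":
    page = """<html><body><h1>Catalog</h1><ul>
      <li><b>Widget</b> <i>9.99</i></li>
      <li><b>Gadget</b> <i>19.50</i></li>
    </ul></body></html>"""
    print(ET.tostring(wrap(page, "product", ["name", "price"]), encoding="unicode"))
```

Run on the sample page, the sketch prints a `<products>` element with one `<product>` record per list item, each holding `<name>` and `<price>` children. The fixed heuristic in `find_objects` is exactly what makes hand-built wrappers brittle: if the page layout changes, the guess about where the data lives can break. The abstract identifies this robustness problem as the one that XWRAP Elite targets.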
