Publication | Closed Access

RoadRunner: Towards Automatic Data Extraction from Large Web Sites

Citations: 932

References: 16

Year: 2001

Abstract

The paper investigates techniques for extracting data from HTML sites through the use of automatically generated wrappers. To automate the wrapper generation and the data extraction process, the paper develops a novel technique to compare HTML pages and generate a wrapper based on their similarities and differences. Experimental results on real-life data-intensive Web sites confirm the feasibility of the approach.
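
To make the idea of comparing pages concrete, here is a minimal sketch in Python. It is not the paper's algorithm. It assumes two pages that share the same tag structure and differ only in single text tokens, aligns them with the standard library's `difflib.SequenceMatcher`, and treats every differing text token as a data field. The names `tokenize`, `infer_wrapper`, `extract`, and the `FIELD` marker are hypothetical and chosen only for this illustration. Optional and repeated structures, which real sites contain, are not handled.

```python
import re
from difflib import SequenceMatcher

# Hypothetical slot marker used in this sketch to stand for a data field.
FIELD = "#PCDATA"

# A token is either a whole tag or a run of text between tags.
TOKEN_RE = re.compile(r"<[^>]+>|[^<]+")


def tokenize(html):
    """Split HTML into tag tokens and non-empty, stripped text tokens."""
    return [t.strip() for t in TOKEN_RE.findall(html) if t.strip()]


def infer_wrapper(page_a, page_b):
    """Build a template: shared tokens stay fixed, differing text becomes FIELD.

    Assumption: mismatches are one-to-one replacements of text tokens.
    Anything else (inserted, deleted, or differing tags) raises ValueError.
    """
    a, b = tokenize(page_a), tokenize(page_b)
    wrapper = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            wrapper.extend(a[i1:i2])
            continue
        differing = a[i1:i2] + b[j1:j2]
        same_length = (i2 - i1) == (j2 - j1)
        if op != "replace" or not same_length or any(t.startswith("<") for t in differing):
            raise ValueError("structural mismatch is outside the scope of this sketch")
        wrapper.extend([FIELD] * (i2 - i1))
    return wrapper


def extract(wrapper, page):
    """Read the field values from a page that follows the template exactly."""
    tokens = tokenize(page)
    if len(tokens) != len(wrapper):
        raise ValueError("page does not match the wrapper")
    values = []
    for expected, actual in zip(wrapper, tokens):
        if expected == FIELD:
            values.append(actual)
        elif expected != actual:
            raise ValueError("page does not match the wrapper")
    return values


page_a = "<html><b>Title:</b> Databases <i>Price:</i> 30</html>"
page_b = "<html><b>Title:</b> Compilers <i>Price:</i> 45</html>"
page_c = "<html><b>Title:</b> Networks <i>Price:</i> 52</html>"

wrapper = infer_wrapper(page_a, page_b)
fields = extract(wrapper, page_c)
```

In this sketch the wrapper is inferred from `page_a` and `page_b` and then applied to `page_c`, a page that was not used for inference. The three pages are made-up inputs, not data from the paper's experiments.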
