Publication | Closed Access
RoadRunner: Towards Automatic Data Extraction from Large Web Sites
Citations: 932
References: 16
Year: 2001
Venue: Unknown
The paper investigates techniques for extracting data from HTML sites through the use of automatically generated wrappers. To automate the wrapper generation and the data extraction process, the paper develops a novel technique to compare HTML pages and generate a wrapper based on their similarities and differences. Experimental results on real-life data-intensive Web sites confirm the feasibility of the approach.
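As a rough illustration of the idea in the abstract (comparing pages and deriving a wrapper from what they share and where they differ), the following is a minimal sketch and not the paper's actual algorithm. It assumes two pages produced from the same template, uses Python's standard `difflib.SequenceMatcher` as a stand-in for the paper's matching technique, and treats every mismatching span as a single data field. The names `tokenize`, `infer_wrapper`, `extract`, and the sample pages are hypothetical.

```python
import re
from difflib import SequenceMatcher

def tokenize(html):
    """Split HTML into tag tokens and text tokens, dropping whitespace-only text."""
    tokens = re.findall(r"<[^>]+>|[^<]+", html)
    return [t.strip() for t in tokens if t.strip()]

def infer_wrapper(page_a, page_b):
    """Build a template from two sample pages.

    Tokens shared by both pages are kept as fixed template tokens;
    each span where the pages differ becomes one field slot (None).
    """
    a, b = tokenize(page_a), tokenize(page_b)
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    template = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            template.extend(a[i1:i2])
        else:
            template.append(None)  # mismatch: assumed to be variable data
    return template

def extract(template, page):
    """Apply the template to a page and return the values found in the field slots."""
    tokens = tokenize(page)
    values, pos = [], 0
    for k, item in enumerate(template):
        if item is not None:
            # Fixed token: the page must contain it at this position.
            if pos >= len(tokens) or tokens[pos] != item:
                raise ValueError(f"page does not match template at token {pos}")
            pos += 1
        else:
            # Field slot: consume tokens up to the next fixed template token.
            nxt = template[k + 1] if k + 1 < len(template) else None
            start = pos
            while pos < len(tokens) and tokens[pos] != nxt:
                pos += 1
            values.append(" ".join(tokens[start:pos]))
    return values

# Hypothetical sample pages sharing one template.
PAGE_A = "<html><b>Title:</b> Database Systems <i>Price:</i> 40</html>"
PAGE_B = "<html><b>Title:</b> Data Mining <i>Price:</i> 55</html>"
PAGE_C = "<html><b>Title:</b> Web Wrappers <i>Price:</i> 12</html>"

wrapper = infer_wrapper(PAGE_A, PAGE_B)
print(extract(wrapper, PAGE_C))
```

This sketch handles only flat pages with a fixed number of fields. Repeated or optional structures, which real data-intensive sites contain, would need a richer template language than a list of tokens and slots.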