
Publication | Closed Access

A Comparison of Web Data Extraction Techniques

Citations: 13

References: 16

Year: 2019

Abstract

Extracting structured text data from published webpages has drawn attention over the last decade. Web data extraction poses many challenges, owing to the variety of web data and the unstructured form of HTML files. The aim of this survey is to provide a comprehensive overview of current web data extraction techniques in terms of the quality of the extracted data, where redundant and noisy data should be eliminated. The merits and demerits of each web information extraction technique will be stated, and finally a classification framework for the discussed techniques will be provided.
