Publication | Open Access
The Lixto data extraction project
Citations: 135
References: 26
Year: 2004
Keywords: Engineering; Knowledge Extraction; Lixto Project; Semantic Web; Information Retrieval; Data Science; Data Mining; Database System; Management; Data Integration; Semi-structured Data; Data Management; Knowledge Discovery; Computer Science; Database Technology; Information Extraction; Database Theory; Software Design; Lixto Transformation Server; Automated Reasoning; Formal Methods; Data Extraction; Data Modeling; Semantic Interoperability
We present the Lixto project, which is both a research project in database theory and a commercial enterprise that develops Web data extraction (wrapping) and Web service definition software. We discuss the project's main motivations and ideas, in particular the use of a logic-based framework for wrapping. Then we present theoretical results on monadic datalog over trees and on Elog, its close relative, which is used as the internal wrapper language in the Lixto system. These results characterize both the expressive power and the complexity of these languages. We describe the visual wrapper specification process in Lixto and various practical aspects of wrapping. We discuss work on the complexity of query languages for trees that was inspired by our theoretical study of logic-based languages for wrapping. Then we return to the practice of wrapping and the Lixto Transformation Server, which allows for streaming integration of data extracted from Web pages, a natural requirement in complex services based on Web wrapping. Finally, we discuss industrial applications of Lixto and point to open problems for future study.
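To make the idea of logic-based wrapping concrete, the following is a minimal sketch of a monadic datalog-style query over a document tree. It is not Elog and not Lixto's implementation. The sketch assumes the tree is represented by the binary relations `firstchild` and `nextsibling` plus node labels, and that every derived predicate is unary. The `Node` class, the function names, the `under_table` predicate, and the example tree are all hypothetical. The rules are evaluated by a naive fixpoint loop chosen for readability, not efficiency.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def tree_relations(root):
    """Number nodes in document (pre-)order and build the two tree relations."""
    nodes, firstchild, nextsibling = [], set(), set()

    def visit(node):
        idx = len(nodes)
        nodes.append(node)
        prev = None
        for child in node.children:
            cidx = visit(child)
            if prev is None:
                firstchild.add((idx, cidx))      # firstchild(idx, cidx)
            else:
                nextsibling.add((prev, cidx))    # nextsibling(prev, cidx)
            prev = cidx
        return idx

    visit(root)
    return nodes, firstchild, nextsibling


def select_cells(root):
    """Evaluate a small, hypothetical monadic program:

    under_table(X) :- firstchild(P, X), label_table(P).
    under_table(X) :- firstchild(P, X), under_table(P).
    under_table(X) :- nextsibling(Y, X), under_table(Y).
    cell(X)        :- under_table(X), label_td(X).
    """
    nodes, firstchild, nextsibling = tree_relations(root)
    under_table = set()
    changed = True
    while changed:  # naive fixpoint: repeat until no rule derives a new fact
        changed = False
        for p, x in firstchild:
            if (nodes[p].label == "table" or p in under_table) and x not in under_table:
                under_table.add(x)
                changed = True
        for y, x in nextsibling:
            if y in under_table and x not in under_table:
                under_table.add(x)
                changed = True
    return [i for i in sorted(under_table) if nodes[i].label == "td"]


# Hypothetical page: one table with two cells, plus a stray td outside any table.
example = Node("html", [
    Node("table", [Node("tr", [Node("td"), Node("td")])]),
    Node("div", [Node("td")]),
])
selected = select_cells(example)
```

On this example tree the query selects the two `td` nodes inside the table and skips the stray `td` under `div`, because `under_table` is only ever derived from a `table` node downward and sideways among siblings.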