Publication | Closed Access
From region encoding to extended dewey: on efficient processing of XML twig pattern matching
270
Citations
19
References
2005
Year
Unknown Venue
EngineeringEfficient ProcessingSemantic WebText MiningNatural Language ProcessingInformation RetrievalData ScienceData MiningComputational LinguisticsData IntegrationXml StructuringTwig PatternXml LibraryKnowledge DiscoveryComputer ScienceXml DatabaseXml LanguageExtended DeweyCombinatorial Pattern MatchingXml Twig PatternXml Querying
Finding all the occurrences of a twig pattern in an XML database is a core operation for efficient evaluation of XML queries. A number of algorithms have been proposed to process a twig query based on region encoding labeling scheme. While region encoding supports efficient determination of structural relationship between two elements, we observe that the information within a single label is very limited. In this paper, we propose a new labeling scheme, called extended Dewey. This is a powerful labeling scheme, since from the label of an element alone, we can derive all the elements names along the path from the root to the element. Based on extended Dewey, we design a novel holistic twig join algorithm, called TJFast. Unlike all previous algorithms based on region encoding, to answer a twig query, TJFast only needs to access the labels of the leaf query nodes. Through this, not only do we reduce disk access, but we also support the efficient evaluation of queries with wildcards in branching nodes, which is very difficult to be answered by algorithms based on region encoding. Finally, we report our experimental results to show that our algorithms are superior to previous approaches in terms of the number of elements scanned, the size of intermediate results and query performance.
| Year | Citations | |
|---|---|---|
Page 1
Page 1