Publication | Closed Access
Segmenting Webpage with Gomory-Hu Tree Based Clustering
Citations: 23
References: 15
Year: 2011
Keywords: Cluster Computing, Engineering, Planar Graph, User Segmentation, Graph Processing, Text Mining, Gomory-Hu Tree, Information Retrieval, Data Science, Data Mining, The DOM Tree, Graph Drawing, Web Page, Document Clustering, Knowledge Discovery, Computer Science, Graph Algorithm, Web Mining, Network Science, Graph Theory, Business
We propose a novel web page segmentation algorithm based on finding the Gomory-Hu tree in a planar graph. The algorithm first distills visual and structural information from a web page to construct a weighted undirected graph, whose vertices are the leaf nodes of the DOM tree and whose edges represent the visible position relationships between vertices. It then partitions the graph with a Gomory-Hu tree based clustering algorithm. Experimental results show that our algorithm achieves much higher precision and recall than both VIPS and Chakrabarti et al.’s graph theoretic algorithm, and that its running time is far lower than that of Chakrabarti et al.’s graph theoretic algorithm.
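The partitioning step can be illustrated with a minimal sketch. The abstract does not give the paper's edge weights or its exact clustering rule, so everything here is an assumption for illustration: the tree is built with Gusfield's contraction-free construction, clusters are formed by dropping the k-1 lightest tree edges, and the function names (`min_cut`, `gomory_hu_tree`, `cluster`) and the six-vertex toy graph are hypothetical. The toy vertices stand in for DOM leaf nodes, and the weights for how strongly two leaves appear to belong together.

```python
from collections import deque

def min_cut(n, edges, s, t):
    """Edmonds-Karp max flow on an undirected graph.
    Returns (cut value, set of vertices on the s side of a minimum cut)."""
    cap = [[0] * n for _ in range(n)]
    for u, v, w in edges:
        cap[u][v] += w
        cap[v][u] += w
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        prev = [-1] * n
        prev[s] = s
        queue = deque([s])
        while queue and prev[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if prev[v] == -1 and cap[u][v] > 0:
                    prev[v] = u
                    queue.append(v)
        if prev[t] == -1:
            # t is unreachable: the visited vertices form the s side.
            return flow, {v for v in range(n) if prev[v] != -1}
        push, v = float("inf"), t
        while v != s:
            push = min(push, cap[prev[v]][v])
            v = prev[v]
        v = t
        while v != s:
            cap[prev[v]][v] -= push
            cap[v][prev[v]] += push
            v = prev[v]
        flow += push

def gomory_hu_tree(n, edges):
    """Gusfield's construction: n-1 min-cut computations, no contraction.
    Returns tree edges as (vertex, parent, min-cut weight)."""
    parent = [0] * n
    weight = [0] * n
    for s in range(1, n):
        t = parent[s]
        f, side = min_cut(n, edges, s, t)
        weight[s] = f
        for v in range(n):
            if v != s and v in side and parent[v] == t:
                parent[v] = s
        if parent[t] in side:
            parent[s], parent[t] = parent[t], s
            weight[s], weight[t] = weight[t], f
    return [(v, parent[v], weight[v]) for v in range(1, n)]

def cluster(n, edges, k):
    """Assumed clustering rule: drop the k-1 lightest tree edges and
    return the connected components of what remains."""
    tree = sorted(gomory_hu_tree(n, edges), key=lambda e: e[2])
    root = list(range(n))
    def find(v):
        while root[v] != v:
            root[v] = root[root[v]]
            v = root[v]
        return v
    for u, v, _ in tree[k - 1:]:
        root[find(u)] = find(v)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())

# Hypothetical toy graph: two tightly linked triangles joined by one weak edge.
edges = [(0, 1, 5), (1, 2, 5), (0, 2, 5),
         (3, 4, 5), (4, 5, 5), (3, 5, 5),
         (2, 3, 1)]
clusters = cluster(6, edges, 2)
print(clusters)  # [[0, 1, 2], [3, 4, 5]]
```

On this toy input the only tree edge with weight 1 is the one separating the two triangles, so removing it recovers them as two segments. A real page graph would need weights derived from the visual and structural cues the abstract mentions, which this sketch does not attempt to model.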