Publication | Closed Access
Page-level template detection via isotonic smoothing
Citations: 90
References: 25
Year: 2007
Unknown Venue
Engineering, Feature Detection, Machine Learning, Automatic Generation, Text Mining, Natural Language Processing, Image Analysis, Information Retrieval, Data Science, Data Mining, Pattern Recognition, Edge Detection, Computational Geometry, Supervised Learning, Templateness Score, Geometric Modeling, Novel Framework, Machine Vision, Automatic Classification, Isotonic Smoothing, Computer Science, Medical Image Computing, Natural Sciences, Document Processing
We develop a novel framework for the page-level template detection problem. Our framework is built on two main ideas. The first is the automatic generation of training data for a classifier that, given a page, assigns a templateness score to every DOM node of the page. The second is the global smoothing of these per-node classifier scores by solving a regularized isotonic regression problem; the latter follows from a simple yet powerful abstraction of templateness on a page. Our extensive experiments on human-labeled test data show that our approach detects templates effectively.
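As a rough illustration of the smoothing idea, the sketch below applies plain least-squares isotonic regression to the classifier scores along a single root-to-leaf path of a DOM tree. It is not the paper's formulation: the abstract does not state the regularizer or the tree-wide objective, so this version is unregularized, works on one path only, and assumes that the monotonicity in question means templateness does not decrease with depth. The function name `isotonic_smooth` and the sample scores are hypothetical.

```python
def isotonic_smooth(scores):
    """Return the least-squares non-decreasing fit to `scores`.

    Uses the pool-adjacent-violators algorithm. `scores` is assumed to hold
    per-node classifier scores ordered from the root down to a leaf
    (an assumption made for this sketch, not taken from the paper).
    """
    blocks = []  # each block is [sum of scores, number of nodes]
    for score in scores:
        blocks.append([float(score), 1])
        # Merge backwards while the previous block's mean exceeds the last one's,
        # since that pair violates the non-decreasing constraint.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count

    # Expand every block back to one smoothed value per node.
    smoothed = []
    for total, count in blocks:
        smoothed.extend([total / count] * count)
    return smoothed


if __name__ == "__main__":
    # Hypothetical raw scores for the nodes on one root-to-leaf path.
    raw_scores = [0.1, 0.6, 0.4, 0.9]
    print(isotonic_smooth(raw_scores))
```

In this sketch, neighbouring nodes whose raw scores break the ordering are pooled and given their average, so one noisy per-node prediction is pulled toward the scores around it. A regularized, tree-wide version as described in the abstract would need a different solver.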