Publication | Closed Access
Mathematical Formula Identification in PDF Documents
Citations: 45
References: 14
Year: 2011
Unknown Venue
Engineering, Machine Learning, Text Mining, Natural Language Processing, Image Analysis, Information Retrieval, Validated Numerics, Pattern Recognition, Text Recognition, Computational Linguistics, Document Engineering, Character Recognition, Approximation Theory, PDF Documents, Optical Character Recognition, Mathematical Expressions, Computer Science, Embedded Mathematical Expressions, Mathematical Formula Identification, Statistical Pattern Recognition, Document Processing
Recognizing mathematical expressions in PDF documents is a new and important field in document analysis. It is quite different from extracting mathematical expressions from image-based documents. In this paper, we propose a novel method that combines rule-based and learning-based methods to detect both isolated and embedded mathematical expressions in PDF documents. Moreover, various features of formulas, including geometric layout, character content, and context, are used to adapt to a wide range of formula types. Experimental results show satisfactory performance of the proposed method. Furthermore, the method has been successfully incorporated into a commercial software package for large-scale Chinese e-Book production.