Publication | Closed Access
WebKhoj
Citations: 38
References: 10
Year: 2006
Unknown Venue
Natural Language Processing, Indian Language Content, Search Technology, Search Engine Optimization, Engineering, Information Retrieval, Corpus Linguistics, Computational Linguistics, Search Engine, Search Engine Indexing, Language Studies, Keyword Search, Search Engine Design, Linguistics, Text Mining
Today, web search engines provide the easiest way to reach information on the web. In this scenario, more than 95% of Indian language content on the web is not searchable due to multiple encodings of web pages. Most of these encodings are proprietary and hence need some kind of standardization to make the content accessible via a search engine. In this paper, we present a search engine called WebKhoj, which is capable of searching multi-script and multi-encoded Indian language content on the web. We describe a language-focused crawler and the transcoding processes involved in making Indian language content accessible. Finally, we report some of the experiments that were conducted, along with results on Indian language web content.
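The transcoding idea mentioned in the abstract can be illustrated with a minimal sketch. This is not WebKhoj's actual implementation: the mapping table, its glyph codes, and the function name `transcode` are all hypothetical, invented only to show how text in a proprietary font encoding might be converted to Unicode with a table-driven, longest-match lookup.

```python
# A minimal sketch of table-driven transcoding, assuming a proprietary font
# encoding can be described as a mapping from glyph codes to Unicode strings.
# The table below is HYPOTHETICAL: the codes are invented for illustration
# and do not correspond to any real Indian language font encoding.
HYPOTHETICAL_FONT_TO_UNICODE = {
    "d": "\u0915",   # DEVANAGARI LETTER KA
    "d+": "\u0916",  # DEVANAGARI LETTER KHA (two-character glyph code)
    "e": "\u092E",   # DEVANAGARI LETTER MA
    "y": "\u0932",   # DEVANAGARI LETTER LA
}


def transcode(text, table):
    """Convert text to Unicode using greedy longest-match table lookup.

    Characters with no entry in the table are copied through unchanged.
    """
    # Longest key length bounds how far ahead each lookup has to try.
    max_len = max((len(key) for key in table), default=1)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so "d+" wins over "d".
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            # No glyph code matched at this position: keep the character.
            out.append(text[i])
            i += 1
    return "".join(out)


converted = transcode("dey", HYPOTHETICAL_FONT_TO_UNICODE)
```

Normalizing every page to one standard encoding in this way would let a single index serve queries regardless of the encoding a page was originally published in. A real converter would also need to handle issues this sketch ignores, such as glyph reordering.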