Abstract

Unstructured data refers to information that does not follow a predefined data model. Unstructured information is typically textual, but it may also contain numerical data and facts. As a result, the data is irregular, ambiguous and hard to interpret, which makes it difficult to analyse with conventional computing methods. Much of the data on the web, such as blogs, news articles and social media posts, is unstructured. Yet, if processed efficiently, it is potentially a vast source of information. This paper discusses the basics of harnessing unstructured data from the web and the techniques used to process it. The concepts of web crawling, text mining and natural language processing (NLP) are discussed briefly to outline how web data is processed and analysed. Sentiment analysis, a major aspect of present-day NLP, is also described, along with the issues of mining data from Twitter, which has emerged as one of the most important data sources for NLP in recent years. The paper concludes with a brief outline of the uses of web data mining and analysis, and the potential for future growth in the field.
