Publication | Closed Access
Internet Data Analysis Methodology for Cyberterrorism Vocabulary Detection, Combining Techniques of Big Data Analytics, NLP and Semantic Web
Citations: 18
References: 8
Year: 2019
Engineering, Big Data Analytics, Semantic Web, Corpus Linguistics, Text Mining, Natural Language Processing, Anti-cyberterrorism, Information Retrieval, Data Science, Data Mining, Computational Linguistics, Data Integration, Computational Lexicology, Knowledge Discovery, Cyberterrorism Vocabulary Detection, Terminology Extraction, Computer Science, Parallel Processing, Threat Hunting, Semantic Ontologies, Cyber Threat Intelligence, Cyberwarfare, Big Data
This article presents a methodology for analyzing Internet data that combines techniques from Big Data analytics, NLP, and the semantic web to extract knowledge from large amounts of information on the web. To test the effectiveness of the proposed method, webpages about cyberterrorism were analyzed as a case study. The procedure implemented a genetic strategy in parallel, which integrates a crawler to locate and download information from the web with vocabulary retrieval based on NLP techniques (tokenization, stop-word removal, TF, TF-IDF) together with stemming and synonym methods. For knowledge discovery, a dataset was built by describing a linguistic corpus with semantic ontologies that capture the characteristics of cyberterrorism; it was then analyzed with Random Forests (parallel), Boosting, SVM, neural network, k-NN, and Bayes algorithms. The results show 95.62% accuracy in detecting cyberterrorism vocabulary, validated through cross-validation, and a reported 576% time saving with parallel processing.
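The vocabulary-retrieval steps named in the abstract (tokenization, stop-word removal, TF, TF-IDF) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list, the sample documents, the function names, and the choice of idf = ln(N / df) are all assumptions made for illustration, and stemming and synonym handling are left out.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the authors' list).
STOP_WORDS = {"the", "a", "an", "of", "on", "and", "to", "in", "is", "for"}

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def term_frequency(tokens):
    """Relative frequency of each term within one document."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: n / total for term, n in counts.items()}

def tf_idf(documents):
    """Return one {term: weight} dict per document, using idf = ln(N / df)."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    return [
        {term: tf * math.log(n_docs / doc_freq[term])
         for term, tf in term_frequency(tokens).items()}
        for tokens in tokenized
    ]

# Hypothetical sample pages standing in for crawled text.
docs = [
    "The attack on the power grid is an act of cyberterrorism",
    "The power grid supplies electricity to the city",
    "Hackers planned an attack on the bank",
]
weights = tf_idf(docs)
```

In this sketch a term that appears in only one document, such as "cyberterrorism", receives a higher weight than a term shared across documents, such as "power", which is the property that makes TF-IDF useful for picking out domain vocabulary.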