Publication | Open Access
Information Extraction from the Long Tail
Citations: 10
References: 14
Year: 2020
Venue: Unknown
Engineering, Knowledge Extraction, Cyber Crime, Semantic Web, Corpus Linguistics, Journalism, Text Mining, Natural Language Processing, Computational Social Science, Information Retrieval, Data Science, Computational Linguistics, Language Studies, Content Analysis, Social Medium Mining, Cybercrime, Online Forums, Knowledge Discovery, Computer Science, Information Extraction, Social Computing, Cyber Threat Intelligence, Data Extraction, Long Tail
In today's online forums and marketplaces, cybercrime activity can often be found lurking in plain sight behind legitimate posts. Most popular criminology techniques are either manually intensive, and so do not scale well, or focus on statistical summaries across websites and can miss infrequent behaviour patterns. We present an inter-disciplinary (computer science, criminology and conservation science) socio-technical artificial intelligence (AI) approach to information extraction from the long tail of online forums around internet-facilitated illegal trades of endangered species. Our methodology is highly iterative: entities of interest (e.g. endangered plant species, suspects, locations) identified by a criminologist direct computer science tools, including crawling, searching and information extraction, over many steps until an acceptable intelligence package is achieved. We evaluate our approach using two case study experiments, each based on a one-week criminology investigation (aided by conservation science experts), comparing named entity (NE) directed graph visualization with Latent Dirichlet Allocation (LDA) topic modelling. NE directed graph visualization consistently outperforms topic modelling for discovering connected entities in the long tail of online forums and marketplaces.
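The iterative, entity-directed idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy posts, the entity names and the function `expand_from_seeds` are all hypothetical, and named entities are assumed to have been extracted already. It only shows how seed entities chosen by an investigator could be expanded, hop by hop, into a directed graph of co-mentioned entities.

```python
from collections import defaultdict, deque

# Hypothetical toy data: each post is the set of named entities found in it.
POSTS = [
    {"seller_a", "Ariocarpus", "Mexico"},
    {"seller_a", "seller_b"},
    {"seller_b", "Euphorbia", "Madagascar"},
    {"seller_c", "Orchid"},
]


def expand_from_seeds(posts, seeds, max_hops=2):
    """Breadth-first expansion from seed entities.

    Adds a directed edge (source, target) whenever target is co-mentioned
    with source in a post and lies one hop further from the seeds.
    Returns the edge set and the hop distance of every reached entity.
    """
    # Index posts by the entities they mention.
    index = defaultdict(list)
    for post in posts:
        for entity in post:
            index[entity].append(post)

    edges = set()
    hops = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        source = queue.popleft()
        if hops[source] >= max_hops:
            continue  # do not expand beyond the hop limit
        for post in index[source]:
            for target in post:
                if target == source:
                    continue
                if target not in hops:
                    hops[target] = hops[source] + 1
                    queue.append(target)
                # Keep only edges that point away from the seeds.
                if hops[target] > hops[source]:
                    edges.add((source, target))
    return edges, hops


edges, hops = expand_from_seeds(POSTS, ["seller_a"], max_hops=2)
for source, target in sorted(edges):
    print(f"{source} -> {target}")
```

In this toy run, entities that never co-occur with anything reachable from the seed (here `seller_c` and `Orchid`) stay out of the graph, which is the sense in which the seeds direct the search. A real pipeline would replace `POSTS` with crawled forum content and an actual named entity recognizer.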