Publication | Closed Access
Approximate matching of persistent LExicon using search-engines for classifying Mobile app traffic
Citations: 19
References: 15
Year: 2016
Unknown Venue
Mobile App Traffic, Approximate Matching, Search Engine Optimization, Engineering, Intelligent Information Retrieval, Semantic Web, Corpus Linguistics, Text Mining, Natural Language Processing, Information Retrieval, Data Science, Data Mining, Computational Linguistics, Network Traffic, Persistent Lexicon, Search Technology, Similarity Search, Knowledge Discovery, Mobile Computing, Computer Science, Keyword Search, Search Engine Design, Business, Search Engine Indexing, Linguistics, Semantic Similarity, Distributed Search Engine
We present AMPLES, Approximate Matching of Persistent LExicon using Search-Engines, to address the Mobile-Application-Identification (MApId) problem in network traffic at per-flow granularity. We transform MApId into an information-retrieval problem in which the lexical similarity of short text documents serves as the classification metric. Specifically, a network flow observed at an intercept point is treated as a semi-structured text document and converted into a flow-query. This query is then run against a corpus of documents pre-indexed in a search engine. Each index-document represents an application and consists of distinguishing identifiers drawn from the metadata file and the URL strings found in the application's executable archive. The search engine acts as a kernel function, generating a score distribution vis-à-vis the index-documents to determine a match. This extends the scope of MApId to fuzzy classification, mapping a flow to a family of apps when the score distribution is spread out. In experiments on an emulator-generated test dataset (400K applications and 13.5 million flows), we obtain over 80% flow coverage and about 85% application coverage with low false positives (4%) and nearly no false negatives. We also validate our methodology on a real network trace. Most importantly, our methodology is platform-agnostic and subsumes previous studies, most of which focus solely on application coverage.
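The pipeline described in the abstract (index one document per app, turn a flow into a query, read the score distribution) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the search engine or its scoring function, so a toy TF-IDF index stands in for it here. The names `TinyIndex`, `classify`, the `dominance` threshold, and the example package names and hosts are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split semi-structured text (URLs, headers, package names) into lowercase tokens."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if len(t) > 1]


class TinyIndex:
    """Toy TF-IDF index; an assumed stand-in for a real search engine."""

    def __init__(self, docs):
        # docs: {app_name: identifiers taken from the app's archive (assumed format)}
        self.tf = {app: Counter(tokenize(text)) for app, text in docs.items()}
        df = Counter(tok for counts in self.tf.values() for tok in counts)
        n = len(docs)
        # Tokens present in every document get weight 0 and cannot discriminate.
        self.idf = {tok: math.log(n / d) for tok, d in df.items()}

    def score(self, query):
        """Return {app: score} for index documents with a positive match score."""
        q = set(tokenize(query))
        scores = {}
        for app, counts in self.tf.items():
            s = sum(counts[t] * self.idf[t] for t in q if t in counts)
            if s > 0:
                scores[app] = s
        return scores


def classify(index, flow_text, dominance=0.75):
    """Map a flow to one app if its share of the total score dominates,
    otherwise to the family of matching apps (fuzzy classification)."""
    scores = index.score(flow_text)
    total = sum(scores.values())
    if total == 0:
        return ("unknown", [])
    ranked = sorted(scores, key=scores.get, reverse=True)
    if scores[ranked[0]] / total >= dominance:
        return ("app", ranked[:1])
    return ("family", ranked)


# Hypothetical index-documents: package name plus URL strings per app.
docs = {
    "com.example.weather": "com.example.weather api.weather.example.com forecast",
    "com.example.news": "com.example.news cdn.news.example.com headlines",
    "com.example.newslite": "com.example.newslite cdn.news.example.com headlines",
}
index = TinyIndex(docs)

# Hypothetical flows, flattened to text to serve as flow-queries.
weather_flow = "GET /v1/forecast HTTP/1.1 Host: api.weather.example.com"
news_flow = "GET /top HTTP/1.1 Host: cdn.news.example.com"
other_flow = "GET / HTTP/1.1 Host: unrelated.org"

for flow in (weather_flow, news_flow, other_flow):
    print(classify(index, flow))
```

In this sketch a flow whose score mass concentrates on one index-document is assigned to that app, while a flow that matches two apps sharing the same CDN host is assigned to both as a family. The 0.75 threshold is an arbitrary choice for illustration and would need tuning against real traffic.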