Publication | Open Access
Better Malware Ground Truth
Citations: 86
References: 51
Year: 2015
Unknown Venue
Artificial Intelligence, Multiple Instance Learning, Engineering, Machine Learning, Evasion Technique, Information Security, Verification, Information Forensics, Software Analysis, Bayesian Inference, Multiple Anti-virus, Hidden Ground Truth, Data Science, Data Mining, Pattern Recognition, Statistics, Supervised Learning, Knowledge Discovery, Distinct Binaries, Data Privacy, Bayesian Network, Computer Science, Data Security, Program Analysis, Anti-virus Technique, Statistical Inference, Malware Analysis
We examine the problem of aggregating the results of multiple anti-virus (AV) vendors' detectors into a single authoritative ground-truth label for every binary. To do so, we adapt a well-known generative Bayesian model that postulates the existence of a hidden ground truth upon which the AV labels depend. We use training based on Expectation Maximization for this fully unsupervised technique. We evaluate our method using 279,327 distinct binaries from VirusTotal, each of which appeared for the first time between January 2012 and June 2014.
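The idea of a hidden ground truth on which the AV labels depend, fitted with Expectation Maximization, can be illustrated with a minimal sketch. It is not the paper's model: it assumes a simplified two-parameter-per-detector formulation (in the style of Dawid and Skene), binary verdicts, detectors that are conditionally independent given the hidden label, and no missing verdicts. The function name `em_aggregate`, its parameters, and the toy vote matrix are hypothetical and chosen only for illustration.

```python
import math


def em_aggregate(votes, n_iter=50, eps=1e-6):
    """Estimate P(malicious) per binary from 0/1 detector verdicts.

    votes: list of rows, one per binary; each row holds one 0/1 verdict
    per detector. Assumed model (an illustration, not the paper's):
    a hidden label z_i with prior P(z_i = 1), and for detector j a
    true-positive rate tpr[j] and false-positive rate fpr[j].
    """
    def clamp(p):
        # Keep probabilities away from 0 and 1 so the logs stay finite.
        return min(max(p, eps), 1.0 - eps)

    n, m = len(votes), len(votes[0])
    # Initialise each posterior with the fraction of detectors that flagged it.
    q = [sum(row) / m for row in votes]
    prior, tpr, fpr = 0.5, [0.5] * m, [0.5] * m

    for _ in range(n_iter):
        # M-step: re-estimate the prior and per-detector rates from the
        # current soft labels.
        total_pos = sum(q)
        total_neg = n - total_pos
        prior = clamp(total_pos / n)
        tpr = [clamp(sum(q[i] * votes[i][j] for i in range(n)) / max(total_pos, eps))
               for j in range(m)]
        fpr = [clamp(sum((1 - q[i]) * votes[i][j] for i in range(n)) / max(total_neg, eps))
               for j in range(m)]

        # E-step: posterior of the hidden label given the verdicts,
        # computed in log space.
        for i, row in enumerate(votes):
            log_pos = math.log(prior)
            log_neg = math.log(1.0 - prior)
            for j, v in enumerate(row):
                log_pos += math.log(tpr[j] if v else 1.0 - tpr[j])
                log_neg += math.log(fpr[j] if v else 1.0 - fpr[j])
            # Cap the exponent to avoid overflow in math.exp.
            q[i] = 1.0 / (1.0 + math.exp(min(log_neg - log_pos, 700.0)))

    return q, prior, tpr, fpr


# Toy data: six binaries, three detectors; the third detector is noisier.
votes = [[1, 1, 1], [1, 1, 0], [1, 1, 1],
         [0, 0, 0], [0, 0, 1], [0, 0, 0]]
posteriors, prior, tpr, fpr = em_aggregate(votes)
for row, p in zip(votes, posteriors):
    print(row, round(p, 3))
```

In this sketch, a detector that disagrees with the emerging consensus receives a lower estimated true-positive rate, so its verdicts carry less weight in the aggregated label. Initialising from the vote fraction is one simple way to steer EM away from the label-swapped solution, since the method uses no labelled data.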