Publication | Closed Access
Is sampled data sufficient for anomaly detection?
Citations: 208
References: 18
Year: 2006
Internet Traffic Analysis, Anomaly Detection, Engineering, Hardware Security, Data Science, Data Mining, Data Management, Statistics, Network Flows, Effective Anomaly Detection, Intrusion Detection System, Outlier Detection, Knowledge Discovery, Computer Science, Data Sufficient, Volume Anomalies, Novelty Detection, Network Traffic Measurement, Network Monitoring
Sampling techniques are widely used for traffic measurement at high link speeds to conserve router resources. Traditionally, sampled traffic data has been used for network management tasks such as traffic matrix estimation, but recently it has also been used in numerous anomaly detection algorithms, as security analysis becomes increasingly critical for network providers. While the impact of sampling on traffic engineering metrics such as flow size and mean rate is well studied, its impact on anomaly detection remains an open question.

This paper presents a comprehensive study of whether existing sampling techniques distort traffic features critical for effective anomaly detection. We sampled packet traces captured from a Tier-1 IP backbone using four popular methods: random packet sampling, random flow sampling, smart sampling, and sample-and-hold. The sampled data is then used as input to detect two common classes of anomalies: volume anomalies and port scans. Since it is infeasible to enumerate all existing solutions, we study three representative algorithms: a wavelet-based volume anomaly detector and two port scan detection algorithms based on hypothesis testing. Our results show that all four sampling methods introduce fundamental bias that degrades the performance of the three detection schemes; however, the degradation curves are very different. We also identify the traffic features critical for anomaly detection and analyze how they are affected by sampling. Our work demonstrates the need for better measurement techniques, since anomaly detection operates on a drastically different information region, which is often overlooked by existing traffic accounting methods that target heavy-hitters.
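To make the four sampling methods named in the abstract concrete, the following is a minimal Python sketch of each one as it is commonly described, not the paper's implementation. Everything in it is an assumption made for illustration: packets are reduced to bare flow identifiers, sizes are counted in packets rather than bytes, sample-and-hold uses a per-packet admission probability, smart sampling keeps a flow of size `x` with probability `min(1, x / z)`, and the helper `build_trace` and its synthetic trace (one heavy flow plus many single-packet probes, loosely resembling a port scan) are hypothetical.

```python
import random
from collections import Counter


def build_trace(rng):
    """Hypothetical trace: one 1000-packet flow plus 200 single-packet probe flows."""
    packets = ["heavy"] * 1000 + [f"probe-{i}" for i in range(200)]
    rng.shuffle(packets)
    return packets


def random_packet_sampling(packets, p, rng):
    """Keep each packet independently with probability p."""
    return [flow_id for flow_id in packets if rng.random() < p]


def random_flow_sampling(packets, p, rng):
    """Decide once per flow (probability p); a kept flow keeps all its packets."""
    decisions = {}
    sampled = []
    for flow_id in packets:
        if flow_id not in decisions:
            decisions[flow_id] = rng.random() < p
        if decisions[flow_id]:
            sampled.append(flow_id)
    return sampled


def sample_and_hold(packets, p, rng):
    """Simplified sample-and-hold: a flow enters the table with probability p
    per packet; once present, every later packet of that flow is counted."""
    table = Counter()
    for flow_id in packets:
        if flow_id in table or rng.random() < p:
            table[flow_id] += 1
    return table


def smart_sampling(flow_sizes, z, rng):
    """Size-dependent flow sampling: keep a flow of size x with
    probability min(1, x / z), so flows of size >= z are always kept."""
    return {
        flow_id: size
        for flow_id, size in flow_sizes.items()
        if rng.random() < min(1.0, size / z)
    }


if __name__ == "__main__":
    rng = random.Random(42)
    trace = build_trace(rng)
    true_flows = Counter(trace)
    print("flows in trace:", len(true_flows))
    print("flows seen, packet sampling:",
          len(set(random_packet_sampling(trace, 0.1, rng))))
    print("flows seen, flow sampling:",
          len(set(random_flow_sampling(trace, 0.1, rng))))
    print("flows seen, sample-and-hold:",
          len(sample_and_hold(trace, 0.1, rng)))
    print("flows seen, smart sampling:",
          len(smart_sampling(true_flows, 100, rng)))
```

Running the script compares how many distinct flows each method retains. In this toy setting the heavy flow is very likely to survive every method, while most single-packet flows tend to be dropped at low sampling rates, which is consistent with the abstract's remark that accounting methods aimed at heavy-hitters can overlook the small flows that port scan detection relies on.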