Publication | Closed Access
Generation of synthetic data sets for evaluating the accuracy of knowledge discovery systems
Citations: 34
References: 9
Year: 2005
Unknown Venue
Topics: Engineering, Information Forensics, Pattern Mining, Semantic Web, Software Analysis, Knowledge Discovery Systems, Data Science, Data Mining, Management, Intelligent Data Analysis, Knowledge Engineering, Data Integration, Semantic Graphs, Information Discovery, Knowledge Discovery Process, Data Management, Data Creation, Data Modeling, Synthetic Data Sets, Analysis Systems, Knowledge Discovery, Data Privacy, Computer Science, Data Security, Knowledge Base, Knowledge Data Engineering, Frequent Pattern Mining, Automated Reasoning, Software Testing, Big Data
Information Discovery and Analysis Systems (IDAS) are designed to correlate multiple sources of data and use data mining techniques to identify potentially significant events. Application domains for IDAS are numerous and include the emerging area of homeland security.

Developing test cases for an IDAS requires background data sets onto which hypothetical future scenarios can be overlaid. The IDAS can then be evaluated in terms of its false positive and false negative error rates. Obtaining the test data sets can be an obstacle because of privacy issues and because of the time and cost associated with collecting a diverse set of data sources.

In this paper, we give an overview of the design and architecture of an IDAS Data Set Generator (IDSG) that enables fast and comprehensive testing of an IDAS. The IDSG generates data using statistical and rule-based algorithms, together with semantic graphs that represent interdependencies between attributes. A credit card transaction application is used to illustrate the approach.
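
The abstract does not describe the IDSG's internals, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a toy dependency graph over three hypothetical credit card attributes (`income_band`, `merchant_category`, `amount`), samples background records in dependency order, overlays a made-up scenario, and scores a stand-in detector by false positive and false negative rates. Every name, distribution, and threshold here is an illustrative assumption.

```python
import random

# Hypothetical dependency graph: each attribute lists the attributes it depends on.
DEPENDS_ON = {
    "income_band": [],
    "merchant_category": [],
    "amount": ["income_band", "merchant_category"],
}

# Assumed parameters for the statistical rules (not taken from the paper).
MEAN_AMOUNT = {"electronics": 150.0, "grocery": 40.0, "travel": 300.0}
INCOME_SCALE = {"low": 0.6, "mid": 1.0, "high": 1.8}


def topological_order(graph):
    """Order attributes so that each one follows all of its dependencies."""
    order, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for parent in graph[node]:
            visit(parent)
        order.append(node)

    for node in graph:
        visit(node)
    return order


def sample_attribute(attr, rec, rng):
    """Draw one attribute value, conditioned on values already in rec."""
    if attr == "income_band":
        return rng.choices(["low", "mid", "high"], weights=[3, 5, 2])[0]
    if attr == "merchant_category":
        return rng.choice(sorted(MEAN_AMOUNT))
    # amount: exponential with a mean set by the two parent attributes
    mean = MEAN_AMOUNT[rec["merchant_category"]] * INCOME_SCALE[rec["income_band"]]
    return round(rng.expovariate(1.0 / mean), 2)


def generate_background(n, seed=0):
    """Generate n synthetic background transactions."""
    rng = random.Random(seed)
    order = topological_order(DEPENDS_ON)
    records = []
    for i in range(n):
        rec = {"id": i, "is_scenario": False}
        for attr in order:
            rec[attr] = sample_attribute(attr, rec, rng)
        records.append(rec)
    return records


def overlay_scenario(records, k, seed=1):
    """Turn k randomly chosen records into a hypothetical scenario (large purchases)."""
    rng = random.Random(seed)
    for rec in rng.sample(records, k):
        rec["merchant_category"] = "electronics"
        rec["amount"] = round(rng.uniform(2000, 5000), 2)
        rec["is_scenario"] = True
    return records


def error_rates(records, detector):
    """Return (false positive rate, false negative rate) for a detector function."""
    fp = sum(1 for r in records if detector(r) and not r["is_scenario"])
    fn = sum(1 for r in records if not detector(r) and r["is_scenario"])
    negatives = sum(1 for r in records if not r["is_scenario"])
    positives = len(records) - negatives
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr


if __name__ == "__main__":
    data = overlay_scenario(generate_background(1000), k=20)
    # Stand-in for the system under test: flag any amount above a fixed threshold.
    fpr, fnr = error_rates(data, lambda r: r["amount"] > 1500)
    print(f"false positive rate={fpr:.3f}, false negative rate={fnr:.3f}")
```

Because the generator labels which records belong to the overlaid scenario, the ground truth is known by construction, which is what makes the two error rates directly measurable. A real generator would presumably use far richer graphs and rules than this toy example.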