Publication | Closed Access
Processing Data for Outliers
Citations: 866
References: 4
Year: 1953
Anomaly Detection, Engineering, Extraneous Observations, Causal Inference, Data Science, Data Mining, Management, Shifted Mean, Big Data, Data Management, Statistics, Behavioral Sciences, Outlier Detection, Knowledge Discovery, Experimental Procedure, Experiment Design, Data Stream Mining, Statistical Inference, Data Modeling
Every experimenter has at some time or other faced the problem of whether certain of his observations properly belong in his presentation of measurements obtained. He must decide whether these observations are valid. If they are not valid, the experimenter will wish to discard them or at least treat his data in a manner which will minimize their effect on his conclusions. Frequently interest in this topic arises only in the final stages of data processing. It is the author's view that a consideration of this sort is more properly made at the recording stage or perhaps at the stage of preliminary processing. This problem will be discussed in terms of the following general models. We assume that observations are independently drawn from a particular distribution or, alternatively, we assume that an observation is occasionally obtained from some other population and that there is nothing in the experimental situation to indicate that this has happened except what may be inferred from the observational reading itself. We assume that if no extraneous observations occur, the observations (or some transformation of them, such as logs) follow a normal distribution. We shall also assume that the occasional extraneous observations are either from a population with a shifted mean or from a population with the same mean and a larger variance. These assumptions may not be completely realistic, but procedures developed for these alternatives should be helpful. If one is taking observations where either of these models applies, there remain two distinct problems. First, one may attempt to pick out the particular observation or observations which are from the different populations.
One may be interested in this selection either to decide that something has gone wrong with the experimental procedure resulting in this observation (in which case he will not wish to include the result) or that this observation gives an indication of some unusual occurrence which the investigator may wish to explore further.
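The two contamination models described above can be made concrete with a small simulation. The following is a minimal sketch, not the author's procedure: it assumes a standard normal base population, and the names `draw_sample`, `most_extreme_index`, `contamination_rate`, `shift`, and `scale` are hypothetical, introduced only for illustration. The screening step simply reports the observation farthest from the sample mean in units of the sample standard deviation, which is one naive way to nominate a candidate and is not claimed to be the criterion used in the paper.

```python
import random
import statistics


def draw_sample(n, contamination_rate, model, shift=3.0, scale=3.0, seed=0):
    """Draw n observations from N(0, 1), occasionally replaced by an extraneous draw.

    model="shift": extraneous observations come from N(shift, 1) (shifted mean).
    model="scale": extraneous observations come from N(0, scale**2) (same mean,
                   larger variance).
    Returns (values, flags), where flags[i] is True if values[i] is extraneous.
    All default parameter values are illustrative assumptions.
    """
    if model not in ("shift", "scale"):
        raise ValueError("model must be 'shift' or 'scale'")
    rng = random.Random(seed)
    values, flags = [], []
    for _ in range(n):
        # Each observation is independently extraneous with the given probability.
        extraneous = rng.random() < contamination_rate
        if not extraneous:
            x = rng.gauss(0.0, 1.0)
        elif model == "shift":
            x = rng.gauss(shift, 1.0)
        else:
            x = rng.gauss(0.0, scale)
        values.append(x)
        flags.append(extraneous)
    return values, flags


def most_extreme_index(values):
    """Return the index of the observation farthest from the sample mean,
    measured in units of the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return max(range(len(values)), key=lambda i: abs(values[i] - mean) / sd)


if __name__ == "__main__":
    values, flags = draw_sample(n=20, contamination_rate=0.1, model="shift")
    candidate = most_extreme_index(values)
    print("candidate index:", candidate, "truly extraneous:", flags[candidate])
```

In practice the experimenter sees only `values`; the `flags` list exists here so that a screening rule can be checked against the simulated truth.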