Dealing with Outliers in datasets

When you are working with datasets, it may be necessary to trim or winsorize the data to deal with values that are very different from the rest. Such deviating values are called outliers. Grubbs gives the following definition: "An outlying observation, or outlier, is one that appears to deviate markedly from other members of the sample in which it occurs."

One way to deal with outliers is to trim the data, that is, remove the extreme values from the dataset, so that the statistical analysis becomes more robust. Another way is to winsorize the data: instead of removing the smallest and largest values, you replace them with the closest observations that remain. A typical winsorizing strategy is to set every value below a chosen lower percentile (for example the 5th) to that percentile, and every value above a chosen upper percentile (for example the 95th) to that percentile.
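To make the difference between trimming and winsorizing concrete, here is a minimal Python sketch using only the standard library. It assumes percentile cut-offs computed with the nearest-rank method, so each cut-off is always an actual observation from the data; the function names and default percentiles are illustrative choices, not taken from any particular tool.

```python
import math


def percentile_bounds(values, lower_pct=5, upper_pct=95):
    """Return (low, high) cut-offs using nearest-rank percentiles."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    n = len(ordered)

    def nearest_rank(p):
        # Nearest-rank method: rank = ceil(p * n / 100), clamped to 1..n.
        rank = max(1, math.ceil(p * n / 100))
        return ordered[min(rank, n) - 1]

    return nearest_rank(lower_pct), nearest_rank(upper_pct)


def trim(values, lower_pct=5, upper_pct=95):
    """Trimming: drop every value outside the percentile cut-offs."""
    low, high = percentile_bounds(values, lower_pct, upper_pct)
    return [v for v in values if low <= v <= high]


def winsorize(values, lower_pct=5, upper_pct=95):
    """Winsorizing: clamp values outside the cut-offs to the cut-offs."""
    low, high = percentile_bounds(values, lower_pct, upper_pct)
    return [min(max(v, low), high) for v in values]


data = [5, 3, 95, 4, 6, 5, 7, 4, 6, 5]
print(trim(data, 10, 90))       # 95 is removed, 9 values remain
print(winsorize(data, 10, 90))  # 95 is replaced by 7, all 10 values remain
```

Note that trimming shortens the dataset while winsorizing keeps its length, which matters if the values are paired with other columns. If SciPy is available, `scipy.stats.mstats.winsorize` offers a ready-made version (its `limits` are given as fractions, and its boundary handling may differ slightly from the nearest-rank rule used here).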

The following YouTube video explains outliers very clearly:

If you need to deal with outliers in a dataset, you first need to find them; then you can decide whether to trim or winsorize them. In a large dataset, detecting outliers by eye is difficult, but programs such as Excel or SPSS can make this easier. Below you can find two YouTube videos, one for each program, that show you how to do this.
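The post does not prescribe a particular detection rule, so as an illustration, here is a small sketch of one common choice, the interquartile-range rule (Tukey's fences): a value is flagged if it lies more than `k` times the IQR below the first quartile or above the third quartile, with `k = 1.5` as the conventional default. It uses Python's `statistics.quantiles` (Python 3.8+); the function names are hypothetical, and because Excel, SPSS and Python can compute quartiles in slightly different ways, the fences they produce may not match exactly.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return (low, high) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # quantiles(..., n=4) returns the three quartile cut points [Q1, Q2, Q3];
    # the default method is "exclusive".
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Return the values lying outside the Tukey fences, in original order."""
    low, high = tukey_fences(values, k)
    return [v for v in values if v < low or v > high]


data = [5, 3, 95, 4, 6, 5, 7, 4, 6, 5]
print(tukey_fences(data))   # fences around the bulk of the data
print(find_outliers(data))  # only the extreme value 95 is flagged
```

Once the outliers are found this way, you can still decide separately whether to trim or winsorize them.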

Detecting Outliers using Microsoft Excel:

Detecting Outliers using SPSS: