Using Excel to randomize a selection

In statistics, a simple random sample is a subset of individuals (a sample or selection) chosen from a larger set (a population/dataset). Each individual is chosen randomly and entirely by chance. This way, each item  has the same probability of being chosen at any stage during the sampling or selection process (see also Wikipedia).
When you are doing research it may become necessary to make such a random selection from a dataset. The initial dataset that you created doing a search in a database may (for instance) result in too many companies/records. To avoid a possible bias that would be created by a manual selection from the dataset, you therefore need to find a way to make a random selection. Excel has two functions that allow you to do just that:

RAND
This function returns a random number greater than or equal to 0 and less than 1, evenly distributed (changes on recalculation). To use this function:

  1. Insert a column in front of the first column of the dataset in Excel
  2. In the cell A2 type in =RAND()
  3. Copy this downward for the entire dataset
  4. When necessary: insert a new column and use Copy (Special) to change the random number into values to make a selection.

Example 1:

RANDBETWEEN
This function returns a random number between the numbers you specify.

  1. Insert a column in front of the first column of the dataset in Excel
  2. Find out how many rows / companies you have
  3. In the cell A2 type in =RANDBETWEEN(First,Last). First stands for the first record/company (in the second example: 1) and Last stands for the last record/company in the dataset (in the second example: 5291)
  4. Copy this downward for the entire dataset
  5. When necessary: insert a new column and use Copy (Special) to change the random number into values to make a selection.

Example 2:

After using RAND or RANDBETWEEN to generate random numbers for each row (= record / company) in a dataset you can then use the random numbers to make the selection. After using RAND, for instance, you can select randomly by choosing only records with numbers smaller than 0,50 (or larger). If you chose to use RANDBETWEEN you can make the selection random by sorting the dataset (on the column with the random number) and choosing the top 50 or 100 numbers.

Email

%d bloggers like this: