Not long ago I got the question on how to create dummy numbers for a dataset. The student who asked me this wanted to use these dummies in a regression analysis. In some models you may only want to have a single value of 1 or 0 but in this case the student wanted to create four different categories that are equal/similar in size. So, I used Excel to create the 4 groups. I am not sure if this makes sense from a mathematical point of view but I leave that up to the students.

First I created a small dataset with monthly price data for 11 randomly chosen Dutch stocks (fake example). To make the test even more challenging I made sure that there were also missing values because when you are working with larger datasets this might happen and you cannot always check the dataset for it. Below you see the example:

The example that I worked on needed to be divided in 4 groups and the groupings are different depending on the number of available stocks as well as the different price values. In Excel we can use the **QUARTILE()** function to let Excel calculate from a certain range where the groups should end. When the formula is used you need to indicate which range to use and the value of the Quartile: =QUARTILE(RANGE;VALUE). For the four Quartiles this could be

QUARTILE(B2:B11;1)

QUARTILE(B2:B11;2)

QUARTILE(B2:B11;3)

QUARTILE(B2:B11;4)

In the example screenshot these are Cells M2 to P2:

As the next step I needed to create a conditional statement for 2 scenarios:

- What to do when there is no data
- Which value (of 1,2,3,4) to show depending on the calculated Quartile.

In Excel I can do this with three functions:

**IF**(Condition;Then;Else) = *function to tell Excel what to do when*

**AND**(Condition1;Condition2; …) = *function to use more then 1 condition*

**LEN**(Cell) = *function to allow for missing values*

When I started out I begin by telling Excel what to do when data is missing, i.e. when LEN(Cell)=0. If I have no data in a Cell I indicate that the text should be presented: **NODATA**. I put these between Quotes to indicate to Excel that this is Text information.

The following steps indicate what value (of 1,2,3,4) should be shown depending on the comparison of the Price in a cell with the calculated 4 quartiles. In order for me to copy these formulas (downwards and to the right) I **locked** the columns M,N,O and P with the **dollar sign** before the letter of these four columns. The full formula looks as follows:

=IF(LEN(B3)=0;”NODATA”;IF(B3<=$M3;1;IF(AND(B3>$M3;B3<=$N3);2;IF(AND(B3>$N3;B3<=$O3);3;IF(AND(B3>$O3;B3<=$P3);4;””)))))

Here you see a screenshot of the result using the formula:

As a final step I did a quick check to see how many stocks would get values 1,2,3 or 4 using the **COUNTIF()** function. I need to create a column for each group, and therefore use COUNTIF(RANGE;VALUE) and get in the example:

=COUNTIF(B22:L22;1)

=COUNTIF(B22:L22;2)

=COUNTIF(B22:L22;3)

=COUNTIF(B22:L22;4)

It looks like you get a (roughly) equal number of Stocks with values 1-4. This depends of course on how many stocks you have and how many missing values for each stock there are. See example screenshot:

This is a small example of the functions that you can use in Excel and you can download the Excel sheet here. I also did a larger test with Daily Frequency data from 2011 to now for 23 stocks and I got a similar good result. You can download this larger example Excel Sheet here.

N.B.:* Sometimes the big conditional formula does not wotk as it should and you get values 1-4 when cells are empty. Be sure to check whether these seeming visually empty cells are actually empty. To solve this problem you can save the datasheet first as Text (Tab delimited) file, and the open it again in Excel using the Text wizard. When you retry the formulas from this example everything should then work fine. Be sure to always save a copy of the original download/dataset in case of errors*.

Filed under: Data management, Excel |