Cleaning Datastream downloads

Using ISIN codes is very often a good choice when you download data from Datastream and want to combine it with other data. ISIN codes are also regularly available in other databases. For the Datastream downloads you will have to seperate the ISIN again from the code of the variable which (as a default) is presented after the ISIN code between brackets.

Because every ISIN code consists of 12 digits, seperating them in Excel is easy: use the function =left(). Example: =left(CELL; 12)

In Stata you can use the substring command substr() to do something similar.
Example: gen isin=substr(Code, 1, 12)

Seperating the code of a company from the code of the variable in Datastream is more difficult if the company codes have varying lengths. The Datastream code (Datastream mnemonic) for instance, is such a code. If you need to seperate this from the code of the variable you need to make use of the length of the variable code which remains the same. To do this we combine to Excel functions: =left() and =len(). Example: =left(Cell; len(Cell)-11)

In Stata a similar thing can be done using a combination of substr() and strlen(). Example:
gen dscd=substr(variable, 1 ,(strlen(variable)-11))

Another thing which you may find necessary is seperating the names of the variables from the names of the companies. The variable name is most easily seperated since the name is a fixed number of characters. Again similar functions using the length of the content of CELLS can be used. A different way of seperating both is using a combination of the Left() function combined with the function Find(). For most downloads from Datastream both items are combined but there is a common element seperating the two, namely: ” “.
Using this we can build the following formula: =left(CELL;(FIND(” – “;CELL)))

In Stata a similar option is available to split a combined variable. The function split can be used as follows: split Name, generate(deel) parse(” – “). This generates two new variables that both start with the name deel and are numbered, for example: deel1 deel2. Afterwards you rename the variables deel1 and deel2 to what you need.


%d bloggers like this: