Compustat database & non-US data

Based on a query from a student I investigated a problem getting output for some variables in the Compustat Global Fundamentals Annual database. I tried to get data for a selection of Dutch and UK companies for the variables: Long-Term Debt – Issuance (DLTIS), and Long-Term Debt – Reduction (DLTR).
No matter what I did, I just could not get data for these variables for the past 10 years.

Many of the regular variables (Total Assets, Revenue, etc.) were not a problem. I studied the manual but this also gave no indication of what I needed to do. Screening variables also did not seem to be causing the problem.

In the end I contacted the the Capital IQ company to ask what the solution was. It turns out that the two variables I was looking for had their origin in filings done with the SEC in America. If I needed data for Dutch or UK companies all I need do is load a list of Global Company Keys (GVKEYs) as a text file in Compustat North America Fundamentals Annual and download the data items there. This was not what I expected but it is good to know the data is available.

N.B.: In Compustat Global you need to download the ISO Currency code as well as the data etc. to know in what currency the data is made available in the database. The data from Compustat North America is reported in US Dollar.


Cleaning Datastream downloads

Using ISIN codes is very often a good choice when you download data from Datastream and want to combine it with other data. ISIN codes are also regularly available in other databases. For the Datastream downloads you will have to seperate the ISIN again from the code of the variable which (as a default) is presented after the ISIN code between brackets.

Because every ISIN code consists of 12 digits, seperating them in Excel is easy: use the function =left(). Example: =left(CELL; 12)

In Stata you can use the substring command substr() to do something similar.
Example: gen isin=substr(Code, 1, 12)

Seperating the code of a company from the code of the variable in Datastream is more difficult if the company codes have varying lengths. The Datastream code (Datastream mnemonic) for instance, is such a code. If you need to seperate this from the code of the variable you need to make use of the length of the variable code which remains the same. To do this we combine to Excel functions: =left() and =len(). Example: =left(Cell; len(Cell)-11)

In Stata a similar thing can be done using a combination of substr() and strlen(). Example:
gen dscd=substr(variable, 1 ,(strlen(variable)-11))

Another thing which you may find necessary is seperating the names of the variables from the names of the companies. The variable name is most easily seperated since the name is a fixed number of characters. Again similar functions using the length of the content of CELLS can be used. A different way of seperating both is using a combination of the Left() function combined with the function Find(). For most downloads from Datastream both items are combined but there is a common element seperating the two, namely: ” “.
Using this we can build the following formula: =left(CELL;(FIND(” – “;CELL)))

In Stata a similar option is available to split a combined variable. The function split can be used as follows: split Name, generate(deel) parse(” – “). This generates two new variables that both start with the name deel and are numbered, for example: deel1 deel2. Afterwards you rename the variables deel1 and deel2 to what you need.