Fiscal Years & matching data

When you download annual report data from Compustat and want to match it with similar data from Datastream, you need to make sure that the data you combine covers the same fiscal years. In Compustat you can download the data together with the variables Fiscal Year (= fyear) and Fiscal Year End (= FYR).

In Datastream the annual report data comes from the Worldscope database. The data is usually reported with the calendar year as column or row header, which can be compared to the fiscal year in Compustat. For the fiscal year end the variable WC05350 can be used.

The unique combination to combine data from these sources is then:

  • Compustat Global & Datastream:
    ISIN (or SEDOL) + Fiscal year + Fiscal year end
  • Compustat North America & Datastream:
    CUSIP 9 / Ticker + Fiscal year + Fiscal year end
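
The combined key can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the record layout, the sample values and the field names isin, year, fye_month and total_assets are made up for illustration, and it assumes the Worldscope fiscal year end (WC05350) has already been reduced to a month number so it can be compared with FYR.

```python
# Made-up sample records; only fyear and fyr are names taken from Compustat.
compustat_rows = [
    {"isin": "NL0000000001", "fyear": 2014, "fyr": 12, "total_assets": 1500.0},
    {"isin": "NL0000000001", "fyear": 2015, "fyr": 12, "total_assets": 1650.0},
]
worldscope_rows = [
    # fye_month is assumed to be the month taken from WC05350.
    {"isin": "NL0000000001", "year": 2014, "fye_month": 12, "total_assets_ws": 1498.0},
    {"isin": "NL0000000001", "year": 2015, "fye_month": 3, "total_assets_ws": 1700.0},
]


def match_key(identifier, fiscal_year, fye_month):
    """Build the combined key: identifier + fiscal year + fiscal year end."""
    return (str(identifier).strip().upper(), int(fiscal_year), int(fye_month))


# Index the Worldscope rows by key, then look up each Compustat row.
ws_by_key = {
    match_key(r["isin"], r["year"], r["fye_month"]): r for r in worldscope_rows
}

matched = []
for row in compustat_rows:
    key = match_key(row["isin"], row["fyear"], row["fyr"])
    if key in ws_by_key:
        matched.append({**row, **ws_by_key[key]})

# Rows without a partner deserve a manual check (different fiscal year end?).
unmatched = len(compustat_rows) - len(matched)
```

In this made-up sample only the 2014 record is matched, because the 2015 records disagree on the fiscal year end.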

N.B.:

1) In Compustat Global you need to download the currency code along with the data to know in which currency the data is reported in the database. The data from Compustat North America is reported in US Dollar. In Datastream you can choose to download your data in a specific currency (calculated using historical exchange rates according to Thomson).

2) Accounting standard can be an important variable, as the data in a database can come from different statements reported by a company (statutory reports, SEC filings, etc.). In Compustat this variable is acctstd. In Datastream it is the variable WC07536.


Compustat database & non-US data

Based on a query from a student I investigated a problem with getting output for some variables in the Compustat Global Fundamentals Annual database. I tried to get data for a selection of Dutch and UK companies for the variables Long-Term Debt – Issuance (DLTIS) and Long-Term Debt – Reduction (DLTR).
No matter what I did, I could not get data for these variables for the past 10 years.

Many of the regular variables (Total Assets, Revenue, etc.) were not a problem. I studied the manual but this also gave no indication of what I needed to do. Screening variables also did not seem to be causing the problem.

In the end I contacted Capital IQ to ask what the solution was. It turns out that the two variables I was looking for originate from filings with the SEC in America. To get this data for Dutch or UK companies, all you need to do is upload a list of Global Company Keys (GVKEYs) as a text file in Compustat North America Fundamentals Annual and download the data items there. This was not what I expected, but it is good to know the data is available.
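
Preparing such a text file can be sketched in Python. This is only a sketch: it assumes the upload expects one GVKEY per line, written as a six-digit code with leading zeros, and the example codes and the file name are made up.

```python
from pathlib import Path
import tempfile


def format_gvkey(value):
    """Return a GVKEY as a six-character string with leading zeros."""
    return str(value).strip().zfill(6)


# Made-up example codes; spreadsheets often strip the leading zeros.
raw_codes = [1234, "005678", " 12345 "]

# Remove duplicates and sort, so the list is easy to check by eye.
gvkeys = sorted({format_gvkey(code) for code in raw_codes})

# Write one code per line to a plain text file (hypothetical file name).
out_path = Path(tempfile.gettempdir()) / "gvkeys.txt"
out_path.write_text("\n".join(gvkeys) + "\n", encoding="ascii")
```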

N.B.: In Compustat Global you need to download the ISO currency code along with the data to know in which currency the data is reported. The data from Compustat North America is reported in US Dollar.


Excel and duplicates in a dataset

In the past I posted items on duplicate data in datasets (from Compustat databases) and how to detect it using programs like Stata. For smaller datasets Microsoft Excel can also be used to investigate your datasets. When you merge datasets from multiple sources, you may at some point end up with duplicate data. Using Excel features like a pivot table (draaitabel) and VLOOKUP (vertikaal zoeken), duplicate data can be detected as follows:

1) first you need to tell Excel that your dataset is a specific delineated table using the menu option: Insert > Table

2) When you choose this option you need to indicate the range and whether there are column headers

3) The same menu tab insert (invoegen) has the option to create a summary pivot table (samenvatten met draaitabel). Select this option at the top left corner:

4) Create the PivotTable in a new (or empty) sheet:

5) The empty Pivot Table will be shown as follows:

6) In this example I drag the fields I want to check for duplicates (records/observations) down to Rows (Rijen). To the Values box I drag a random field (in this case indfmt). On the left side of the screen the result will be presented:

7) In this example I click the field gvkey and choose the option to change the field settings

8) At subtotals & filters mark the second option No(ne)

9) In the tab Format & Print (Indeling & Afdrukken) mark the option: item labels in table format (Itemlabels in tabelvorm weergeven). Also make sure to mark the option to repeat item labels (Itemlabels herhalen):

10) The result should look as follows:

11) Copy the list to a new sheet and use the concatenate function (Tekst.samenvoegen) to create a combination of GVKEY and year:

12) Create a similar link list with the Concatenate function in the original Compustat datasheet

13) Now use the VLOOKUP function in Excel in the Compustat datasheet to look up the link in the sheet with the result of the Pivottable. Make sure that the VLOOKUP option says False (Onwaar) at the 4th option as follows:

14) Finally, using the filter option you can find out whether there are duplicates by selecting everything for which the count is higher than 1. The tricky thing will be deciding what to do with the result.
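
For comparison, the same check can be sketched in a few lines of Python. This is a minimal sketch, not a replacement for the Excel steps: the rows and their values are made up, and the link is simply GVKEY and year glued together, as in steps 11 and 12.

```python
from collections import Counter

# Made-up rows standing in for a Compustat download.
rows = [
    {"gvkey": "001234", "fyear": 2012, "indfmt": "INDL"},
    {"gvkey": "001234", "fyear": 2013, "indfmt": "INDL"},
    {"gvkey": "001234", "fyear": 2013, "indfmt": "FS"},
    {"gvkey": "005678", "fyear": 2013, "indfmt": "INDL"},
]


def link(row):
    """Concatenate GVKEY and year, like the Concatenate step in Excel."""
    return f'{row["gvkey"]}{row["fyear"]}'


# The pivot table count: how often each link occurs.
counts = Counter(link(row) for row in rows)

# The VLOOKUP plus filter: keep the rows whose link occurs more than once.
duplicates = [row for row in rows if counts[link(row)] > 1]
```

In this sample the two 2013 rows for the first company are flagged; they differ only in indfmt, which still leaves the decision of which one to keep.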

N.B.: If a dataset has over 100,000 observations the process described above will take some time, as Microsoft Excel will require significant processing power from the computer. For larger datasets I recommend using Stata.


Using Stata to count segments

At the end of March I was asked how to use Compustat North America segments data to get aggregated counts of business segments or geographic segments. The business segments count was to be used as an indicator of diversity: how many different types of activity a company included in its activities. The geographic segments count was to be used as an indicator of how widespread these activities were geographically for each company.

Specific important commands that are needed:

generate year=year(datadate) > using this command you get a year which can be used to count instances of segments. This is only needed if no available year can be used (like fiscal year / fyear).

drop > using this command you delete all variables that are non-essential from the dataset

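sort gvkey year > this command sorts the dataset first on gvkey (= global company key, which uniquely identifies a company in any Compustat database) and then by year. Note that sort is needed here rather than order: order only rearranges the variables (columns), while the by prefix used below requires the observations to be sorted.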

duplicates drop > this command deletes any possible duplicate annual data. This is important as the count only involves unique segments

by gvkey year: egen segmentcount = count(sid) > this command generates a new variable (segmentcount) and gives it the value of the count of the segment id codes (SID) for each company and individual year.

To combine the business segments count dataset with the geographic segments count dataset, a unique ID (UID) is created, which is then used to merge the two into a single dataset.

Overall the script (.do file) I created does three things:
1) It creates a new dataset with business counts
2) A dataset with Geographical counts is made
3) It merges both newly created datasets into a single dataset
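
The logic of these three steps can be sketched in Python as well. This is not the .do file itself, only a sketch of the same idea: the rows are made up, the segment type labels BUSSEG and GEOSEG are assumed, and the (gvkey, year) pair plays the role of the unique ID.

```python
from collections import defaultdict

# Made-up segment rows: (gvkey, year, segment type, sid).
segment_rows = [
    ("001234", 2013, "BUSSEG", 1),
    ("001234", 2013, "BUSSEG", 1),  # duplicate row, must be counted once
    ("001234", 2013, "BUSSEG", 2),
    ("001234", 2013, "GEOSEG", 5),
    ("005678", 2013, "BUSSEG", 1),
]


def count_segments(rows, segment_type):
    """Count unique segment ids per (gvkey, year) for one segment type."""
    unique_ids = defaultdict(set)
    for gvkey, year, stype, sid in rows:
        if stype == segment_type:
            unique_ids[(gvkey, year)].add(sid)  # the set drops duplicates
    return {key: len(ids) for key, ids in unique_ids.items()}


business = count_segments(segment_rows, "BUSSEG")
geographic = count_segments(segment_rows, "GEOSEG")

# Merge both counts on the (gvkey, year) key into a single dataset.
merged = {
    key: {"business": business.get(key, 0), "geographic": geographic.get(key, 0)}
    for key in sorted(set(business) | set(geographic))
}
```

Using a set per company and year does the work of duplicates drop: a segment id that appears twice is still counted once.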

Example script screenshot:

The example dataset with .do script file can be downloaded here.

Example result screenshot:

N.B.: In the .do file the location for all files is the U: drive. You may need to change the drive letter in the original script to (for instance) c: or H: to get it to run. Make sure both the script file and Stata dataset are in the same location.


Total Q and Tobin’s Q

A new Compustat data source has become available to people who have access to Compustat databases through WRDS: Peters and Taylor Total Q. This new source provides data on firms’ “Total q” ratio and the replacement cost of firms’ intangible capital. Total q is an improved Tobin’s q proxy that includes intangible capital in the denominator, i.e., in the replacement cost of firms’ capital. Peters and Taylor estimate the replacement cost of firms’ intangible capital by accumulating past investments in R&D (Research and Development) and SG&A (Selling, General and Administrative Expenses). Background paper can be downloaded here.

Overview of the variables and names in the database:

datadate = Date
fyear = Fiscal Year
gvkey = GVKEY / Compustat unique company code
K_int = Firm’s intangible capital estimated replacement cost
K_int_Know = Firm’s knowledge capital replacement cost
K_int_offBS = Portion of K_int that doesn’t appear on firm’s balance sheet
K_int_Org = Firm’s organization capital replacement cost
q_tot = Total q


The database offers mainly data for companies which are included in the Compustat North America database. Most of these companies are American. The coverage is: 1950 – now.


Working with Compustat Execucomp tenure data

Not too long ago I had a question from someone who was having trouble working with data from Compustat Execucomp. He wanted the yearly tenure for a specific group of people with the function of Chief Financial Officer. The research spanned a period of 2009-2014 (post-crisis). The data that was downloaded looked something like this:

Step 1: Data cleaning
One of the first steps is to make sure you have the right kind of data to work with. In this case columns H and I needed to be checked and cleaned. Column I shows the date when a person left the position of CFO at the company. Here we see items like n/a where the data is unavailable, which means that the person is still working as CFO for the company. We first need to replace such values with 2014, the last year of our research, as we are looking for the tenure within the time frame 2009-2014. Any years after 2014 can also be replaced with 2014. You can use the search and replace function in Excel to do this step by step. Afterwards you can use the Filter option in Excel to check for odd values or outliers. In principle you have to check both the start year and the left year columns to be sure there are no outliers.

In column H you see the year when a person joined the company. I am assuming that this was also the start year for each person as CFO at the company (I have not personally checked this). You can see in the screenshot that not every year is treated as a numerical value: Excel shows little green triangles in the cells where it thinks the data is text. To ensure that a year is treated as a numerical value you can add a new column and use a trick to create numerical values in it: divide the cell by 1. See column J in the screenshot for the original data and column K for the new years. In the top left corner you see the “formula” you can copy downwards for all years.
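
The cleaning rules of this step can be sketched in Python. This is a minimal sketch under the same assumptions as the text (anyone without a leaving year is still CFO, and years after 2014 are capped at 2014); the function names and sample values are made up.

```python
LAST_YEAR = 2014  # final year of the research window


def clean_left_year(value, last_year=LAST_YEAR):
    """Turn a 'left as CFO' cell into a year capped at the research window."""
    text = str(value).strip()
    if not text.isdigit():  # e.g. "n/a" or an empty cell: still in office
        return last_year
    return min(int(text), last_year)


def clean_start_year(value):
    """Convert a start year stored as text into a number (None if unusable)."""
    text = str(value).strip()
    return int(text) if text.isdigit() else None


left_years = [clean_left_year(v) for v in ["n/a", "2012", 2016, ""]]
start_years = [clean_start_year(v) for v in ["2009", " 2011", "unknown"]]
```

Values that come back as None are the odd cases you would otherwise spot with the Excel filter and need to check by hand.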

Step 2: Calculate tenure for each year
In the example for this blog I only calculated the tenure for the final years of the research time frame (2013 and 2014). You can figure out the formulas for the other years. I started by calculating the CFO tenure for 2014. In this example I assume that if the start year matches the last year, someone has worked in this capacity for less than a full year, making the tenure less than 1 and thus zero. In this case I get the right number of tenure years by subtracting the start year from the last year (= research year 2014). See example:

For the tenure of the previous year (2013) the IF statement comes in handy. The full formula is:
=IF(K3<=(L3-1);(P3-1);FALSE)

K3 = start year tenure within the research window (or before)
L3 = last year for the research window (2014)
P3 = tenure for the final year of the research window (= 2014)

In essence the formula does the following: if the start year is the same as (or earlier than) the previous year (in this case 2013), then the tenure is that of 2014 minus 1. If not, the value FALSE is put there. This last condition prevents outliers from causing problems. Screenshot:

The same formula can also be used for earlier years. All you need to do is adjust the numbers in the formula. 2012 example: =IF(K3<=(L3-2);(P3-2);FALSE)
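
The arithmetic behind these formulas can be sketched as one Python function. With the last year fixed at 2014, the Excel formula for a given year reduces to "year minus start year" whenever the start year is not later than that year. The extra check on the leaving year is an assumption added in this sketch and is not part of the spreadsheet formula; the function name is made up.

```python
WINDOW_END = 2014  # last year of the research window (L3 in the formula)


def tenure_in_year(start_year, year, left_year=WINDOW_END):
    """Tenure in `year`, or None where the Excel formula would show FALSE.

    Also returns None for years after the person left, which is an
    assumption added here on top of the spreadsheet formula.
    """
    if start_year <= year <= left_year:
        return year - start_year
    return None


# Someone who started as CFO in 2012 and is still in office in 2014.
tenure_by_year = {year: tenure_in_year(2012, year) for year in range(2009, 2015)}
```

For this example the result is no tenure for 2009 to 2011, then 0 for 2012, 1 for 2013 and 2 for 2014.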

Step 3: Figure out the relevant years
This step is not strictly necessary, as the filter option of Excel can already be used to make a selection by year, but you would then have to add the tenure year manually for each year after copying the relevant tenure data by year (to a new sheet).
The formulas in step 2 provide a tenure of 0 (or more) as long as the end year of the tenure (within the time frame 2009-2014) is equal to (or later than) the start year of the tenure (within the time frame or earlier). To know the tenure by year we create columns that show which tenure applies to which year. That allows us to use a filter in Excel to more easily get the relevant data where there is a tenure of more than zero. I created columns N and O to get the tenure years for 2014 and 2013. The formula I used for 2014 is: =IF((P4>0);RIGHT($N$1;4);FALSE)
where P4 = the calculated number of tenure years for 2014. I have put the year at the end of the name in the first cell at the top of the column, as its last 4 characters, which makes it possible to use $N$1.
Example:

For 2013 all you need to change is the cell P4 into Q4 and the header $N$1 into $O$1. You also need to put the year in the name of the variable at the top of the column. Subsequent years work the same way.

Step 4: Filter the data for the relevant years
As the final step you can now use the standard filter option to copy the relevant data by year to a new sheet.

I would then also remove irrelevant data for other years which do not apply to the specific year I have filtered for. The end result would look something like this:
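
Steps 3 and 4 together amount to building one row per person and year and then filtering by year. A minimal Python sketch of that idea, with made-up people and field names:

```python
def tenure_panel(people, first_year=2009, last_year=2014):
    """One row per person and year in which the person held the CFO position."""
    panel = []
    for person in people:
        for year in range(first_year, last_year + 1):
            if person["start"] <= year <= person["left"]:
                panel.append(
                    {
                        "name": person["name"],
                        "year": year,
                        "tenure": year - person["start"],
                    }
                )
    return panel


# Made-up, already cleaned data (leaving years capped at 2014).
people = [
    {"name": "A", "start": 2007, "left": 2010},
    {"name": "B", "start": 2013, "left": 2014},
]
panel = tenure_panel(people)

# Like the Excel filter: rows for 2014 with a tenure of more than zero.
rows_2014 = [row for row in panel if row["year"] == 2014 and row["tenure"] > 0]
```

Because every row already carries its own year, there is no need to add the tenure year by hand after filtering.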


Shortselling data – Supplemental

In the previous post I mentioned two databases that have data on short interest. The Compustat database offered this data, but the search through the WRDS platform could crash with an error. The error occurred when you selected items (at search step 3) like CIK codes. This problem has now been fixed and the data can be downloaded as usual with all selected variables.

I have also had another look at the searches in Datastream for short interest data and noticed the following: you need to be careful when you select a download frequency. Usually the SID data is made available every few months or once a year. The reporting frequency of the data has been changing over the last year. If you choose the Yearly frequency in Datastream you will not get data for every year. Only when you select the Monthly frequency do you see data appear for each year.
