Using Datastream or IBES for forecast accuracy

Not long ago a student contacted me with regard to the subject of forecasts. Forecast accuracy is defined as the absolute difference between the consensus analysts’ forecasts and actual earnings per share divided by the firm share price at the beginning of the quarter. Differences between actual and forecast earnings can be considered “surprise data”.
His question concerned the variables that he got through Datastream & IBES and which variables should be used. The answer I came up with (with the help of the Thomson helpdesk) was the following:

Datastream has direct data types for the Surprise earnings but this is only available for companies that have quarterly data coverage. Important in this case is, that only the current values of these data types are held – no history is maintained. It is also very difficult to calculate the Surprise data manually because of Datstream padding function.
Regardless of whether you decide to use IBES or Datastream data types, you should not use data types from both sources in the same analysis! Only use IBES data types or only use Datastream data types.

To answer the questions on using actual EPS and EPS forecasts and prices the analysis could be done using the following IBES data types:
EPSI1YR = Earnings Per Share End Date of Quarterly Period INT1
EPSI1MN = Earnings Per Share Mean INT1
I0IND = EPS Last Rep Int Period Indicator
I0EPS = EPS Reported Interim EPS (INT1)

Using the output from IBES (through Datastream) you have to be careful when comparing EPS data as all data should be for the exact same quarter. See below for an example with colours to indicate the quarter data to be compared:


Bankscope cancelled and replaced

Bureau van Dijk (BvD) publishes the Bankscope database. It was announced recently that Fitch, the main provider of the Bankscope data, has terminated it’s contract with BvD. As a result, the availability of bank data will undergo some important changes, with possibly disruptive consequences.  BvD is currently working on a new Bankscope data product: Orbis Bank Focus. However, the coverage of the new database is limited at the moment: historical coverage in this new database is severely restricted, with up to only three years of data (Bankscope has ten (or more) years of history). BvD are prioritizing cross-sectional coverage, and are working on including recent & historical data for banks in different countries. The new product is obviously a serious restriction for academic research.

The current Bankscope database will remain available until the end of 2016, at which point the contract between Fitch and BvD will end, and BvD will need to take down the data. If this affects your research, I advise you to download data as soon as possible.


Filing size of 10-K reports

No too long ago a student asked me if it was possible to find out what the size was for specific filings of companies with the SEC. Specifically it concerned the 10-K filings which are the annual reports with a comprehensive summary of the companies financial health. As I understand it, the file size of these filings was to be used in an exercise and served as a proxy for financial reporting readability.

To find the original filings it is of course possible to use the EDGAR search option to find the original full-text filings and parts (including the XBRL filings):

Using the Central Index Key (CIK), Ticker, or Company name it is easy to find a specific company. Using the Filing Type search box you can narrow the search down to specific filings, like 10-K. This is often possible going back to 1993/1994 (depends on the company).

A specific 10-K filing overview in Edgar would look as follows:

If you need to do this for several hundreds of companies and multiple years it would take some time to collect the file size data. If you have access to the Audit Analytics database it is possible to get the 10-K file size for a large number of companies at once through the WRDS platform. Audit Analytics has a part database that is called Accelerated Filer.

Through this part database it is possible to get filings data (from 2000 to now) using Ticker lists or CIK lists. In the example output below I have put the following variables:

NAME = Company Name
FORM FKEY = Form name (10-K and 10-Q)
FISCAL YE = Fiscal Year End
FILE SIZE = Filing size

Example output using Excel to filter for just 10-K filings:

Important: If you look at the 2015 10-K form file size according to Audit Analytics for the company ADVANCE AUTO PARTS INC this is 17 MB. This corresponds (roughly) with the size of Complete submission file according to EDGAR: 17.748.770. The Complete submission file in Edgar includes not just text and html codes, but may also include pictures and any other file types (Excel files etc.).


Total Q and Tobin’s Q

A new Compustat data source has become available to people who have access to Compustat databases through WRDS: Peters and Taylor Total Q. This new source provides data on firms’ “Total q” ratio and the replacement cost of firms’ intangible capital. Total q is an improved Tobin’s q proxy that includes intangible capital in the denominator, i.e., in the replacement cost of firms’ capital. Peters and Taylor estimate the replacement cost of firms’ intangible capital by accumulating past investments in R&D (Research and Development) and SG&A (Selling, General and Administrative Expenses). Background paper can be downloaded here.

Overview of the variables and names in the database:

datadate = Date
fyear = Fiscal Year
gvkey = GVKEY / Compustat unique company code
K_int = Firm’s intangible capital estimated replacement cost
K_int_Know = Firm’s knowledge capital replacement cost
K_int_offBS = Portion of K_int that doesn’t appear on firm’s balance sheet
K_int_Org = Firm’s organization capital replacement cost
q_tot = Total q

The database offers mainly data for companies which are included in the Compustat North America database. Most of these companies are American. The coverage is: 1950 – now.


Working with Compustat Execucomp tenure data

Not too long ago I had a question from someone who was having trouble working with data from Compustat Execucomp. He wanted the yearly tenure for a specific group of people with the function of Chief Financial Officer. The research spanned a period of 2009-2014 (post-crisis). The data that was downloaded looked something like this:

Step 1: Data cleaning
One of the first steps to take in this case is to make sure to have the right kind of data to work with. In this case the columns H and I needed to be checked and cleaned. In column I you see the date when a person left as CFO working for the company. In this situation we see items like n/a where the data is unavailable and this means that the person still continues to work as CFO for the company. We first need to replace such values with the value 2014 for the last year of our research as we are looking for the tenure within the time frame 2009-2014. Any other years after 2014 can in this case also be replaced with 2014. You can use the search and replace function in Excel to do this step by step. Afterwards you can use the Filter option in Excel to check for weird data or outliers. In principle you have to check both columns with start year and left year to be sure there are no outliers (weird values).

In column H you see the year when a person joined the company. I am assuming that this was also the startyear for each person when he came to work as CFO at the company (I have not personally checked this). You see in the screenshot that not every year is seen as a numerical value: Excel shows little green triangle dots in the cells where it thinks the data is text. To ensure that a year is seen as a numerical value you can add a new column and use a trick to create numerical values in this column: devide cell by 1. See screenshot column J for the original data and column K for the new years. In the top left corner you see the “formula” you can copy downwards for all years.

Step 2: Calculate tenure for each year
In the example for this blog I only calculated the tenure for the final years of the research time frame (2013 and 2014). You can figure out the formulas for the other years. First I started calculating the CFO tenure for 2014. In this example I assume that if the startyear matches the lastyear someone has worked in this capacity for less then a full year making the tenure less then 1 and thus zero. In this case I get the right number of years of the tenure by substracting the startyear from the lastyear (= research year 2014). See example:

Now for the tenure of the previous year (2013) the If statement comes in handy to figure out the tenure for this year. The full formula is:

K3 = start year tenure within the research window (or before)
L3 = last year for the research window (2014)
P3 = tenure for the final year of the research window (= 2014)

The formula in essence does the following: if the previous year (in this case 2013) matches the start year (or is smaller), then the tenure is that of 2014 minus 1. If not then put the word False there. This last condition prevents outliers from causing problems. Screenshot:

The same formula can also be used for the previous years. All you need to do is change the formula for the right numbers. 2012 example: =IF(K3<=(L3-2);(P3-2);FALSE)

Step 3: Figure out the relevant years
This step is essentially not necessary as the filter option of excel is already available to make an annual selection by year but you then have to add the tenure year manually for each year after copying the relevant tenure data by year (to a new sheet).
The formulas in step two will provide a tenure of  0 (or more) as long as the end year for the tenure (within the time frame 2009-2014) is equal to (or higher) then the start year for the tenure (within the time frame or earlier). To know the tenure by year we create columns to show which tenure applies to what year. That allows us to use a filter in Excel to more easily get the relevant data where there is a tenure of more then zero. I created the columns N and O to get the tenure years for 2014 and 2013. The formula I used for 2014 is: =IF((P4>0);RIGHT($N$1;4);FALSE)
where: P4 = calculated number of tenure years for 2014. I have put the year in the name in the first cell at the top of the column (first) as the last 4 digits making it possible to use $N$1.

For 2013 all you need to change is the cell P4 into Q4 and the header $N$1 into $O$1. You also need to put the year in the name of the variable at the top of the column). Subsequent years work the same way.

Step 4: Filter the data for the relevant years
As the final step you can now use the standard filter option to copy the relevant data by year to a new sheet.

I would then also remove irrelevant data for other years which do not apply to the specific year I have filtered for. The end result would look something like this:


WorldScope coverage update 2nd Quarter 2016

WorldScope company records now cover annual reports data for 80497 companies. This includes 47564 active and 32933 inactive companies. This update: 578 companies were added. Because of these additions the number of active companies in the database has gone up 172 from 47392 in April. WorldScope company records are also available through Datastream and LexisNexis.

Today I have updated the WorldScope country coverage file and it now includes the latest update as it was posted in the second Thomson Reuters Infostream quarterly publication of 2016.

Major updated Countries (new records):
Australia (27)
China (44)
Hong Kong (28)
India (29)
Japan (29)
South Korea (44)
Poland (25)
Sweden (30)
Taiwan (34)
United Kingdom (23)
United States (137)


Shortselling data – Supplemental

In the previous post I mentioned two databases that have data on Short Interest shares. The Compustat part database offered data but the search through the WRDS platform could crash because of an error. The error occurred when you selected items (at search step 3) like CIK codes, etc. This problem has now been fixed and the data can now be downloaded as per usual with all selected variables.

I have also had another look at the searches in Datastream for Short Interest data and I noticed the following: you need to be careful when select a download frequency. Usually the SID data is made available every few months or once a year. The report frequency of the data has been changing the last year. If you choose the Yearly frequency in Datastream you will not get data for every year. Only when you select the frequency Monthly do you see data appear for each year.