Data quality | researchfinancial

Availbility of the Euro Stoxx 600

Posted on 01/12/2021 by 1infospec

Not too long ago a student asked me if a specific index was available and how to get data on it. In this specific case it concerned the Euro Stoxx 600. Initially, I thought of two options (that I have available) to get this type of data: FactSet and Capital IQ’s Compustat Global database.

Option 1: FactSet
This option seemed straightforward. Using the Charting option in the advanced interface (or using the Excel add-in) it is easy to get at the data:

If your research involves specific companies over time that are (or have been) part of the index, then the advanced interface allows you to do this using the Screening option. After selecting the screening option you choose Universal screening and click on the option “Starter Screen”. The default series that is shown is the S&P 500 (Index). To go on from there you remove this index: hover with the mouse over the index and click on the red button to Remove. Use the search bar to find “stoxx 600” and click the index option that is presented (from the list). Initially, the current Euro Stoxx 600 list of companies are presented:

By clicking the series in the Select part of the screen you can get the formula to appear in the search bar. By clicking the formula in the search bar it is possible to call up options that include changing the date, which makes it possible to call up an older (historical) list of constituents for this index:

It does, however, require not just access to FactSet, but also an additional subscription to have access to older constituent lists for Stoxx indexes. Another downside is, that it is not possible to get a list of constituents over a period of time. FactSet only allows you access to lists for specific points in time.

Option 2: Compustat Global
The database is available through the WRDS platform (this requires a separate subscription). The part database Index Prices allows you to search for and downloads index data on an index as a whole. At step two in the search screen it is possible to search for the index code using the option “Code Lookup“:

Looking for stoxx 600 will get you the Dow stoxx 600 code 150376. Using this code it is possible to download price data for this index.

Within the platform it is also possible to browse to the part database “Index Constituents“. The same “Code Lookup” option is available to find the same code: 150376. Using this code it is possible to download historical lists of the companies in this index going back to 2000:

On the face of it, the list from Compustat appears to be the Euro Stoxx 600. Unfortunately, this is not the case. The index was originally introduced in 1998 to represent a proxy of overall (stock) market developments in Europe. It is derived from the STOXX Europe Total Market Index (TMI) and is a subset of the STOXX Global 1800 Index. The joint venture Stoxx Limited provides the Stoxx indexes and was formed as a joint venture. In 2009 Dow Jones exited this joint venture and the Stoxx indexes were now renamed by removing the DJ/Dow Jones prefixes. This means that the Euro Stoxx 600 index available through Compustat may be very similar from 2000 until 2009 but after the demerger will likely deviate from the actual Stoxx 600 indices from Stoxx Ltd. The name in the download still indicates Stoxx Indices, however, which may cause confusion as to which index it is. At this point in time the actual indices from Stoxx Ltd are not available (anymore) through Capital IQ’s Compustat in WRDS.

Filed under: Compustat, Data quality, FactSet, Stock index | Comments Off on Availbility of the Euro Stoxx 600

FactSet IPO Offer Method field data

Posted on 20/07/2020 by 1infospec

The database has a dataset on IPO and IPO-related information. Below are the current definitions of offer method typings in the field for Offer method available in FactSet for IPO’s.

1) Bookbuild Issue
Unlike the U.S. and some other larger countries where the offer price is determined based on market conditions, a bookbuild issue is where the bookrunner collects bids from potential investors on how much they are willing to pay. After the bookbuilding process has ended, the bookrunner determines the price at which the shares will be sold.

2) Best Efforts
An underwriting arrangement in which an underwriter agrees only to use its best efforts to sell the securities as the issuer’s agent. The underwriter does not purchase the securities itself and has no obligation to purchase any amount not purchased by investors.

3) Firm Commitment
An underwriting in which an investment banking firm commits to buy and sell an entire issue of stock and assumes all financial responsibility for any unsold shares

4) Direct Listing
Company raises capital in a private placement to accredited investors as the initial step in financing. Amount of capital raised varies depending on sufficient capital needed for company’s growth plans and registration and listing costs before it achieves an exchange listing (typically, within six to twelve months after the private placement offering is completed).

5) Placing
Placing is the method by which a company lists its shares on a stock exchange and raises capital by selling shares to institutional investors. These shares are not generally marketed to retail investors at the time of the issue unless the correct approvals are included in the offer documentation.

6) Offer for Subscription
Offering where a minimum level of subscription for the shares, and if this is not reached the offer is withdrawn.

The second and third methods are both considered bookbuild issues, but there is a Bookbuild issue type separately indicated. So, how do those three compare, because these methods seem to overlap in method typing?
According to FactSet helpdesk the offering type is a field that they are looking to change. One of the reasons is that they agree that the offer methods are not distinct. Also, in a given deal, there can be multiple offering types used. FactSet is going to change this in the future. At the moment they are considering offering 3 new separate fields where information is made available on the offering type. The fields already exist at the backend but are currently not available through the FactSet database. The new fields will describe:

how the deal was priced
what was the involvement from the underwriters
and how were the shares sold/listed on the exchange?

Using the existing structure of the database FactSet needs to choose which offering method is the best of each IPO based on data collected in those three fields. The information on the Offer method may therefore deviate considerably from what is indicated in the company prospectus.

N.B.: For example, compare the Offer method in FactSet with the actual prospectus for Basic-Fit NV. The Prospectus indicates a combination of offer methods. The filings for companies in FactSet are quickly available through the Company / Security tab by looking up the company using the name or ticker BFIT

Filed under: Data quality, FactSet | Comments Off on FactSet IPO Offer Method field data

Excel and duplicates in a dataset

Posted on 12/09/2017 by 1infospec

In the past I posted items on duplicate data from datasets (from Compustat databases) and how to find this out using programs like stata. For smaller datasets the program Microsoft Excel can also be used to investigate your datasets. AT some point, when merging datasets from multiple sources it may happen that you get duplicate data. Using Excel functions like a pivot table (draaitabel) and vlookup (vertikaal zoeken) duplicate data can be detected as follows:

1) first you need to tell Excel that your dataset is a specific delineated table using the menu option: Insert > Table

2) When you insert this option you need to indicate the range and if there are any column headers

3) The same menu tab insert (invoegen) has the option to create a summary pivot table (samenvatten met draaitabel). Select this option at the top left corner:

4) Create the PivotTable in a new (or empty) sheet:

5) The empty Pivot Table will be shown as follows:

6) In this example I drag the fields I want to check for duplicates (records/observaions) down to rows (Rijen). At the box values I drag a random field (in this case indfmt). On the left side of the screen the result will be presented:

7) In this example I click the field gvkey and choose the option to change the field settings

8) At subtotals & filters mark the second option No(ne)

9) In the tab Format & Print (Indeling & Afdrukken) mark the option: item labels in table format (Itemlabels in tabelvorm weergeven). Also make sure to mark the option to repeat item labels (Itemlabels herhalen):

10) The result chould look as follows:

11) Copy the list to a new sheet and use the concatenate function (Tekst.samenvoegen) to create a combination of GVKEY and year:

12) Create a similar link list with the Concatenate function in the original Compustat datasheet

13) Now use the VLOOKUP function in Excel in the Compustat datasheet to look up the link in the sheet with the result of the Pivottable. Make sure that the VLOOKUP option says False (Onwaar) at the 4th option as follows:

14) Finally, using the filter option you can find out if there are duplicatesd by selecting everything for which the count is higher than 1. The tricky thing will be deciding what to do with the result.

N.B.: If a dataset has over 100.000 observations the process described above will take some time as Microsoft Excel will require significant processing power from the computer. For larger datasets I reccomend using Stata.

Filed under: Compustat, Data management, Data quality, Excel | Comments Off on Excel and duplicates in a dataset

Small error in S&P 500 Total Return data

Posted on 01/10/2015 by 1infospec

Recently a long time series was needed to use in some exercises and in this case the choice was made to use the the default closing Price (PI) and the Total Return (RI) data for the S&P 500 index. The source for the data was Datastream from Thomson Reuters. The price data from Datastream for the S&P 500 index goes back to December 31st of 1963. The Total Return data from Datastream for this index goes back to January 1st of 1988.

While working with the data a strange outlier in the data was discovered for January 24th 1990. Statistically speaking, the calculated return made a huge jump there. See the example here:

The data was then compared to data from Yahoo Finance (using the Quandl add-in to do a quick download to excel). Using the Yahoo Finance data to calculate the Total Return again this appeared to give a comparable result to the manually calculated RI using the Price data for the index from Datastream.
After contacting the Thomson Reuters helpdesk it became clear that there was a small error in their spreadsheet that is used to calculate the RI: for January 24th an incorrect price was used for the RI calculation: 324,17 instead of 330,26.

Unfortunately, Thomson Reuters will not change this small error for their calculated RI. If you need to go back and have to use the data from that time you might use a corrected RI for this day which could be: 354,53. This number is more in line with statistical variance for the return over a period of 25 year or more. See also:

This goes to show that it always pays off to check for outliers and (when in doubt) to contact the vendor/source and discuss possibilities.

Filed under: Data quality, Datastream | Comments Off on Small error in S&P 500 Total Return data

The Quandl Excel add-in

Posted on 19/12/2014 by 1infospec

In the past I have already mentioned the data initiative called Quandl. Through the website it is possible to find and download numercial datasets that people or organizations provide. Many of these datasets are provided free of charge. The data can be downloaded in the following formats: Excel, CSV, text, JSON or XML.

The way the data is handled (by Quandl) makes it easy to download it: when data from different sources is provided in different frequencies or for different time frames, it can still be downloaded in the same sheet at the same time! The frequencies and time frames for selected time series are automatically matched when downloading the data. The data is on all kinds of subjects: from large time series on a macroeconomic level to company data on a microeconomic level and both financial and non-financial data.

The creaters of Quandl also offer an Excel add-in which allows you to directly download the data into Excel. The current version is 2.5. This add-in can only be used when you have an Authetication Token which is provided upon free registration on the Quandl website. The installation of the add-in is simple: just like any Add-in through the Excel Options. Once it is installed (and the Authentication Token is added at the Preferences in the Quandl add-in) you can directly search and download data. Quandl offers an extensive help section with short and clear movies that explain how the Add-in works.

Over 500 sources are already available, offering millions of datasets. Some of these sources are:

IMF (website)
World Bank (website)
Eurostat (website)
Maddison project (website)
European Central Bank (website)
Central national banks
Federal Reserve Bank (US) (website)
Google Finance (website)

People who have registered can also upload their own datasets making it a great tool to share data. One of the people who has done this is the renowned Thomas Piketty (Author of Capital in the 21 century): Link to datasets.

Filed under: Data management, Data quality, General | Comments Off on The Quandl Excel add-in

Multiple CDS Series & Datastream

Posted on 29/07/2014 by 1infospec

A standard subscription to Datastream includes data from CDS series. In a previous blog item I mentioned that there are two default providers of CDS data available:

– CDS data from CMA (from 2004 – September 2010)

– CDS data from Thomson Reuters (from 2007 – now).

There are, of course more suppliers / providers of CDS time series data. In 2010 Mayordomo, Peña and Schwartz did some research and wrote an article where they compare some of these providers: “Are all Credit Default Swap Databases Equal?“. The authors concluded that the provider CMA was the best source of reliable CDS data and is less prone to causing deviations (like fat tails: see also the definition and background).
If your research covers a time period that makes it possible to use just one source, that is also preferable as you do not run the risk of additional deviations because of differing methodologies of the providers in gathering the data.

If the research at your organisation leaves you no way out, and you have to use the standard two Datastream sources with CDS data, you can combine the data from these two providers. In the earlier blog post I mentioned the SPLC function/formula which can be used to combine series. The SPLC formula can also be used to look up several series at once.

1) First you need to identify (or searc for) relevant CDS CMA or Thomson Reuters (TR) series.

2) Next the Excel function VLOOKUP (= Vertikaal zoeken) can be used to match CMA and TR series. You have to be careful here as there seem to be more different TR series (with different currencies) that match CMA series. It may be an idea to select the series of the same currencies before matchen them by Mnemonics (= Datastream codes). See example with a small selection of 5 Year CDS series using the Excel file that matches all series (see original blog item):

3) As the next step you can combine the matched series in Excel with the SPLC formula using the Concatenate function (= tekst samenvoegen). See example:

4) The final step is doing a Request Table search to lookup all combined series in one go. You have to copy the formulas in the series column and Paste them as values! As the SPLC formula includes the datatypes SM it is also not necessary to put a datatype in the column for Datatypes! An example Request table can be downloaded here and would look as follows:

N.B.: The example Request Table spreadsheet (above) has the American date-notation: mm-dd-yy. On your own computer you may have to change this the date-notation to the European version: dd-mm-yy (or dd-mm-jj).

Filed under: Data quality, Datastream, Excel | Comments Off on Multiple CDS Series & Datastream

Datastream US equity data quality

Posted on 05/03/2013 by 1infospec

I recently came across an article that raised the issue of the quality of Datastream data. Ince & Porter (2006) warn researchers for potential flaws in the coverage by Datastream. The authors compared the US coverage of equity data to that of CRSP over the period 1975 and 2002. They found that certainly in the older years a large number of equities are missing in Datastream. An example is that for the year 1975 20% of common equity issues (found in the CRSP database) were missing from Datastream.

The problem is more important for small caps (= lower 20% of stocks measured by market capitalisation). Datastream also suffers to some extent from survivorship bias. The authors suggest a number of methods that alleviate the problems somewhat. More details are available in the article below:

Ince, O.S. & R.B. Porter, 2006, “Individual equity return data from Thomson Datastream: handle with care!,” Journal of Financial Research 29(4), 463-479.

Filed under: CRSP, Data quality, Datastream | Comments Off on Datastream US equity data quality

IPO data and quality

Posted on 25/02/2013 by 1infospec

One of the most often used sources for data on Initial Public Offerings (IPO) and follow-ons is the databse Global New Issue(s) from Securities Data Company (= SDC, now part of Thomson Financial/Thomson Reuters). While looking for information on IPO’s I came across two websites that report on IPO data and data issues when dealing with data from SDC.

The first website is created by Jay R. Ritter, Professor of Finance at the Warrington College of the University of Florida. A number of historical data files or refferals are listed/offered that provide additional data or alternative sources. The oldest data available is from the Nineteen seventies.

The second website is created by Alexander Ljungqvist, Professor of the Stern School of Business at the New York University. He has compared data from IPO’s with data gathered from SDC Platinum’s Global New Issue database and reports on problems for shares outstanding, venture backing, and syndicate size.

Filed under: Data management, Data quality, SDC Platinum | Tagged: Data quality | Comments Off on IPO data and quality

researchfinancial

Recent Posts

Archives

Categories

Meta