Stata & date formats

The past few weeks I have been learning about and working with Stata. This program can handle a lot of data and uses commands to edit data or analyse it. There are many commands available and one command is very handy when it comes to changing date formats.
When you work with data and intend to do an event study using the Request Table option from Datastream you may have to change the date format for the event dates to something that Datastream understands (also depends on the date format used by the computer).

In Excel you can use the command =TEXT(A1,”dd-mm-yy”). The Dutch version looks similar: =TEKST(A1;”dd-mm-jj”)
Afterwards you still have to copy and the paste as values to get workable dates for Datastream.

Now if you are working with a dataset in Stata you can use a command to do a similar thing: format [variablename] %tdDD-NN-YY

The elements mean:
format = change the format of a variable
variablename = indicates the date/variable item you wish to change
%tdDD-NN-YY = indicates that the timedate variable should look like 17-02-99. DD = day with two digits, NN = month with two digits and YY = Year with two digits.

Example of dataset before the change:

Now I use the command to change the look of variable dateff: format dateff %tdDD-NN-YY. The result looks as follows:

If you look closely you see that “under water” the variable still looks as it is originally presented: the first date 20-08-07 still looks like 20aug2007. On the right side you also see that dates in Stata are actually variables of the type Integer = a number that can be written without a fractional component (Wikipedia).
Interesting sidenote: in Stata dates are actually integer numbers, just like in Excel. Where in Excel number 1 represents 1-1-1900 in Stata the number 1 represents 1-1-1960. Negative integer numbers in Stata represent older dates. When you work with older dates it is wiser to use 4-digit year formats to avoid confusion.

There are a lot more options to change the dateformat. Another example that may be useful for using with the Event Tool in the CRSP database is:

format variablename %tdCCYYNNDD

In this instance dates will be formatted with 4 digits for a year, for example: 19871022

If you want to know all the options for formatting dates you can use the Help command: help datetime display formats.

Important: If you export a formatted file from Stata to an Excel file you will lose the formatting for the dates you just did. If you wish to keep the dates with the (changed) formatting you need to export/save the file as a (delimited) text file using the command: export delimited filename.txt and then open the text file in Excel using the text wizard. Also remember to open the formatted date column as text to keep the formatted date in tact!

This example applies to Stata editions 13 and 14.


WRDS platform: major changes

This week some major changes were implemented in the Wharton Research Data Services (WRDS) platform. These changes mainly concern the search options for databases (step 2) and the selection options (step 3). In essence: no options were changed for the databases. Only the way you used to make a selection has changed and the way you make a selection of variables. These are the most recent (major) changes since July 2015.

The selection screen is now much more compact and straightforward with all the options immediately available and more intuitive:

The selection screen has undergone a much bigger change. Instead of scrolling downwards through the list of boxes with lists of variables, you can now scroll sideways through the lists of variables:

The other two steps in WRDS are usually: Step 1 (Selecting a time period), and Step 4 (Output options). No major changes seem to have happened there. If I notice any more major changes in the interface I will post them here.


New BETA platform: Wharton

The WRDS platform (Wharton Research Data Services) is a web-based interface that allows you to search through many licensed databases and makes it easier to download data from them. The learning curve on how to use the Wharton platform is minimal as all databases can be searched in a similar way. The main effort that you need to expend is: learning the details of individual databases and how the data is made available. Databases that can be searched through Wharton include: Compustat Global & North America, Execucomp, CRSP, Amadeus & Audit Analytics.

The past few weeks it has been made possible to view the new changed BETA version of the WRDS platform which looks much different from the current platform. The old interface looks as follows:

The new BETA WRDS interface looks very different:

Overall, the changes are mostly cosmetic. when you click through to a specific database you still get the same search options that are available in the previous / current version: 1) Select time frame, 2) Make a selection, 3) Choose variables, and 4) Select the output format and date format.

N.B.: The new BETA version can be viewed but as yet I do not know when the current version will be replaced. I expect it to be sometime during the summer or autumn of 2015.


Hedge Funds & Datastream

A short time ago I was asked about databases that have information on Hedge Funds (HF). HF are investment vehicles that use pooled funds to use investment strategies that generate high returns. Some of these funds are very aggressive and have a bad reputation. HF are not regulated in the same way as Mutual Funds which also use pooled funds to invest and generate returns. Mutual Funds (MF) are much more regulated and can be sold to the general Public. More information is available in text books and Investopedia & Wikipedia.

I know of only two databases that have information specifically on individual HF type organizations: Datastream and the CRSP Mutual Funds database. In both cases it possible to search on individual HF companies by name or code(s). In the case of Datastream HF and MF can be found in the Data Category Unit Trusts. Through the Criteria Search option (at the top left corner of the Navigator screen) you can find them:

The search options are limited but the most important options are name searches, and Market searches. In addition to this you can also do a quick search on the keyword “Hedge Fund” and this will also offer a filter option (on the left side of the screen) for Unit Trusts.

Another interesting option are the Constituent lists of Hedge Funds. These lists provide an easy way to get some country lists.

If you have a list of Lipper codes (ID codes) you can also upload a list of these in Datastream using the Excel option “Create list from range“. Lipper codes are created by Lipper, Inc. This company (nowadays) is a subsidiary of Thomson Reuters and provides provides mutual and hedge fund data.

LPID – Lipper Identification Code (= Lipper PermID)
Definition: Unique fund identifier code that Lipper assigns to each Share Class. The identifier code is form of 8 numeric’s which Datastream prefix with ‘LP’. Example LP40004324

When you have created your own selection of HF companies (using Market lists or manually) you can do a Static search to find out more about the type of HF and Focus:

Examples of available Datastream Data Types are:

ATYPE – Asset Type
The Lipper Asset Type item describes the primary asset of the fund, for example; Bond, Commodity, Equity, Mixed Asset, Money Market, Real Estate etc. As a general rule a fund will be placed according to its primary investment objective as stated by the promoter and given by the fund documents.

GFOCUS – Geographical Focus
The Lipper Geographical Focus item describes the primary countries or regions where the fund invests. As a general rule a fund must hold at least 50% of its assets in a country/region to be assigned the relevant attribute.

LGC – Lipper Global Classification
The Lipper Global Classification is a granular description of the fund investment strategy, built out from the Asset Type and Geographical Focus items to create homogeneous groups of funds with comparable investment objectives.

N.B.: More Background information on Lipper and the Classification Methodology are available on Wikipedia and the Lipper website.


Sustainable & Responsible Mutual Funds and other databases

In the past I have posted an item on sustainability funds from the US SIF website with some performance data. The Mutual Funds on the website lists Tickers and names for these funds. The tickers on that website are NASDAQ Tickers. Additional information on these funds can be downloaded from the CRSP Mutual Funds part database Funds summary using a text file of these tickers. At search step2 in the WRDS platform you can choose to upload a file containing mutual fund codes:

In WRDS at step 3 you can choose to download descriptive information. If the CRSP database does not contain enough information you can select items like 8-digit CUSIP of NCUSIP as part of your download. The NCSUIP is the 9-digit CUSIP.
Using these CUSIP code you can search for more information in other databases like Compustat North America and Datastream/Asset4 ESG/WorldScope. In Compustat you can download data by creating a text file with CUSIP codes. This will run fine.

Using CUSIP 9 codes in Datastream (and for databases through datastream, like ASSET4 ESG and Worldscope) requires that the codes are enhanced by adding the letter U in front of each code. This can be easily done using the Concatenate() function in Excel. See example:

Before uploading a list of these enhanced CUSIP 9 codes, make sure to copy and paste the list in Excel to make all these CUSIP codes values! Datastream cannot upload Excel formulas. When you need to upload and work with lists in Datastream you can select the codes using the mouse and then use the Datastream option: Create list(from range):


Going from CRSP & Compustat North America to Datastream

Making a selection of companies/data in different databases can be tricky. In some cases it helps if the owners of a database already did some work for you. In the case of CRSP it is easy to go from this database to Compustat (and the reverse) using the CRSP/Compustat Merged database. Also, in some cases it can be handy to use it to go from a selection in CRSP or Compustat North America and look up data in Datastream.

The important thing is, as usual, the company or listing identification codes. The import codes that you can work with in these databases are the following: GVKEY, PERMNO, PERMCO, CUSIP, Ticker, CIK.

GVKEY = unique company code from Compustat databases
PERMNO = unique listing code from CRSP databases
PERMCO = unique company level code from CRSP databases
CUSIP = unique code for listed companies from the CUSIP bureau
Ticker = Ticker Symbol for listed companies at exchanges
CIK = Central Index Key = unique code for companies registered at the SEC.

The merged database consists of different part databases that you can use to match one of these codes to the other. If you are interested in Company level information the part database you can use from the merged database is the Fundamentals Annual database.

You can start from a selection in Compustat and use a GVKEY list to get the PERMNO or PERMCO codes.
You can also start from a CRSP selection with PERMNO codes and get GVKEY codes.
If you need to go from CRSP to Datastream the merged database is a good way to get CUSIP 9 codes that allow you to search for data in Datastream.

In the WRDS search screen you also try getting a closer match using the Linking options available and the options at “Fiscal Period and Link Date Requirements“. You only need to work with this if the amount of codes in the result is significantly different from the input list. Also, if the original list that you use to search, is an older (or historical list) you may need these options to get matches. See screenshot:

At step three in WRDS you can select the codes you need in your output:

Once you have made your selection (maybe already including financial data) it is a good idea to use the download option at step 4, called: “tab-delimited text (*.txt)“. The downloaded text file can still be used fine in Excel and this ensures that you are in control where company codes are concerned. These codes need to stay in tact if you need to search in another database like Datastream with CUSIP 9 codes. Check another post about working with text files in Excel if you need more information on this.
To use CUSIP 9 codes in Datastream you have to add the letter “U” in front of each code: to make it a “local code” that Datastream understands better Alternative: use the text “WS:”). You can use the Concatenate() (= Tekst.Samenvoegen) function of Excel to do this:

If your starting point from CRSP is not only a list of PERMNO codes but dates as well, then you will have to use the VLOOKUP() (= Verticaal.Zoeken) to match original PERMNO codes with dates with the right CUSIP 9 codes. If your starting point is a file like this:

Using the list I removed duplicate PERMNO codes and then did a search in the merged database (Fundamentals Annual part database) to get the CUSIP 9 codes:

As the final step I got the CUSIP 9 codes from that download in Excel with the original list of PERMNO codes + dates using the Vlookup() function (example of this function):

When this was finished I could use the original dates with the needed CUSIP codes (after adding the letter U) in a Request Table search in Datastream to get data from there.

N.B.: You will probably lose some companies from the original file as the PERMNO may not always be matched perfectly using the merged database.


CRSP Mutual Funds & NASDAQ Tickers

Not long ago I got a question about the sustainable & responsible Mutual Funds that were listed on the US SIF website. Using the 145 NASDAQ tickers from the website I did a search (through Wharton/WRDS) in the Mutual Funds database Fund Summary of the Center for Research in Security Prices (CRSP). Data on a total of 138 funds was returned but unfortunately for only 108 funds the NASDAQ ticker was also returned as a data variable. 7 of the funds were not returned and also gave no result in the output.

This makes it impossible to easily combine the downloaded data with the data from the US SIF website for 37 of these funds. I contacted the CRSP helpdesk to find out why for 30 funds the tickers were not returned and why for 7 funds it might be missing.
The following situations may cause this result from the CRSP database:

  • The fund is not in the search result because it is not in the CRSP database
  • The fund is not in the search result because it is not connected to the NASDAQ ticker
  • The fund is in the search result but has no NASDAQ ticker because it was not connected to it for all the years in the database.

This means two things:

  1. To find the missing NASDAQ Tickers for the funds it may be an idea to change the approach of the research. First do an initial search with all the NASDAQ Tickers and change the scope of the search to include a larger number of years. For the older years the NASDAQ tickers may be available. Using the matching Fund Numbers from CRSP you can now do a second search to get the data you need for the original time frame of the research. This means some extra work using Excel or SPSS to get a matching list of NASDAQ tickers and CRSP Mutual Fund numbers.
  2. To find out if missing funds are really not in the database a manual search needs to be done using the Code lookup option.

The easiest way I found to start the research, is getting a list of CRSP fund numbers combined with the NASDAQ tickers using the historical DVD copy of the CRSP Mutual Funds database. With the installed CRSP Sift 4.2 software it is extremely easy to get a matching list. A search in this database can be done using a plain text file with NASDAQ tickers.

If tickers are missing from the result you get a specific pop-up message listing the missing tickers. You can copy these and save them separately. The Excel search result of 138 funds can later also be used to match the CRSP data with the US SIF data from the website.