Changing Datastream data & Stata

The past few weeks I have been learning about and working with Stata. This program can handle a lot of data and uses commands to edit data or analyse it. A sequence of commands can be saved in .do files and then rerun as a script. There are many commands available and one of them is very handy when it comes to changing data from columns to rows. It is similar to the transpose option that Microsoft Excel offers for quick changes. It is the reshape command.

In this blog post I will use the reshape command to change Datastream data as an example. Similar work can be done for other downloads from databases like Amadeus or Bankscope.
For downloads it can take a bit of work to change the data and rework it before Stata can be used to merge it with other data. In the case of Datastream this has everything to do with the fact that Datastream does not repeat an ID for each year of data you download (unless using a Request Table search). Example screenshot original download from Datastream (transposed):

The following changes need to be made in Excel:

1) Get the ID back from the download without the extra Datastream codes. This can be done using the function MID(Cell,Start,Number) (= in Dutch versions of Excel: the Deel() function). MID() allows you to get part of the contents back from a Cell. Example:

2) Next you need to create column year headers that start with a text character and then the year of the column (this step is important for Stata later). It can be done smartly using a specific cell for the header and then combining it with the original year above each column as follows with the formula =($A$1&(B$1)). The dollar signs fix certain cells or a row (or column). Example:

3) Now save the Datastream Excel file without these formulas. If necessary use copy > paste special > Values

4) Next we start up Stata and get the file. In this case we prepared the Datastream file nicely and can therefore use the command: import excel using DS-Prepared.xlsx, firstrow
The firstrow option tells Stata that this row has the variable names and lables. Example:

5) When we look at the data with the browse command you see that the data looks basically the same as how it appeared in Excel:

Only two things are important here:

  • The data is not yet formatted as we need it: the numerical data are now strings (red) and need to be changed later
  • To later merge data the company ID’s need to be combined with the years

6) We now transpose the data using the command Reshape. This command has many options and allows you to rework tables. If you need more explanation on how it works you can type the command help reshape and Stata will provide much information on how to use the command. For this example the command is:

reshape long Y, i(CompanyCode) j(year)

In a nutshell this tells Stata that the years for the columns need to be repeated for the observations/records and can be found in the names of the variables. The ID for each company also needs te be repeated for the years of all observations. Example result:

When you use the command browse again you see the result of the data change:

7) We now need to create a unique ID combination to later merge the data. To do this we need to turn the years back from numerical values into strings using the tostring command: tostring year, replace. The final step is creating the unique ID combination UID with the command: gen UID = CompanyCode+year

8) Now there are a few steps left to finalize the Datastream data but not all of it is necessary. One thing is necessary: for Stata the numerical data is still a (text) string and now need to be changed. Use the destring command to as follows: destring Y, replace force

Destring also removes any values that were not numbers (but text) and replaces them with an empty cell (.) because of the option force (this can be dangerous if the content of a cell is a combination of a numerical value and text).

Optional stuff:

  • Use the drop command to remove unnecessary columns
  • Remove observations without data using, for example: drop if mi(Y)
  • The command order can be used to reorder data and put the UID variable in the first column
  • The command rename can be used to rename variables

The end result could look as follows:

To make stuff easier you could create a .do file that contains all the commands in sequence and this allows you to reuse it in similar situations for similar downloads. Here you can download an example .do file that shows examples of all the commands.

This example applies to Stata editions 13 and 14.

Email

WRDS platform: major changes

This week some major changes were implemented in the Wharton Research Data Services (WRDS) platform. These changes mainly concern the search options for databases (step 2) and the selection options (step 3). In essence: no options were changed for the databases. Only the way you used to make a selection has changed and the way you make a selection of variables. These are the most recent (major) changes since July 2015.

The selection screen is now much more compact and straightforward with all the options immediately available and more intuitive:

The selection screen has undergone a much bigger change. Instead of scrolling downwards through the list of boxes with lists of variables, you can now scroll sideways through the lists of variables:

The other two steps in WRDS are usually: Step 1 (Selecting a time period), and Step 4 (Output options). No major changes seem to have happened there. If I notice any more major changes in the interface I will post them here.

Email

Multi-year shareholder lists & Amadeus

In an earlier post I told something about the 2 most usual ways to get shareholder data in a list presentation in the database Amadeus. The first option was through the company full shareholder presentation reports. The second option was by changing the report option to get the shareholders at Year’s End. The third option to get multi-year shareholder data is through a specific list option. Shareholders data can be included in the list the usual way by clicking the Columns option in the middle of the screen (or the Add button) and then selecting the items you want. This will get you a list that looks as follows and only shows the “current” shareholders: If you want to get a multi-year overview you need to arrange this when selecting the data-items for the list presentation. After clicking the Columns option you can scroll to the shareholder items. Step 1: In this example I chose to include the shareholder name as the first item. When you mark the item it is included in the variable list on the right side. Step 2: Click the Modify button that is presented next to the variable name Step 3: A new pop-up screen appears. In this screen there are many options you can choose. Click the button at the top, called Archived Data. Step 4: In the screen that now appears it is possible to choose certain Months + Years that you want to see the archived data: up to 5 years are possible (not including the current option: Last available data). In this example I chose the last 5 years and December: Step 5: After clicking the OK button to conform the selection you will go back to the screen where you can select additional shareholder data items like % Direct Ownership and % Total ownership. When you select a multi-year item Amadeus will automatically include the required number of columns that was selected (at step 4): Step 6: After clicking OK to finish the selection you will get the following list presentation that now includes multi-year shareholder data: N.B.: When you need multi-year shareholder data and you select cetain months you need to be careful when inerpreting the result. In the example above I requested shareholder data for December of each year. This means that Amadeus will show the latest available shareholder data since the last selected month. This does not necessarily mean that the presented shareholder data is from the December month! If there were several reported months (for 1 year) with data for a shareholder, the lastest months’ data for that year will be presented: for instance if there is data for March, June and July, then in this case the data for July will be presented. I therefore recommend that you always include the items Information Source and Information Date. That way, you can always judge how recent the data is for each selected month and which entity reported the information.

Email

New BETA platform: Wharton

The WRDS platform (Wharton Research Data Services) is a web-based interface that allows you to search through many licensed databases and makes it easier to download data from them. The learning curve on how to use the Wharton platform is minimal as all databases can be searched in a similar way. The main effort that you need to expend is: learning the details of individual databases and how the data is made available. Databases that can be searched through Wharton include: Compustat Global & North America, Execucomp, CRSP, Amadeus & Audit Analytics.

The past few weeks it has been made possible to view the new changed BETA version of the WRDS platform which looks much different from the current platform. The old interface looks as follows:

The new BETA WRDS interface looks very different:

Overall, the changes are mostly cosmetic. when you click through to a specific database you still get the same search options that are available in the previous / current version: 1) Select time frame, 2) Make a selection, 3) Choose variables, and 4) Select the output format and date format.

N.B.: The new BETA version can be viewed but as yet I do not know when the current version will be replaced. I expect it to be sometime during the summer or autumn of 2015.

Email

Historical shareholder lists & Amadeus

A database like Amadeus (from Bureau van Dijk) has shareholder data that can go back to 2002. In a previous post I mentioned how you could find out how the types of shareholders could be identified using the codes. If your license / subscription includes the ownership module the information is available for European companies of the industrial type. Amadeus does not include financial type company data like Banks or Insurance companies. If you need these types of companies or a Global geographical coverage you could go for a database like Orbis or Bankscope.

The only way to view the full historical data for the current and previous shareholders for companies, is using the Ownership Data option that appears in the menu (on the right side of the screen) when you call up the basic Snapshot on a company from your results list by clicking on the name.

There are two presentation options for the Shareholder data:

1) History – all data
The first option presents a long list of all recorded data in the database. At the top the current shareholders are listed and below this the previous shareholders:

2) History – at each closing date
This option presents the recorded shareholders in columns by year. Allthough this makes it easier to view the history of ownership, if many changes occur in each year this makes interpreting the data not always easier:


Information on shareholders includes data like: type, percentage direct ownership, total percentage ownership, country, etc. Also included is information on Source Identity and Date of Info. Bureau van Dijk uses many sources for the Shareholder type data and the main sources include:

  • Annual Reports (RS or RM)
  • Private correspondence (PC)
  • SEC Filings (PX or NQ)
  • Stock exchanges (SE)
  • Company web-sites (WW)
  • Telephone calls (TC)
  • Press News (PN)
  • Additional sources (OS)
  • Information providers (IP)

The choice for the data listed as ownership (for each month) depends on the sources. The Ownership Guide also explains how the sources are used and also has a full list of the Information Providers used. This allows you to find out which Source Ident. codes concern what source for each month.

Email

Industry and economic classification systems

Many people who do research on companies and markets use a system like the SIC codes or NAICS codes. Both systems allow you to determine the activities for companies. The SIC code system is the oldest of the two and can be downloaded as a variable for companies in many databases. A company can get assigned one or more SIC codes. Sometimes a primary SIC code is given, which indicates the main activity. In essence, the Standard Industrial Classification (SIC) is a system for classifying industries by a four-digit code. The system was originally developed in the United States in 1937, and it is used by US government agencies to classify industry activities.  Basic information is available on Wikipedia.
If your research covers a long time frame it may be necessary to determine whether changes that occur over time in the SIC system have an effect. Luckily, an older version of the SIC code system is available. The U.S. Department of Labor, Occupational Safety & Health Administration website allows the user to search the 1987 version SIC codes.
A more recent version of the SIC system is available on the SEC website of the Division of Corporation Finance: Standard Industrial Classification (SIC) 2011 Code List.

The SIC system is also used by agencies in other countries. In the United ingdom they have developed their own version of the SIC codes. This United Kingdom Standard Industrial Classification of Economic Activities (UK SIC) is used to classify business establishments and other standard units by the type of economic activity in which they are engaged. The new version of these codes (SIC 2007) was adopted by the UK as from 1st January 2008. Older versions of the UK SIC system are also available online, specifically, the UK Standard Industrial Classifications 2007, 2003, and 1992 versions.

The North American Industry Classification System (NAICS) is the system used by US Federal statistical agencies for classifying businesses for the purpose of collecting, analyzing, and publishing statistical data related to the U.S. business economy. The NAICS system was developed under the auspices of the Office of Management and Budget (OMB), and adopted in 1997 to replace the Standard Industrial Classification (SIC) system. From 2002 the system is increasingly used. Different versions are available through the US Census website.

The NACE-code (Nomenclature générale des Activités économiques) is a code which is largely used in the European Union and its member states use it to classify commercial and non-commercial economic activities. It is mainly developed as a useful instrument when collecting data and publishing economic statistical overviews. Many companies and organizations exhibit a diverse range of activities. Companies only get assigned a single NACE code, however: the code that indicates the primary acrivity which contributes the most to the total added value of a company. The first version of the NACE was created sometime around 1970. The first revision was published in 1990 and was called NACE Rev. 1. The second major revision (NACE Rev. 1.1) took place in 2002. The second revision was intended to synchronize the system with the  “International Standard Industrial Classification of all economic activities” ISIC of the United Nations. The different NACE versions can be found on the Eurostat website. The Dutch national SBI code system (which replaced the original BIK code system) is based on the NACE system.

The International Standard Industrial Classification of all economic activities, abbreviated as ISIC, is a standard used by the United Nations Statistics Division (UNSD). The ISIC is used to classify economic activities so that entities can be classified according to the activity they carry out.
The ISIC classification combines the statistical units according to their character, technology, organisation and financing of production. The ISIC is used widely, both nationally and internationally, in classifying economic activity data in the fields of population, production, employment, gross domestic product and other economic activities. It is a basic tool for studying economic phenomena, fostering international comparability of data and for promoting the development of sound national statistical systems.
The current and older ISIC versions are available on the statistical website of the United Nations.

When you are using a database, always check the help information to find out what version of an industry code system is being used. If a code system is available but no information on what version it is, you should find out by contacting the owner of the database. Some databases collect data but gather only historical (industry code) information from other sources without using their own version of such as system. The SDC Platinum databases have historical SIC and NAICS codes as they were indicated by the sources that Thomson Reuters uses to collect data.

N.B.: In addition to the aforementioned industry codes there are many more code systems that were developed for statistical purposes in specific countries. The database Amadeus has an easy tool in the help section that allows you to translate specific codes to other codes. This tool is available in the Bureau Van Dijk version of Amadeus:

Email

Directors, Board members & Committees in Europe

One of the few databases that cover information on directors and board members as well as some information on senior management staff for European companies is Amadeus. The Amadeus database covers listed and unlisted production companies. No financial type companies are covered here. Part of the information can be called up through separate report-presentations. When you have created a list and click on one of the company names you get the default company snapshot presented. Only when the snapshot is presented, is it possible to make a selection for the specific snapshot/report on board members, directors and management staff data.

There are two versions of the management report:

  • Report (Current Directors / Managers / Contacts)
  • Report (Previous Directors / Managers / Contacts)    .

See example screenshot:

Saving these reports with all the data is possible but the limit for downloading the data is: 50 reports at once. If you need reports for more companies you will need to export repeatedly in batches of 50 (1-50, 51-100, etc.). The data for each company will be presented in a separate tab/sheet in the Excel download file. Much depends on the company, whether board/directors/managers data is available. The origin of the data is usually the annual report, the company website or data from other news sources. The column with the name/header Body, indicates in what company bodies the person is active. Examples are:

  • AdmDep (Administration Department)
  • AudC (Audit Committee)
  • BoD (Board of Directors)
  • CoGoC (Corporate Governance Committee)
  • HR (Human Resources dept.)
  • NomC (Nomination Committee)
  • OthBC (Other Board Committee)
  • R&D (Research & Development)
  • RemC (Remunaration Committee)
  • SenMan (Senior Management)
  • SupB (Supervisory Board).

Important: The Report presentation does not allow you to export all data (including small biographies). You will have to change the list presentation to do this. Click on the option Columns (above the list) to elect and unselect variables/data for the download. Example screenshot:

NB: The database REACH is a database that covers all registered companies at the Dutch Chamber of Commerce (listed and unlisted). Depending on your subscription a module on Board members & Management may be included. If available this allows you to download similar board and Management data (as described above) for Dutch companies.

Email

How to get a list of companies in trouble

In some cases it is interesting to investigate companies that have gone bankrupt or are in liquidation. For example: it may be interesting to have a look at the accounting report section of the annual report to see if there is a going concern paragraph in the years prior to company going bankrupt. Or it may be interesting to look at different financial items from the profit-los account or balance sheet, for instance the leverage ratio. The diference between both types of status are as follows:

Bankruptcy is a legal status of a person or other entity that cannot repay the debts it owes to creditors. In most jurisdictions, bankruptcy is imposed by a court order, often initiated by the debtor.” (Wikipedia)

Liquidation is the process by which a company (or part of a company) is brought to an end, and the assets and property of the company are redistributed. Liquidation is also sometimes referred to as winding-up or dissolution.” (Wikipedia)

Some databases offer variables that allow you to select these types of companies. In the European database Amadeus there is the option in the search Menu called Status. Clicking the option in the Menu allows you to select companies of this type. See example:

Compustat alo offers a variable that indicates whether a company is in trouble: Bankruptcy or Liquidation. The variable that indicates this is:

  •  STALT in the Fundamental Annual database (both Global and North America)
  •  STALTQ in the Fundamental quarterly database (both Global and North America).

For a company in trouble the variable will give as output: TL. See example:

Other financial databases may also include this type of information on a company. If you can not immediately ind it, try looking for variables with terms in the name like: status, active, inactive, bankrupt/bankruptcy, (in) liquidation.

Email

Shareholder types in BVD databases

Many of the databases of Bureau Van Dijk (BVD) allow you to call up information on the ownership of companies. When you call up a full report for a company in the database it is possible to go to a specific report with historical information on shareholders. Not only does it list the shareholders, but also percentages of ownership, month and year of the information, and type of ownership. Below you see an example screenshot for a company from the REACH database:


The type of shareholder is indicated in this overview with letters: A, B, C, etc. You can adjust the presentation of the type of information presented using the option Filters at the top of the list at the Options. You can also view a subsset of the available types of shareholders:

Unfortunately the list only shows letters and the filter option do not tell you which letters represent what types. If you do some further research in the online support/help information you can find out which letters stand for what types. Below you find an overview in the Dutch and English Languages:

Dutch Translation (used in REACH):
A = Verzekeringsmaatschappij
B = Bank
C = Handel- & Industrie-organisatie
D = Naamloze particuliere aandeelhouders, geaggregeerd
E = Beleggings- & pensioenfonds / Nominee / Trust / Trustee
F = Financiële instelling
I = Eén of meer bij naam genoemde individu(en) of familie(s)
J = Stichting / Onderzoeksinstituut
L = Ander naamloze aandeelhouders, geaggregeerd
M = Werknemers / Managers / Directeuren
P = Private Equity firma’s
S = Overheidsinstantie / Staat / Regering
V = Durfkapitaal
Y = Hedgefondsen
Z = Beursgenoteerde bedrijven

English Translation (used in Amadeus, Bankscope, etc.):
A = Insurance company
B = Bank
C = Trade & Industry organisation
D = Nameless private stockholders, aggregated
E = Mutual & Pension fund / Nominee / Trust / Trustee
F = Financial company
I = One or more named individuals or families
J = Foundation / Research Institute
L = Other named shareholders, aggregated
M = Employees/Managers/Directors
P = Private Equity firms
S = Public authority/State/Government
V = Venture Capital
Y = Hedge funds
Z = Public (Publicly listed companies)

MINT/ORBIS Additional categories:
H = Self ownership
Q = Branch

NB: In the help section it is also possible to get some definitions for some of these catagories (see the Orbis/Mint ownership specific ownership section).

Email

Bankscope and Consolidation codes

In the previous two posts I went into consolidation codes and how these can be used in Amadeus. Bankscope also uses consolidation codes to indicate the level of consolidation for the different financial statements. In a previous post I mentioned how the use of these codes might interfere with finding out how many banks you can actually get when you do a search. Limiting a search to companies with certain types of statements in Bankscope works similar to the Amadeus database: through the search menu. In the (advanced) search menu select: Accounting data > Consolidation code.

The types of consolidation codes available in Bankscope vary slightly from those in Amadeus. Below you see an overview of the codes and their explanation:

  • C1: statement of a mother bank integrating the statements of its controlled subsidiaries or branches with no unconsolidated companion
  • C2: statement of a mother bank integrating the statements of its controlled subsidiaries or branches with an unconsolidated companion
  • C* Additional Consolidated statement
  • U1: statement not integrating the statements of the possible controlled subsidiaries or branches of the concerned bank with no consolidated companion.
  • U2: statement not integrating the statements of the possible controlled subsidiaries or branches of the concerned bank with an consolidated companion.
  • U* Additional Unconsolidated statement

Additional companion statements (C* or U*)
The additional statements might differ from the main statements according to one or more of the following criteria:

  • The source used to spread the data is different (FDIC, SEC…)
  • The accounting standard (generally historical accounts in Local GAAP, more recent ones in IFRS are in the main statements)
  • Inflation adjusted vs nominal values
  • Proforma accounts vs original accounts

NB: When you are using both databases to select companies for your research you also have to take into account, of course, that similar codes in Amadeus and Bankscope may get you data, but the accounting rules for the data items from the Annual Accounts are different for different types of companies and therefore are difficult to compare between both databases. The same applies too to companies in a single database.

Email