Using Compustat data with Audit Analytics

I regularly get questions from people who want to retrieve auditor-related data from Audit Analytics. Audit Analytics is a database that has (among other things) auditor data for companies that have to file reports with the SEC in the United States. This includes not only many American companies but also some foreign companies.
Very often, however, the starting point for the research question is a list of companies that was created using the Compustat North America databases (the Compustat databases cover financial and stock-related data). Compustat not only makes it easier to create lists with direct searches, it also lets you use the Constituents Index database to work with historical indexes or index-related lists from exchanges (such as the S&P 500).

The important thing to remember is that only two identifiers can link data from the two databases: tickers or CIK codes. Tickers may change for companies over time, but CIK codes should not (unless the company itself changes or transforms, for instance due to M&A activity). In a previous blog post I mentioned some research that I did using historical tickers, and it seems both Compustat and Audit Analytics keep their databases pretty much up to date where tickers are concerned. I am, however, inclined to think that using CIK codes to search (and later, match) data is the safer option.

In both the Compustat North America Fundamentals Annual database and the Audit Analytics databases it is possible to upload a text file containing a list of CIK codes and search with it. This way you can find data on the same companies in both databases.
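As a rough illustration of preparing such a text file, the sketch below cleans up a list of CIK codes and writes them one per line. It assumes that the upload form accepts a plain text file with one code per line and that the codes should be padded with leading zeros to ten digits; both are assumptions to check against the help pages of the database. The function names and the sample codes are made up for this example.

```python
def normalize_cik(raw):
    """Return a CIK as a ten-digit, zero-padded string (assumed format)."""
    digits = str(raw).strip()
    if not digits.isdigit() or len(digits) > 10:
        raise ValueError(f"not a valid CIK: {raw!r}")
    return digits.zfill(10)


def cik_upload_lines(raw_ciks):
    """Normalize the codes and drop duplicates, keeping the original order."""
    seen = set()
    lines = []
    for raw in raw_ciks:
        cik = normalize_cik(raw)
        if cik not in seen:
            seen.add(cik)
            lines.append(cik)
    return lines


def write_cik_file(path, raw_ciks):
    """Write one CIK per line to a plain text file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(cik_upload_lines(raw_ciks)) + "\n")


# Made-up codes, some stored as numbers (so the leading zeros were lost).
example_ciks = [1234, "0000001234", " 5678 ", "0000098765"]
print(cik_upload_lines(example_ciks))
```

Calling write_cik_file("cik_list.txt", example_ciks) would then produce a file that can be used for the search in both databases.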

It is not enough to use the same list of CIK codes to search for data: you must also select the CIK code variable so that it is included in your output. That way, the dataset/output from Compustat and the dataset/output from Audit Analytics can later be matched on the unique combination of CIK code and fiscal year. In the WRDS versions of Compustat and Audit Analytics the CIK variable can be found at Step 3, when you are selecting variables for the output:

  • In Compustat the CIK variable can be found in the block Identifying Information. It is easily recognizable by name: CIK Number
  • Audit Analytics lists the CIK variable under the name “Company FKEY”. It can usually be found in a block of company-related variables

N.B.: if you need to match both datasets on the unique combination CIK code and fiscal year you can use the Excel functions:

  • Concatenate() (=Tekst.Samenvoegen) to combine the CIK codes and Fiscal Years
  • Vlookup() (=Vert.Zoeken) to match datasets using the unique codes you created.
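For those who prefer a script over Excel, the same idea can be sketched in Python: build a key from the CIK code and the fiscal year, then look up each Compustat row in the Audit Analytics data. This is only a minimal sketch. The column names (cik, fyear, company_fkey, fiscal_year, at, auditor_name) are assumptions about how the downloads are labelled, and the rows are made-up sample data. Leading zeros are stripped from the CIK because the two downloads may store the code differently.

```python
def cik_year_key(cik, fiscal_year):
    """Build the matching key: CIK without leading zeros plus the fiscal year."""
    return (str(cik).strip().lstrip("0"), int(fiscal_year))


def match_on_cik_year(compustat_rows, audit_rows):
    """Attach the Audit Analytics fields to each Compustat row with the same key."""
    audit_by_key = {}
    for row in audit_rows:
        key = cik_year_key(row["company_fkey"], row["fiscal_year"])
        # Like Vlookup(), keep only the first row found for a key.
        audit_by_key.setdefault(key, row)

    matched = []
    for row in compustat_rows:
        audit = audit_by_key.get(cik_year_key(row["cik"], row["fyear"]))
        if audit is not None:
            matched.append({**row, **audit})
    return matched


# Made-up sample rows; the column names are assumptions.
compustat = [
    {"cik": "0000001234", "fyear": 2015, "at": 100.0},
    {"cik": "0000001234", "fyear": 2016, "at": 110.0},
    {"cik": "0000005678", "fyear": 2015, "at": 50.0},
]
audit = [
    {"company_fkey": "1234", "fiscal_year": "2015", "auditor_name": "Auditor A"},
    {"company_fkey": "0000005678", "fiscal_year": 2015, "auditor_name": "Auditor B"},
]

matched = match_on_cik_year(compustat, audit)
for row in matched:
    print(row["cik"], row["fyear"], row["auditor_name"])
```

Compustat rows without a counterpart in Audit Analytics (here the 2016 row) are left out of the result, which makes it easy to see which company-years did not match.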

See examples in my previous posts for Datastream and Compustat.


Industry and economic classification systems

Many people who do research on companies and markets use a system like the SIC codes or NAICS codes. Both systems allow you to determine the activities of companies. The SIC code system is the older of the two and can be downloaded as a variable for companies in many databases. A company can be assigned one or more SIC codes. Sometimes a primary SIC code is given, which indicates the main activity. In essence, the Standard Industrial Classification (SIC) is a system for classifying industries by a four-digit code. The system was originally developed in the United States in 1937, and it is used by US government agencies to classify industry activities. Basic information is available on Wikipedia.
If your research covers a long time frame it may be necessary to determine whether changes in the SIC system over time affect your data. Luckily, an older version of the SIC code system is available. The website of the U.S. Department of Labor, Occupational Safety & Health Administration allows the user to search the 1987 version of the SIC codes.
A more recent version of the SIC system is available on the SEC website of the Division of Corporation Finance: Standard Industrial Classification (SIC) 2011 Code List.
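Because the four-digit SIC code is hierarchical (the first two digits give the major group and the first three the industry group), a downloaded code can be split into broader levels for grouping companies. The following is a minimal sketch of that idea; the function name is made up, and it assumes that a code with fewer than four digits has merely lost its leading zeros, which can happen when the code is stored as a number.

```python
def sic_levels(sic_code):
    """Split a four-digit SIC code into its hierarchical levels."""
    # Assumption: codes such as 100 were stored as numbers and mean 0100.
    code = str(sic_code).strip().zfill(4)
    if len(code) != 4 or not code.isdigit():
        raise ValueError(f"not a four-digit SIC code: {sic_code!r}")
    return {
        "major_group": code[:2],      # first two digits
        "industry_group": code[:3],   # first three digits
        "industry": code,             # full four-digit code
    }


print(sic_levels(2834))
print(sic_levels("100"))
```

Grouping on the two-digit major group is a common way to get broader industry categories when the four-digit level leaves too few companies per group.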

The SIC system is also used by agencies in other countries. The United Kingdom has developed its own version of the SIC codes. This United Kingdom Standard Industrial Classification of Economic Activities (UK SIC) is used to classify business establishments and other standard units by the type of economic activity in which they are engaged. The new version of these codes (SIC 2007) was adopted by the UK as of 1 January 2008. Older versions of the UK SIC system are also available online: specifically, the UK Standard Industrial Classifications 2007, 2003, and 1992 versions.

The North American Industry Classification System (NAICS) is the system used by US Federal statistical agencies for classifying businesses for the purpose of collecting, analyzing, and publishing statistical data related to the U.S. business economy. The NAICS system was developed under the auspices of the Office of Management and Budget (OMB), and adopted in 1997 to replace the Standard Industrial Classification (SIC) system. Since 2002 the system has been used increasingly. Different versions are available through the US Census website.

The NACE code (Nomenclature générale des Activités économiques) is widely used in the European Union, where the member states use it to classify commercial and non-commercial economic activities. It was developed mainly as an instrument for collecting data and publishing economic statistical overviews. Many companies and organizations exhibit a diverse range of activities. Companies are only assigned a single NACE code, however: the code for the primary activity, which contributes the most to the total added value of a company. The first version of the NACE was created sometime around 1970. The first revision was published in 1990 and was called NACE Rev. 1. The second major revision (NACE Rev. 1.1) took place in 2002. The second revision was intended to synchronize the system with the “International Standard Industrial Classification of all economic activities” (ISIC) of the United Nations. The different NACE versions can be found on the Eurostat website. The Dutch national SBI code system (which replaced the original BIK code system) is based on the NACE system.

The International Standard Industrial Classification of all economic activities, abbreviated as ISIC, is a standard used by the United Nations Statistics Division (UNSD). The ISIC is used to classify economic activities so that entities can be classified according to the activity they carry out.
The ISIC classification combines the statistical units according to their character, technology, organisation and financing of production. The ISIC is used widely, both nationally and internationally, in classifying economic activity data in the fields of population, production, employment, gross domestic product and other economic activities. It is a basic tool for studying economic phenomena, fostering international comparability of data and for promoting the development of sound national statistical systems.
The current and older ISIC versions are available on the statistical website of the United Nations.

When you are using a database, always check the help information to find out which version of an industry code system is being used. If a code system is available but there is no information on which version it is, you should find out by contacting the owner of the database. Some databases do not maintain their own version of such a system and only gather historical industry code information from other sources. The SDC Platinum databases, for example, have historical SIC and NAICS codes as they were indicated by the sources that Thomson Reuters uses to collect data.

N.B.: In addition to the aforementioned industry codes there are many more code systems that were developed for statistical purposes in specific countries. The database Amadeus has an easy tool in the help section that allows you to translate specific codes to other codes. This tool is available in the Bureau Van Dijk version of Amadeus.