Operating Expense data in Compustat databases

Recently I got a question on Operating Expense data as it is available in Compustat databases. Operating Expense is “is an ongoing cost for running a product, business, or system . . . In business, an operating expense is a day-to-day expense such as sales and administration, or research & development, as opposed to production, costs, and pricing. It is the money the business spends in order to turn inventory into throughput. On an income statement, “operating expenses” is the sum of a business’s operating expenses for a period of time, such as a month or year.” See also WIkipedia.

In Compustat databases the variable Operating Expense is usually is an aggregate of several items of the income statement. Not all individual parts are reported also in separate variables, though. If you need to know which items have been included in the Operating Expense item you can download the Income Statement Model Number (ISMOD) with your data. The 2-character numeric code that you get indicates how the Operating Expense is reported according to internationally accepted formats. Details taken from the Compustat Online Manual:

Compustat (Global) has four Income Statement models (formats) for industrial companies. These models represent the most common methods an international industrial company may use to present data, and some models are more commonly used in some countries than others. This enables you to see trends within countries so you can use a particular model of presentation. To illustrate, most U.S. companies use the Cost of Sales format (Income Statement Model 1), while German companies use the Purchase or Production format (Income Statement Model 2), and most Hong Kong companies do not break-out their Expense components on their Income Statements (Income Statement Model 3). The models do not represent particular countries but do represent represent different but internationally accepted formats for reporting the Operating Expense.
The model may apply to one or more company-level items. The following table lists Income Statement Model Number codes and their descriptions:

Income Statement Model Code: 01
Description: Cost of Sales Format
The variable Operating Expense (Total) is a summation of the following 3 items:

  • Cost of Goods Sold (COGS)
  • Selling, General and Administrative Expense (XSGA)
  • Operating Expense (Total) Other (XOPRO)

For Canada & United States the Operating Expense (Total) Other contains a Not Available data code

Income Statement Model Code: 02
Description: Purchase or Production Format
The variable Operating Expense (Total) is a summation of the following items:

  • Raw Materials, Supplies and Merchandise (RAWMSM)
  • Staff Expense (Total) (XSTF)
  • Operating Expense (Other) (XOPRO)


  • Capitalized Costs (CAPCST)
  • Change in Stocks (STKCH)

Income Statement Model Code: 03
Description: Not specified format
Either the components of the Operating Expense (Total) are unclassified or operating expenses are presented as a total with no detailed breakdown.

Income Statement Model Code: 04
Description: Calculated format
Standard & Poor’s calculates a figure for Operating Expense from available information, but components contain a Not Available (NA) data code.

Income Statement Model Code: 05
Description: Hybrid Format
Company does not report operating revenues or expenses, but does include operating income on the Income Statement. Standard & Poor’s uses the Hybrid model for interim data due to more limited breakouts during interim periods. When the Income Statement Model Number is 05, all items in Income Statement Model Numbers 01 and 02 may be used.


Sustainable & Responsible Mutual Funds and other databases

In the past I have posted an item on sustainability funds from the US SIF website with some performance data. The Mutual Funds on the website lists Tickers and names for these funds. The tickers on that website are NASDAQ Tickers. Additional information on these funds can be downloaded from the CRSP Mutual Funds part database Funds summary using a text file of these tickers. At search step2 in the WRDS platform you can choose to upload a file containing mutual fund codes:

In WRDS at step 3 you can choose to download descriptive information. If the CRSP database does not contain enough information you can select items like 8-digit CUSIP of NCUSIP as part of your download. The NCSUIP is the 9-digit CUSIP.
Using these CUSIP code you can search for more information in other databases like Compustat North America and Datastream/Asset4 ESG/WorldScope. In Compustat you can download data by creating a text file with CUSIP codes. This will run fine.

Using CUSIP 9 codes in Datastream (and for databases through datastream, like ASSET4 ESG and Worldscope) requires that the codes are enhanced by adding the letter U in front of each code. This can be easily done using the Concatenate() function in Excel. See example:

Before uploading a list of these enhanced CUSIP 9 codes, make sure to copy and paste the list in Excel to make all these CUSIP codes values! Datastream cannot upload Excel formulas. When you need to upload and work with lists in Datastream you can select the codes using the mouse and then use the Datastream option: Create list(from range):


Creating Dummy items with data using Excel

Not long ago I got the question on how to create dummy numbers for a dataset. The student who asked me this wanted to use these dummies in a regression analysis. In some models you may only want to have a single value of 1 or 0 but in this case the student wanted to create four different categories that are equal/similar in size. So, I used Excel to create the 4 groups. I am not sure if this makes sense from a mathematical point of view but I leave that up to the students.

First I created a small dataset with monthly price data for 11 randomly chosen Dutch stocks (fake example). To make the test even more challenging I made sure that there were also missing values because when you are working with larger datasets this might happen and you cannot always check the dataset for it. Below you see the example:

The example that I worked on needed to be divided in 4 groups and the groupings are different depending on the number of available stocks as well as the different price values. In Excel we can use the QUARTILE() function to let Excel calculate from a certain range where the groups should end. When the formula is used you need to indicate which range to use and the value of the Quartile: =QUARTILE(RANGE;VALUE). For the four Quartiles this could be

In the example screenshot these are Cells M2 to P2:

As the next step I needed to create a conditional statement for 2 scenarios:

  • What to do when there is no data
  • Which value (of 1,2,3,4) to show depending on the calculated Quartile.

In Excel I can do this with three functions:
IF(Condition;Then;Else) = function to tell Excel what to do when
AND(Condition1;Condition2; …) = function to use more then 1 condition
LEN(Cell) = function to allow for missing values

When I started out I begin by telling Excel what to do when data is missing, i.e. when LEN(Cell)=0. If I have no data in a Cell I indicate that the text should be presented: NODATA. I put these between Quotes to indicate to Excel that this is Text information.
The following steps indicate what value (of 1,2,3,4) should be shown depending on the comparison of the Price in a cell with the calculated 4 quartiles. In order for me to copy these formulas (downwards and to the right) I locked the columns M,N,O and P with the dollar sign before the letter of these four columns. The full formula looks as follows:


Here you see a screenshot of the result using the formula:

As a final step I did a quick check to see how many stocks would get values 1,2,3 or 4 using the COUNTIF() function. I need to create a column for each group, and therefore use COUNTIF(RANGE;VALUE) and get in the example:

It looks like you get a (roughly) equal number of Stocks with values 1-4. This depends of course on how many stocks you have and how many missing values for each stock there are. See example screenshot:

This is a small example of the functions that you can use in Excel and you can download the Excel sheet here. I also did a larger test with Daily Frequency data from 2011 to now for 23 stocks and I got a similar good result. You can download this larger example Excel Sheet here.

N.B.: Sometimes the big conditional formula does not wotk as it should and you get values 1-4 when cells are empty. Be sure to check whether these seeming visually empty cells are actually empty. To solve this problem you can save the datasheet first as Text (Tab delimited) file, and the open it again in Excel using the Text wizard. When you retry the formulas from this example everything should then work fine. Be sure to always save a copy of the original download/dataset in case of errors.