Using Stata to count a sequence

Not long ago a student asked me how to calculate the tenure of the audit firms attached to companies within a certain time frame, up to a specific year. The student was interested in the effect of auditor tenure on companies. Auditor rotation has become an interesting research subject in accountancy because in some countries firms are required to change their auditor every few years to ensure auditor independence.

In Stata this can be done with a script that identifies “spells”: unbroken runs of consecutive observations that share a value (here, consecutive years with the same auditor). My colleague Matthijs de Zwaan helped me with this and created an example script based on an article in the Stata Journal.
The script I made is based on that example and can be adapted to count other kinds of sequences in datasets. I also added a few lines of code to show how a dummy can be created that indicates whether the auditor in the example Excel file is a Big Four accounting firm or not. The script can be downloaded here and looks as follows:

The end result (after removing duplicate data) gives the tenure of the last auditor (latenure) for each firm in the year 2008:

N.B.: In the .do file the location for all files is the I: drive. You may need to change the drive letter in the original script to (for instance) c: or H: to get it to run. Make sure both the script file and Excel dataset are in the same location.
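
For readers who cannot open the download, the spell logic can be sketched as follows. This is not the original script: the toy data, the variable names (firm, auditor, newspell, spell, tenure, big4) and the spelling of the Big Four names are all assumptions made for illustration, and the sketch assumes one row per firm-year without gaps.

```stata
* Hypothetical toy panel: one row per firm-year with the auditor's name.
clear
input str4 firm int year str12 auditor
"A" 2004 "KPMG"
"A" 2005 "KPMG"
"A" 2006 "Deloitte"
"A" 2007 "Deloitte"
"A" 2008 "Deloitte"
"B" 2006 "SmallCo"
"B" 2007 "SmallCo"
"B" 2008 "PwC"
end

* A new spell starts at a firm's first year or whenever the auditor changes.
bysort firm (year): generate byte newspell = (_n == 1) | (auditor != auditor[_n-1])

* Number the spells within each firm (running sum of the spell starts).
by firm: generate spell = sum(newspell)

* Tenure = running count of observed years within each firm-auditor spell.
bysort firm spell (year): generate tenure = _n

* Dummy: 1 if the auditor is a Big Four firm, 0 otherwise (names are assumed spellings).
generate byte big4 = inlist(auditor, "Deloitte", "EY", "KPMG", "PwC")

* Keep only 2008: the tenure of the last auditor for each firm.
keep if year == 2008
rename tenure latenure
list firm auditor latenure big4
```

Because tenure is a count of rows, a missing year inside a spell would not be counted; real data with gaps would need an extra check on the difference between consecutive years.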


Using Stata to count segments

At the end of March I was asked how to use Compustat North America segments data to get aggregated counts of business segments or geographic segments. The business segments count was to be used as an indicator of diversity: how many different types of activity a company includes in its operations. The geographic segments count was to be used as an indicator of how widespread these activities are geographically for each company.

The most important commands are:

generate year=year(datadate) > this command extracts the year from the date variable datadate, so that segments can be counted per year. It is only needed if the dataset has no usable year variable (such as fiscal year / fyear).

drop > this command deletes all variables that are not essential from the dataset

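sort gvkey year > this command sorts the dataset first on gvkey (the global company key, which uniquely identifies a company in any Compustat database) and then on year. Note that order gvkey year only moves these two variables to the front of the variable list; it does not sort the observations, and the by prefix used below needs sorted data.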

duplicates drop > this command deletes any duplicate annual observations. This is important because the count should only include unique segments

by gvkey year: egen segmentcount = count(sid) > this command generates a new variable (segmentcount) that holds, for each company and year, the number of non-missing segment id codes (sid).
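
Put together, the commands above might look like the following minimal sketch. The toy data and the helper variables datestr and extra are hypothetical; only gvkey, datadate, sid, year and segmentcount come from the description above. The sketch uses bysort instead of a separate sort followed by by, which is an equivalent shortcut.

```stata
* Hypothetical toy extract of segment data (not real Compustat values).
clear
input str6 gvkey str10 datestr int sid str8 extra
"001000" "2007-12-31" 1 "x"
"001000" "2007-12-31" 1 "y"
"001000" "2007-12-31" 2 "x"
"001000" "2008-12-31" 1 "x"
"002000" "2008-06-30" 1 "x"
"002000" "2008-06-30" 3 "x"
"002000" "2008-06-30" 4 "x"
end

* Turn the text date into a Stata daily date, as datadate would be in the real file.
generate datadate = date(datestr, "YMD")
format datadate %td

* Extract the calendar year from the date.
generate year = year(datadate)

* Delete the variables that are not needed for the count.
drop datestr datadate extra

* Remove rows that are now exact duplicates (same gvkey, year and sid).
duplicates drop

* Count the unique segment ids per company and year.
bysort gvkey year: egen segmentcount = count(sid)

list gvkey year sid segmentcount, sepby(gvkey year)
```

The count is repeated on every row of a company-year, so a further duplicates drop after removing sid would leave one row per company and year.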

To combine the business segments count dataset with the geographic segments count dataset afterwards, a unique ID (UID) is created that serves as the key for merging them into a single dataset.

Overall the script (.do file) I created does three things:
1) It creates a new dataset with business segment counts
2) It creates a new dataset with geographic segment counts
3) It merges both newly created datasets into a single dataset
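
The merge step can be sketched as follows. How the original script builds its UID is not shown here, so the construction below (gvkey and year joined with an underscore) is an assumption, as are the variable names uid, buscount and geocount and the toy values.

```stata
* Hypothetical business-segment counts, one row per company-year.
clear
input str6 gvkey int year byte buscount
"001000" 2007 2
"001000" 2008 1
"002000" 2008 3
end
* Assumed UID: company key and year glued together.
generate uid = gvkey + "_" + string(year)
tempfile business
save `business'

* Hypothetical geographic-segment counts, built the same way.
clear
input str6 gvkey int year byte geocount
"001000" 2007 4
"002000" 2008 2
end
generate uid = gvkey + "_" + string(year)

* One-to-one merge on the UID; _merge records where each row came from.
merge 1:1 uid using `business'
sort uid
list uid buscount geocount _merge
```

Company-years that appear in only one of the two datasets stay in the result with a missing count for the other, which the _merge variable makes easy to spot. Merging directly on gvkey and year (merge 1:1 gvkey year) would work as well when both variables are present in both files.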

Example script screenshot:

The example dataset with .do script file can be downloaded here.

Example result screenshot:

N.B.: In the .do file the location for all files is the U: drive. You may need to change the drive letter in the original script to (for instance) c: or H: to get it to run. Make sure both the script file and Stata dataset are in the same location.
