Using Stata to count segments

At the end of March I got asked the question how to use Compustat North America segments data and get aggregated counts on business segments or geographic segments. The variable business segments was to be used as an indicator of diversity: how many different types of activity a company included in it’s activities. The Geographic segments was to be used as an indicator on how widespread these activities were geographically for each company.

Specific important commands that are needed:

generate year=year(datadate) > using this command you get a year which can be used to count instances of segments. This is only needed if no available year can be used (like fiscal year / fyear).

drop > using this command you delete all variables that are non-essential from the dataset

order gvkey year > this command sorts the dataset first on the gvkey (= global company key which uniqely identifies a company in any Compustat database) and then by year

duplicates drop > this command deletes any possible duplicate annual data. This is important as the count only involves unique segments

by gvkey year: egen segmentcount = count(sid) > this command generates a new variable (segmentcount) and gives it the value of the count of the segment id codes (SID) for each company and individual year.

To later combine the business segments count dataset with the geographical count dataset a unique ID (UID) is created to later merge the datasets again into a single dataset.

Overall the script (.do file) I created does three things:
1) It creates a new dataset with business counts
2) A dataset with Geographical counts is made
3) It merges both newly created datasets into a single dataset

Example script screenshot:

The example dataset with .do script file can be downloaded here.

Example result screenshot:

N.B.: In the .do file the location for all files is the U: drive. You may need to change the drive letter in the original script to (for instance) c: or H: to get it to run. Make sure both the script file and Stata dataset are in the same location.

Email

%d bloggers like this: