Stata & large downloads from Datastream

In the past I demonstrated how it was possible to change a dataset from a wide presentation (= data in columns) to a long presentation (= data in rows) using the reshape command in Stata. This option works fine where smaller datasets are concerned and if there is no choice when downloading data. For larger datasets, however, it may become a problem to download the data in wide format and also the Stata software may have difficulty changing the presentation.

If you download, for instance, daily price data from Datastream for many years (using ISIN codes) for x number of listed companies the wide download format (transposed) is not a good option. Try downloading everything without transposing and you get everything in long format as follows:

Changing the data to a proper long format for Stata goes as follows:

1) Make a copy of the original data and add a new row at the top.
2) Above the Dates in column A type in a general name like “company” in Cell A1.
3) Also use numbers for each individual company at the top.
4) Insert a new row above the data and give the name Date in Column A above the actual dates
5) In the cells of this same row use an Excel formula to generate fake names. Example: =$A$1&B1
The copied Sheet should look as follows:

6) Now also make a copy of the first 3 rows and paste the data transposed using the Right-click option in a new sheet. Example:

7) Use the Excel function =left() to get the ISIN codes (12 digits) in a separate column or use Search and replace to get rid of the variable indicator (P)


8) Copy and paste this data as values to a new sheet. It should look as follows:9) Make a new copy of the first changed sheet and delete the first few rows. Make sure to copy & paste as value (right-click) to remove any formulas. The sheet should now look as follows:

Now start up Stata en use the sheets in the Excel Worksheet step by step:

10) First you import the sheet (from step 9) with the price data from Excel using the command import (or the Menu option File > Import > Excel spreadsheet).

11) Use the reshape command the rearrange the data to long format: reshape long company, i(Date) j(price). Change the variable names and make sure that the column with numbers is called company. Next save this as a Stata database.

12) Use clear all to start with the second sheet. Now import the second sheet with the company ISIN codes from step 8.

13) Use the merge command to merge this data with the Stata database we created at step 11). The command would look as follows:
merge 1:m company ” c:\ … \filname.dta”, sheet(…), firstrow


14) now do some data curation steps to change prices to numbers (destring command) and generate a newly formatted date. Make sure to save this file.


As an example I provide here a zip-file with an example download in Excel with the Steps on separate sheets and a .do file that can be run step by step. If you want to use this, be sure to extract both files to the folder c:\temp on your computer. You can also extract them in a different folder. In that case you need to change the locations of the files in the .do file before running the file in Stata.

Important to remember with this example: Stata may still run into problems if you have hundreds of companies (or more) for which you have daily data for many years. If there are problems I recommend converting the download in brackets of 100 to 200 companies and then appending the resulting databases to create a single Stata database.

N.B.: A student pointed an example of this out to me which he found at: Princeton University Library Data and Statistical Services. I do not claim any credit for the original idea but the Stata example provided on this blog was made by me.

Email

%d bloggers like this: