Using plain text files in Excel

If you use databases to collect data on companies or on anything else, the data files you can save as output are not always nice Excel or SPSS files with all the data in tact. One of the reasons can be the database itself (or the interface): it may change the data to fit the destination (program). Some of the data can, therefore, be reformatted. Excel also sometimes formats data that it presents on your screen.
It is usually a good idea to (if possible) also save the data as a plain old text file. Most of these files can be opened just fine using the Excel Text wizard. If the data is not separated with characters you may need to tell Excel where the columns start. That will only work for text files where the data columns are saved with a fixth width (= fixed number of characters between each column).

1) Most often, the data columns do have characters (or tabs) that separate them. The Text Wizard in Excel allows you to indicate what type of delimiter is needed. When you open a Text file with delimiters in Excel it may look as follows:

2) In the above example you can already see that the caret character ^ is probably what separates the columns of data. When you click the button Next the screen will show some standard Delimiters. The standard option that occurs a lot is the Tab-delimiter. In this case we can use the Other option to type in the caret character ^. Immediately when you do this the Data Preview will show what happens: if all the columns are nicely separated lines will be shown to indicate how Excel thinks the data should be separated into the columns:

3) As a final step you can tell Excel how to open the data in each column. If you leave everything on the Default option General Excel will try to figure out how everything should be presented. If you want to skip columns this is also possible. In my view it is always a good idea to select all the columns and open everything initially as Text. This way Excel will not use any formating whatsoever and all the data is presented exactly as you got it from the database. In any case, it is always a good idea to select the Text option for those columns that list company identifiers. If you do not, Excel may change these and make it impossible to use them later for additional searches or to combine the data later with another dataset. Example screen:

After selecting the import options for all the columns you can click the Finish button to import the text file. The result should look similar to this:

Not all text files have one (combination of) character(s) that separetes the text in columns. The example file I used from SDC Platinum also has a column where the data is separated using the tilde character: ~. See below:

It is also possible to use the Excel Text Wizard on a Text file after opening it:

  • The option is available in Excel using the tab at the top (in the menu) called Data (or: Gegevens). When you have done this the option that are presented in Excel change:
  • Next select the column for which you want to separate the data
  • Now select the option Text to Columns in the Excel menu. The text Wizrda should pop up:
  • In the example we see that the text has the delimiter character, called the Tilde: ~. If necessary, mark the option Delimited and click the button Next
  • A new screen appears where you can indicate what delimiter is needed. Use the Other option to indicate/type the Tilde character. Click the Next button to continue.
  • You can now also choose to got to the Next screen and determine how Excel will format the data. In this example I click the finish button. A warning may appear when there are already columns to the right. This warning says: “Do you want to replace the Contents of the destination cells?“. When I Click the yes button the result looks as follows:

NB: changing data with the Text Wizard in Excel can be dangerous. The original data will be changed and the result will not always be good. Some data may be lost. It is always a good idea to use a copy of the original file and use this before starting to rework the data.

Email

Combining data: Another Vlookup example

In the past I created a Vlookup example to show it is possible to use Microsoft Excel to combine data from datasets. The example was a small example based on CRSP and Compustat. I sometimes get questions on how to use Excel to do this and that is why I thought another example using other databases may prove useful. This time I use output data from SDC Platinum (sometoimes known as ThomsonOne) and combined it with data from Datastream.

Before you use Excel to combine data it is important to remember that (in my experience) Excel and Excel worksheets usually work from left to right. That means that very often searches will go through worksheets and within worksheets in rows fromk left to right while searching. The original source sheets where I want to extract data from, I usually put to the left of the destination sheet where I want to combine the data. This is usually my final worksheet on the right.

Example:

1) Download from SDC Platinum’s database Global New Issue
This database is also available from Thomson Reuters as the Deals database in Thomson One. My starting point for the exercise is a download of IPO data for companies that issued ordinary or common shares at an American exchange. I made sure that I included identification codes like ISIN in the download. Using these ISIN codes I will look up additional data and combine it later (other codes may also be used). As a first step I needed to filter the SDC output to make sure that I have only IPO’s with ISIN codes. The filter option in Excel is available in the first Tab (Home) on the right side and in the tab Data (= Gegevens).

  • Select the first row and click the Filter option.
  • Click the Filter icon and (in the ISIN column) unmark the option: Blanks (= empty cells / lege cellen).

2) Copy the filtered data

  • Use excel to copy the filtered result to another Excel sheet. We will use this sheet to add Datastream data to at a later stage. This will be the destination worksheet.
  • Make sure that the ISIN codes are copied and pasted as the first Column in this sheet: only 1 column of ISIN codes should remain!
  • Now copy the the column ISIN codes to a new sheet (If necessary remove duplicate ISIN codes using this option in the Excel Data tab [= Gegevens]).
  • Use the Datastream Add-in to upload the ISIN codes. Click the Datastream option called: Create list (/from range) and follow the steps.

3) Using the list we can now find out what the first trading days are for each stock.

  • Go to the Datastream tab/Add-in
  • Click the option Static search
  • Using the List picker button (below the orange button “Find series”) to select the uploaded list
  • Use the Datatype Base date (BDATE) and click the submit button to download these dates.

4) For the example we also need price data from Datastream
In this example we want the price data for the first 5 trading days, and specifically the original historical unaltered closing price: UP.

  • To download the data using the Request Table option in Datastream we first need to calculate the dates 5 days after each Base date. Do this by adding 5: like you would with regular numbers:
  • Copy the result in the column next to it. Use the option Past Special > values. This way you lose the function but keep the calculated dates.
  • Now use the ISIN codes as well as the column Base Dates and Calculated end dates to create a Request table search in Datastream.Make sure to have the original ISIN codes returned as well in the output! Another blog post explains exactly how to do a Request Table search. The output from Datastream should look as follows:

5) Remove unnecessary rows in Excel
For the matching data we first need to do the following:

  • Use the Excel Text filter option to kep only the price data and the ISIN codes. See example 8 and 9.
  • Now insert a column to the right of the column with the ISIN codes AND datatype code UP. Use the Left() (= links() ) function to extract the 12 digit ISIN codes.
  • Copy the result to a new sheet. Use Past Special > Values to keep the ISIN codes and remove the Excel functions.

6) Matching data using Vlookup
Important: make sure that both the source/data-sheets where you get data from are in the same Excel worksheet but to the left of the worksheet in which you want to combine the data (destinatin worksheet).

  • Go to the destination sheet with the IPO data from step 2 and click in the cell to the right of the last column with data
  • Use the Excel function search option (fx button) to find the Vlookup function (= Verticaal.zoeken):
  • The Vlookup popup has 4 arguments that need to be used:
  • Argument/box 1: indicate the value that needs to be looked up in a data/source worksheet: in this example the isin code of the first row:
  • Argument/box 2: indicate the full table where it needs to search for this specific value (= ISIN code). This includes the first column of ISIN codes and the second column with the first trading date = Base Date.
  • Argument/box 3: type in the number of the column where the data comes from that you want to see in the destination sheet (IPO data from SDC) from the source sheet (= output from static Datastream search). In this case column 2 (= B).
  • Argument/box 4: indicate with the text TRUE or FALSE whether the code should be an exact or approximate match. In this case we need FALSE as we need an exact match.
  • Click OK to continue. The result for the first row: it shows the base date from the original result sheet with the output of the Static search in Datastream.Copy this Vlookup function downward for the other rows.
  • In the column next to it (right side) we can add the first price from another worksheet (created at step 5) using a similar Vlookup option:

Email

Datastream Navigator changed

Today I noticed that the Datastream Navigator changed: when you look for Datatypes (= variables) for series/selections the Datastream Navigator has been changed to look more like the sceen you get when you are selecting series. You now have filtering options just like when selecting series.

You still need to be carefull when you start the Datatype search as the default data category at the top left corner is still Equities. Depending on the variables you need the Data Category may need to be changed before you start looking. Otherwise, the search result will only retrieve variables that are relevant for the Data Category shown here.

A full overview of the changes is available for download here.

There is also a new Quick Reference card from Thomson Reuters.

Email