This document outlines the data extraction process using the CPS Utilities software and CD-ROM. CPS Utilities is a time-saving product that lets you extract one or more years of Current Population Survey data at a time. The intent of this document is to give you a feel for the extraction software that is included with these CDs. It is not a complete guide to using the CPS Utilities CD-ROMs, and we encourage you to become familiar with the related printed material in the library before attempting to use this product.
Before beginning on the computer, you may want to review the CPS Utilities User Manual and other relevant CPS documentation volumes that can be found at the DPLS (such as original codebooks for each year, the survey design report, etc.). Section III of the CPS Utilities manual, Topical Groupings, lists the variables grouped by topic. Section IV, Dictionary, is a data dictionary that provides further information on each variable. Another section lists the recoded variables created by CPS Utilities to combine similar variables from different years (variable names that start with an underscore ( _ ) are recoded). If you want to make a list of variables manually, use the mnemonic codes found in the CPS Utilities manual. You can also search for variables using the CD-ROM's search engine.
Please note that in the following example screens, you see an option for the Outgoing Rotation Group data. Please ignore this option -- DPLS does not currently have the CPS Utilities Outgoing Rotation Group data (however, we do have these data in another format; see the DPLS library staff for more information if you want to use Outgoing Rotation Group data).
Using the CPS Utilities involves a series of distinct steps:
Before starting the software, use the Windows File Explorer to create a work subdirectory in d:\cpswin, such as d:\cpswin\smith. Note that typically this working directory will be left in place for up to three days.
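If you prefer to script this step rather than use File Explorer, the following is a minimal Python sketch of creating the work subdirectory. The function name make_work_dir is made up for illustration and is not part of CPS Utilities. A temporary folder stands in for d:\cpswin so the sketch runs on any machine.

```python
import tempfile
from pathlib import Path


def make_work_dir(base: Path, name: str) -> Path:
    """Create the subdirectory base/name if needed and return its path."""
    work_dir = base / name
    # exist_ok=True makes the call safe to repeat if the folder is already there.
    work_dir.mkdir(parents=True, exist_ok=True)
    return work_dir


# On the DPLS workstation the base would be d:\cpswin and the name
# something like "smith". A temporary folder is used here instead
# (an assumption made only so the example is portable).
with tempfile.TemporaryDirectory() as base:
    work_dir = make_work_dir(Path(base), "smith")
    print(work_dir.name, work_dir.is_dir())
```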
To start CPS Utilities, do the following:
This will bring up the following screen:
The CPS Utilities data at DPLS consist of March files and October files. When you run Setup, make sure that you are accessing the correct CD-ROM, using the correct drive, and that your working directory is properly set up.
Check the drive indicated in the CD drive field. It should be E:. To change, click on the Select button next to that field. This will open a dialog box on the right side of the screen. Select the correct drive and click on OK.
The working directory should point to the directory you created. To change the working directory, click on the Select button next to that field and edit the entry in the dialog box.
To go back to the main screen, select Save Settings. Note: Please DO NOT CHANGE the Path to your data dictionary files: and DO NOT Reload data dictionary files from CD.
If you would like to exit CPS Utilities completely, you can select the Exit button at any time.
The easy-to-use CPS Utilities Search engine allows you to search the data dictionary for terms and keywords. Select the Search button. Important: Request the relevant series (for March files or October files) on the right-hand side.
Two checkbox options are given. If checked, "Include questionnaire items in report" provides the text of the survey question along with the variable codes. "Limit search to topic description only" allows the search to cover only major topical items instead of all words in all variables.
If you type a word, string, or phrase in more than one text box at a time, you need to consider how you want the terms combined in the search. "Find variables including all of the strings" uses the Boolean operator AND. For the screen shown here, this means that variables will be shown only if they mention both income and insurance. (This search returns 22 hits on the March CD.) "Find variables including any of the strings" uses the operator OR. Variables having anything to do with either insurance or income will appear. (This search returns 451 hits.)
The Results List shows you the variable names (mnemonics) in alphabetic order. By clicking on "Display complete report" you can view the entire coding for every variable in the list, and save it to a file. To see just one variable's coding, highlight the mnemonic and click on "Display selected item details." In this way, you can search, review, and refine the variables you wish to select, and find out to which years the variables apply.
At this time, you may save your selected variables to a request file (see below). Click on "Save to request file" to get to the next screen.
If you have run CPS Utilities before, you may already have a saved request file that you want to edit. The first time you run the program, however, you must create a new request file. A request file contains information on your extraction, including variables and subsetting criteria, e.g. age > 65 or sex=1 (keep only males). If you did not need to use the Search Engine, go to the Extract screen from the main menu and select the button Create new request file. This will bring up the following screen:
At this screen, type in the name of a new request file. If you have selected variables from the search engine, click on "Add saved items from search." Otherwise, select Display and add item(s) from variables list. You will see the following:
The list of variables for the relevant study will appear. Again, you should highlight the file that matches the CD you selected on the Change Setup screen to get the appropriate list of variables (e.g. March or October). Select the variables you would like to extract. To select multiple variables, hold down the Ctrl key as you make additional selections. To select an entire range use the Shift key.
Once you have selected all the variables you would like, select Add, then Exit. You will then be brought to the following screen:
Return to the Main Screen. To begin an extraction, select the Extract button. This will bring up the following screen.
This screen displays the name of the request file and the list of variables it contains. If you would like to subset your data by adding selection criteria, this is the screen to use. Subsetting your data will make your files much smaller and easier to handle, and your extraction will run faster. If you only need a subset of observations, we encourage you to select it here rather than in a statistical software program later.
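To make the idea of selection criteria concrete, here is a minimal Python sketch of what subsetting does to a set of records. The field names age and sex, the coding sex = 1 for males, and the sample records are illustrative assumptions only. The actual mnemonics and codes should be taken from the CPS Utilities data dictionary.

```python
# Invented sample records; real extracts use the mnemonics from the dictionary.
records = [
    {"age": 70, "sex": 1},
    {"age": 34, "sex": 1},
    {"age": 68, "sex": 2},
    {"age": 25, "sex": 2},
]


def subset(rows, criteria):
    """Keep only the rows that satisfy every criterion in the list."""
    return [row for row in rows if all(test(row) for test in criteria)]


def over_65(row):
    return row["age"] > 65


def male(row):
    # Assumes sex is coded 1 for males, as in the example criterion sex=1.
    return row["sex"] == 1


print(len(subset(records, [over_65])))        # one criterion
print(len(subset(records, [male])))           # a different criterion
print(len(subset(records, [over_65, male])))  # both criteria combined
```

Each added criterion can only keep the record count the same or reduce it, which is why subsetting at extraction time yields smaller files.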
To subset by a variable or combination of variables, you should start by highlighting the variable you will use to define the subset. For example, if you wanted records of only males you would do the following:
Once you have set your selection criteria, first Save the request file and then Exit. You will see the following:
Once you are back at this screen and your request file is highlighted, you need to select the years you would like to do the extraction for, again holding down the Ctrl key to select multiple years. Remember that not all variables occur in all years. The User Manual provides details about this issue. Once you have selected your year or years, you are ready to run your data extraction.
Warning! These extraction files can be very large and can take a very long time to create. For this reason, we recommend that you proceed slowly through your extraction by completing the following steps:
A series of messages will appear on the screen including one which will indicate how big your extraction file will be. The larger it is, the longer it will take to run. Hit Esc to cancel at any time.
If you have done a test run on a limited number of records, it will give you the message that certain files already exist in your directory. If those files are from your test run, you can simply replace them. If they are from another extraction you have done that you still need, you should rename the old files before proceeding.
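If you need to keep files from an earlier extraction, renaming them can also be scripted. The following is a minimal Python sketch. The function name set_aside, the ".old" suffix, and the file name extract.dat are all hypothetical, and a temporary folder stands in for your work directory.

```python
import tempfile
from pathlib import Path


def set_aside(work_dir: Path, filenames, suffix=".old"):
    """Rename each existing file by appending a suffix; return the new names."""
    renamed = []
    for filename in filenames:
        src = work_dir / filename
        if not src.exists():
            continue  # nothing to protect for this name
        dst = src.with_name(src.name + suffix)
        if dst.exists():
            # Refuse to overwrite a file that was set aside earlier.
            raise FileExistsError(f"{dst} already exists")
        src.rename(dst)
        renamed.append(dst.name)
    return renamed


# Demonstration in a temporary folder with an invented file name.
with tempfile.TemporaryDirectory() as tmp:
    work_dir = Path(tmp)
    (work_dir / "extract.dat").write_text("output of an earlier extraction")
    print(set_aside(work_dir, ["extract.dat", "missing.dat"]))
```

Files that do not exist are skipped, and the sketch stops rather than overwriting a file that was already set aside.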
If you are extracting more than one year of data, separate extraction runs will be performed for each year.
The extract(s) will be written to your work directory and will include the following files:
You can also choose to have a Stata .dta file generated.