Kate Willyard

Academic Blog

"If we knew what we were doing, it would not be called research, would it?" -Albert Einstein

Comparing Census Population Data, Part Four: Reading Census Data into Stata

7/11/2018


 
This post describes the third step of my research comparing Census population data over time: reading Census data into Stata. For a description of the work completed before this step, see Step Two: Downloading Bulk Data from Census FTP using Python Programs and Step One: Studying Documentation to Determine the Feasibility of Variable Comparison Over Time (Comparing Census Population Data, Part Two). For an introduction to my research project, see Comparing Census Population Data, Part One.

The data downloaded from Census came as a series of .xls, .accdb, .dbf, .txt, and .csv files. To get these data into a manageable format, I used Stata, my preferred statistical software program, and wrote separate Stata code for each dataset.

First, I wrote Stata code to read in the American Community Survey (ACS) data. Census provides several .csv files that include variable labels and descriptions for each of the segments and the geography files. Reading in the data takes several steps: reading the geography file for each state and using the .csv template to label the data; reading the segment .txt file for each state and using the template to label the data; linking segment files to geography files using the LOGRECNO variable; creating national county, block group, and tract segment files; and linking segments to create complete county, block group, and tract estimates. In the code, the user needs to set the path where log files and revised data are stored. This Stata code (02-manageACS12_20180731.do) is available by clicking here.
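
To make the labeling and linking steps concrete, here is a minimal Python sketch of the same idea. It is not the Stata code linked above. It assumes a template whose first row holds variable names and whose second row holds descriptions, and a header-less, comma-separated segment file. The sample values, the `B01001_001` and `NAME` columns, and the function names are all made up for illustration; only LOGRECNO comes from the description above.

```python
import csv
import io

# Hypothetical miniature inputs. Real ACS files are far wider, and every
# column name here other than LOGRECNO is an assumption for illustration.
TEMPLATE_CSV = "LOGRECNO,B01001_001\nLogical Record Number,Total population\n"
SEGMENT_TXT = "0000001,5120\n0000002,310\n"
GEOGRAPHY = [
    {"LOGRECNO": "0000001", "NAME": "Example County"},
    {"LOGRECNO": "0000002", "NAME": "Example Tract"},
]


def read_template(text):
    """Return (variable names, {name: description}) from a two-row template."""
    names, descriptions = list(csv.reader(io.StringIO(text)))[:2]
    return names, dict(zip(names, descriptions))


def read_segment(text, names):
    """Read a header-less segment file, naming columns from the template."""
    return [dict(zip(names, row)) for row in csv.reader(io.StringIO(text))]


def link_on_logrecno(geography, segment):
    """One-to-one merge of geography and segment rows keyed on LOGRECNO."""
    by_key = {row["LOGRECNO"]: row for row in segment}
    return [
        {**geo, **by_key[geo["LOGRECNO"]]}
        for geo in geography
        if geo["LOGRECNO"] in by_key
    ]


names, labels = read_template(TEMPLATE_CSV)
linked = link_on_logrecno(GEOGRAPHY, read_segment(SEGMENT_TXT, names))
```

Each row of `linked` carries both the geographic identifiers and the segment estimates, which is roughly what a one-to-one merge on LOGRECNO produces within a single state.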

Next, I wrote Stata code to read in the Decennial Census 2010 Summary File One data. Census does not provide variable labels and descriptions for each of the segments and the geography files. Instead, it provides an Access 1999 file with templates for each of the segments. These templates are accessible in .csv format by clicking here. For the Stata code to work, the templates must be saved in the same location as the original data downloaded in the previous step. The Access file also has a template for the geography file, but that template does not work because the state geography files are not delimited by a common separator such as a comma. As a result, I had to write dictionaries for each state's geography file. The dictionary files must also be saved in the same location as the original data downloaded in the previous step. These dictionaries are accessible by clicking here. I then wrote Stata code that goes through several steps: reading the geography file for each state; reading the segment text file for each state and using the template to label the data; linking segment files to geography files; creating national block group, tract, and county segment files; and linking segments to create complete block, block group, tract, and county estimates. In the code, the user needs to set the path where log files and revised data are stored. This Stata code (03-manageDC00SF1_20180720.do) is available by clicking here.
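
For readers unfamiliar with dictionary files, the idea is that each variable is read from a fixed range of character positions rather than split on a separator. The Python sketch below illustrates that idea only. The layout, the column positions, and the sample line are hypothetical and do not reproduce the actual geography dictionaries linked above.

```python
# Hypothetical layout: (variable name, start column, width), 1-based columns.
# The real geography files contain many more fields at different positions.
LAYOUT = [
    ("STUSAB", 1, 2),
    ("SUMLEV", 3, 3),
    ("LOGRECNO", 6, 7),
    ("NAME", 13, 20),
]


def parse_fixed_width(line, layout):
    """Slice one fixed-width line into named fields, trimming padding."""
    record = {}
    for name, start, width in layout:
        begin = start - 1  # convert 1-based column to 0-based index
        record[name] = line[begin:begin + width].strip()
    return record


# Made-up sample record built to match the hypothetical layout above.
SAMPLE = "TX" + "050" + "0000001" + "Example County".ljust(20)
record = parse_fixed_width(SAMPLE, LAYOUT)
```

Values are kept as strings so that leading zeros in identifiers such as LOGRECNO survive, which matters when the field is later used as a merge key.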

Then, I wrote Stata code to read in the Decennial Census 2000 Summary File One data. As with the 2010 data, Census does not provide variable labels and descriptions for each of the segments and the geography files. Instead, it provides an Access 2007 file with templates for each of the segments. These templates are accessible by clicking here. For the Stata code to work, the templates must be saved in the same location as the original data downloaded in the previous step. The Access file also has a template for the geography file, but that template does not work because the state geography files are not delimited by a common separator such as a comma. As a result, I had to write dictionaries for each state's geography file. The dictionary files must also be saved in the same location as the original data downloaded in the previous step. These dictionaries are accessible by clicking here. I then wrote Stata code that goes through several steps: reading the geography file for each state; reading the segment text file for each state and using the template to label the data; linking segment files to geography files; creating national block group and tract segment files; and linking segments to create complete block, block group, and tract estimates. In the code, the user needs to set the path where log files and revised data are stored. This Stata code (04-manageDC00SF1_20180720.do) is available by clicking here.
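
The step of creating national files can be pictured as stacking the state files and splitting the rows by geographic level. The Python sketch below is one illustrative way to do that and is not the Stata code linked above. It assumes each row carries a summary level code in a field named `SUMLEV`, with 050 for county, 140 for tract, and 150 for block group. The field name, the `STATE` column, the function name, and the sample rows are assumptions for illustration.

```python
from collections import defaultdict

# Summary level codes assumed for this sketch.
SUMMARY_LEVELS = {"050": "county", "140": "tract", "150": "block group"}

# Hypothetical per-state rows, as they might look after linking segments
# to geography. Note that LOGRECNO values repeat across states.
STATE_FILES = {
    "AK": [
        {"SUMLEV": "050", "LOGRECNO": "0000001"},
        {"SUMLEV": "140", "LOGRECNO": "0000002"},
    ],
    "AL": [
        {"SUMLEV": "050", "LOGRECNO": "0000001"},
        {"SUMLEV": "040", "LOGRECNO": "0000003"},
    ],
}


def build_national_files(state_files):
    """Append state rows into one list per geographic level of interest."""
    national = defaultdict(list)
    for state, rows in sorted(state_files.items()):
        for row in rows:
            level = SUMMARY_LEVELS.get(row["SUMLEV"])
            if level is not None:  # skip levels not being kept
                national[level].append({"STATE": state, **row})
    return dict(national)


national = build_national_files(STATE_FILES)
```

Because the sample LOGRECNO values repeat across states, the sketch adds a state identifier to every row so that records remain distinguishable after stacking.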

Finally, I followed a similar process for the Decennial Census 2000 Summary File Three data. Click here for the templates for each of the segments. Click here for the geography dictionary files. Click here to access the Stata code (05-manageDC00SF3_20180723.do).

    Author

    Kate Willyard is a political and economic sociologist interested in human organization and the environment.
