Kate Willyard

Academic Blog

"If we knew what we were doing, it would not be called research, would it?" -Albert Einstein

Comparing Census Population Data, Part Five: Preparing Census Data for the Census 2000 to 2010 Geography Crosswalk

10/10/2018

This post describes the fourth step of my research comparing Census population data over time: preparing Census data for the Census crosswalk. For the work completed before this step, see Step Three: Reading Census Data into Stata, Step Two: Downloading Bulk Data from Census FTP using Python Programs, and Step One: Studying Documentation to Determine the Feasibility of Variable Comparison Over Time. See Comparing Census Population Data, Part One for an introduction to the research project.

I started by importing the Census 2000 to Census 2010 block crosswalks into Stata, my preferred statistical program. The crosswalk came as a series of .txt files, one per state, which I downloaded from the Census FTP site during Step Two. In Stata, I wrote a loop that imported and saved each state file. I kept the crosswalk as separate state files to minimize memory problems when running it. You can find the code I used to manage the crosswalk in 06-managecrosswalk_20180801.do.
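Here is a sketch of the per-state import loop, written in Python rather than Stata. It is not the code in 06-managecrosswalk_20180801.do. The file-name patterns (`crosswalk_{fips}.txt`, `crosswalk_{fips}.csv`) are hypothetical, and so is the assumption that each file is comma-delimited with a header row. Every field is read as text so that codes such as `01` or `001` keep their leading zeros.

```python
import csv
from pathlib import Path

# Hypothetical naming scheme: one comma-delimited .txt file per state,
# e.g. "crosswalk_06.txt" for FIPS 06. Adjust to the real file names.
IN_PATTERN = "crosswalk_{fips}.txt"
OUT_PATTERN = "crosswalk_{fips}.csv"


def read_crosswalk(handle):
    """Parse one state's crosswalk; csv keeps every field as a string."""
    return list(csv.DictReader(handle))


def import_states(in_dir, out_dir, state_fips):
    """Import and re-save each state file separately, one state in memory at a time."""
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    row_counts = {}
    for fips in state_fips:
        src = in_dir / IN_PATTERN.format(fips=fips)
        if not src.exists():  # skip states whose file is missing
            continue
        with src.open(newline="") as fh:
            rows = read_crosswalk(fh)
        if rows:
            dst = out_dir / OUT_PATTERN.format(fips=fips)
            with dst.open("w", newline="") as fh:
                writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
                writer.writeheader()
                writer.writerows(rows)
        row_counts[fips] = len(rows)
    return row_counts
```

Processing one state at a time is what keeps memory use bounded. The returned row counts give a quick check that every state file was read.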

Within the same loop, after importing each file, I labeled each variable and made sure each identifier was in the correct format: state codes are two-digit strings, county codes are three-digit strings, tract codes are six-digit strings, block group codes are one-digit strings, and block codes are three-digit strings. However, I made a mistake and created block codes as four-digit codes that included the block group number.
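The format check amounts to zero-padding each code to a fixed width. The widths below come from the paragraph above. The function names `format_code` and `split_block` are hypothetical. `split_block` assumes the source block number uses the four-digit Census form, whose first digit is the block group. Splitting it this way separates the block group digit from the three-digit block code.

```python
# Expected width of each geography code, as described in the post. "block" is the
# three-digit form that excludes the leading block group digit.
CODE_WIDTHS = {"state": 2, "county": 3, "tract": 6, "block_group": 1, "block": 3}


def format_code(value, level):
    """Return a geography code as a zero-padded string of the expected width."""
    width = CODE_WIDTHS[level]
    text = str(value).strip()
    if not text.isdigit():
        raise ValueError(f"{level} code {value!r} is not numeric")
    text = text.zfill(width)  # restore leading zeros lost by numeric storage
    if len(text) != width:
        raise ValueError(f"{level} code {value!r} has more than {width} digits")
    return text


def split_block(block_number):
    """Split a four-digit Census block number into (block group, three-digit block)."""
    text = str(block_number).strip().zfill(4)
    if len(text) != 4 or not text.isdigit():
        raise ValueError(f"block number {block_number!r} is not a four-digit code")
    return text[0], text[1:]
```

For example, `format_code(6, "state")` returns `"06"`, and `split_block("2013")` returns `("2", "013")`.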

Next, I created unique block, block group, tract, and county identifiers for 2000 and 2010. Unique Census block identifiers should be fifteen digits long, but because of the mistake above, mine were sixteen digits, with the block group number appearing twice. In other words, my unique Census block identifier combined the two-digit state code, the three-digit county code, the six-digit tract code, the one-digit block group code, and then the four-digit block code (the block group digit again, followed by the three-digit block number). Twelve-digit block group identifiers combined the two-digit state code, the three-digit county code, the six-digit tract code, and the one-digit block group code. Eleven-digit tract identifiers combined the two-digit state code, the three-digit county code, and the six-digit tract code. Finally, five-digit county identifiers combined the two-digit state code and the three-digit county code.
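The nested identifiers can be built by plain string concatenation. This sketch assumes the codes are already zero-padded strings. The function name `build_ids` is illustrative, and its length check is one possible way to catch a sixteen-digit block identifier like the one described above.

```python
# Expected identifier lengths: county 5, tract 11, block group 12, block 15.
EXPECTED_LENGTHS = {"county": 5, "tract": 11, "block_group": 12, "block": 15}


def build_ids(state, county, tract, block_group, block):
    """Concatenate zero-padded code strings into nested geographic identifiers.

    `block` must be the three-digit block code; passing the four-digit Census
    block number here repeats the block group digit and produces 16 digits.
    """
    ids = {}
    ids["county"] = state + county
    ids["tract"] = ids["county"] + tract
    ids["block_group"] = ids["tract"] + block_group
    ids["block"] = ids["block_group"] + block
    for level, expected in EXPECTED_LENGTHS.items():
        if len(ids[level]) != expected:
            raise ValueError(f"{level} identifier {ids[level]!r} is not {expected} digits")
    return ids
```

Appending the four-digit block number directly to the eleven-digit tract identifier (for example `"06037201000" + "2013"`) gives the same fifteen digits. Either construction works, provided the block group digit is used only once.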

After that, I made Census 2000 to Census 2010 block-level geographic weights by dividing the land area where the 2000 and 2010 blocks intersect by the land area of the 2000 block. Then I made Census 2010 to Census 2000 block-level weights by dividing that same intersecting land area by the land area of the 2010 block. Next, for each block, I counted the number of times it appeared in the crosswalk in 2000 and in 2010. Using those counts, I identified the type of relationship between each 2000 block and 2010 block: no relationship, one-to-one, many-to-one, one-to-many, or many-to-many. Finally, I saved the file for each state and then combined the state files into a complete national crosswalk.
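The block-level weights and relationship types can be sketched as follows. The column names (`block_2000`, `block_2010`, `land_2000`, `land_2010`, `land_int`) are hypothetical stand-ins for the crosswalk's actual fields. Two edge-case rules are also assumptions, since the post does not describe them: blocks with zero land area get a weight of `None`, and rows missing either the 2000 or the 2010 block are labeled "none".

```python
from collections import Counter


def block_weights(row):
    """Return (weight_2000_to_2010, weight_2010_to_2000) for one crosswalk row.

    Each weight is the intersecting land area divided by the land area of the
    source block. None is returned when the source block has no land area
    (an assumed convention for water-only blocks).
    """
    inter = float(row["land_int"])
    area_2000 = float(row["land_2000"])
    area_2010 = float(row["land_2010"])
    w_00_10 = inter / area_2000 if area_2000 > 0 else None
    w_10_00 = inter / area_2010 if area_2010 > 0 else None
    return w_00_10, w_10_00


def relationship_types(rows):
    """Label each row by how often its 2000 and 2010 blocks appear in the crosswalk."""
    n_2000 = Counter(r["block_2000"] for r in rows if r["block_2000"])
    n_2010 = Counter(r["block_2010"] for r in rows if r["block_2010"])
    labels = []
    for r in rows:
        b00, b10 = r["block_2000"], r["block_2010"]
        if not b00 or not b10:
            labels.append("none")  # assumed: a missing partner block means no relationship
            continue
        c00, c10 = n_2000[b00], n_2010[b10]
        if c00 == 1 and c10 == 1:
            labels.append("one-to-one")
        elif c00 == 1:
            # the 2000 block maps only here, but the 2010 block draws on several 2000 blocks
            labels.append("many-to-one")
        elif c10 == 1:
            # the 2000 block splits across several 2010 blocks, and this 2010 block draws only on it
            labels.append("one-to-many")
        else:
            labels.append("many-to-many")
    return labels
```

Here the relationship is classified row by row from the two counts. A 2000 block of 100 units that shares 50 units with a 2010 block therefore gets a 2000-to-2010 weight of 0.5 for that pair.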

I replicated this process to create weights at the block group, tract, and county levels. However, when reading the code, you will notice one difference. To create the crosswalks for geographies larger than the block, I first created a dataset of information that was constant across time. Next, I created one dataset of information for 2000 and another for 2010. In these, I kept only the unique Census identifier for the level of analysis, the area of the block, and the area of land where the two blocks intersected, and then deleted duplicates. Then, for each unique Census identifier, I summed the areas to get the correct areas for that unit of analysis, and merged the three datasets (constant, Census 2000, and Census 2010) on the unique Census identifier before creating the geographic weights.
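The same weighting idea applies to larger geographies once areas are summed by unit. This sketch is a simplified single pass, not the three-dataset merge described above. It takes the unit identifier as the first `unit_len` digits of the fifteen-digit block identifier (5 for county, 11 for tract, 12 for block group). It counts each block's land area only once, which corresponds to deleting duplicates, and it sums intersecting land area by pair of units. The field names (`block_2000`, `block_2010`, `land_2000`, `land_2010`, `land_int`) are hypothetical.

```python
from collections import defaultdict


def aggregate_weights(rows, unit_len):
    """Build unit-level weights {(unit_2000, unit_2010): (w_00_10, w_10_00)}.

    unit_len is the number of leading identifier digits that define the unit.
    """
    area_2000 = defaultdict(float)  # unit land area in 2000
    area_2010 = defaultdict(float)  # unit land area in 2010
    seen_2000, seen_2010 = set(), set()
    inter = defaultdict(float)  # intersecting land area by pair of units
    for r in rows:
        b00, b10 = r["block_2000"], r["block_2010"]
        u00, u10 = b00[:unit_len], b10[:unit_len]
        if b00 not in seen_2000:  # add each 2000 block's area only once
            seen_2000.add(b00)
            area_2000[u00] += float(r["land_2000"])
        if b10 not in seen_2010:  # add each 2010 block's area only once
            seen_2010.add(b10)
            area_2010[u10] += float(r["land_2010"])
        inter[(u00, u10)] += float(r["land_int"])
    weights = {}
    for (u00, u10), shared in inter.items():
        w_00_10 = shared / area_2000[u00] if area_2000[u00] > 0 else None
        w_10_00 = shared / area_2010[u10] if area_2010[u10] > 0 else None
        weights[(u00, u10)] = (w_00_10, w_10_00)
    return weights
```

A useful sanity check: if the crosswalk covers all of a unit's land, its 2000-to-2010 weights should sum to 1.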

Finally, I prepared the Census data that I extracted in Step Three. You can find the code I used to finish preparing Census data for the crosswalk in 07-prepcensusdata_20180807.do. In short, for each geographic level (county, tract, block group, and block) and each dataset (Census 2000 Summary File 1, Census 2000 Summary File 3, Census 2010 Summary File 1, and American Community Survey 2008-2012), I kept the variables I plan to use and labeled them accordingly. Then, for each geographic level, I made 2000 Census files by merging the Census 2000 Summary File 1 and Summary File 3 datasets on the unique Census identifier, and created state-level and national-level datasets. I saved each state-level dataset under its Census state FIPS code. I finished by making 2010 files for each geographic level by merging the Census 2010 Summary File 1 and American Community Survey 2008-2012 datasets on the unique Census identifier, and creating state-level and national-level datasets.
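The last two steps are merging two summary files on the identifier and splitting the result by state. Both can be sketched with plain dictionaries. The field name `geoid` is hypothetical. Keeping only identifiers present in both files (an inner join) is an assumption. The function also returns unmatched identifiers so they can be checked rather than silently dropped. The state FIPS code is taken as the first two digits of the identifier.

```python
from collections import defaultdict


def merge_by_geoid(left, right):
    """Inner-join two lists of dict records on 'geoid'; also return unmatched left IDs."""
    right_index = {rec["geoid"]: rec for rec in right}
    merged, unmatched = [], []
    for rec in left:
        match = right_index.get(rec["geoid"])
        if match is None:
            unmatched.append(rec["geoid"])
        else:
            merged.append({**rec, **match})  # combine the two records' fields
    return merged, unmatched


def split_by_state(records):
    """Group records by state FIPS code, the first two digits of the identifier."""
    by_state = defaultdict(list)
    for rec in records:
        by_state[rec["geoid"][:2]].append(rec)
    return dict(by_state)
```

The national dataset is simply the full merged list, and each entry of `split_by_state` corresponds to one state-level file.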

    Author

    Kate Willyard is a political and economic sociologist interested in human organization and the environment.
