Kate Willyard

Academic Blog

"If we knew what we were doing, it would not be called research, would it?" -Albert Einstein

Comparing Census Population Data, Part One: Project Introduction

6/4/2018

I am currently working on a project that compares Census population data from 2000 to 2010 for the entire United States at the lowest geographic level available, whether that is the Census block, the Census block group, or the Census tract (listed from smallest to largest). We also want to compare Census population data from 2000 and 2010 at the level of the county and the Core Based Statistical Area (CBSA), which is a group of one or more counties built around an urban core, such as a metropolitan area. Some of our variables, such as the race of individuals and single-mother households, come from the 2000 and 2010 Decennial Census short form, which was collected for 100% of the population and is available at the block level. Other variables, such as household income and poverty levels, come from the 2000 Decennial Census long form (about a 17% sample of the total population) and the 2008-2012 American Community Survey (about a 12.5% sample of the total population). Tabulations from both of these sources are available only at the block group or tract level.

As we attempt to develop comparable population estimates across space and time, we must be aware that pulling data from multiple data sets collected over different periods can create problems with consistency. I discuss some of these issues below.

Common Issues When Comparing Census Population Data Over Time

There are three common concerns when comparing Census population data at a low level of geography:

(1) Changes in Questions Asked- A question asked in one decade might be worded differently in another decade, or it might not be asked at all. Furthermore, the answer categories, coding scheme, and so on might not be the same. These differences create challenges when comparing data over time.
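One way to handle changed answer categories is to map each decade's codes onto a shared scheme before comparing counts. The sketch below is a generic illustration, not this project's procedure: the codes and the common categories `A`, `B`, and `C` are hypothetical placeholders, and real mappings would have to come from each decade's technical documentation.

```python
# Hypothetical lookup tables: the codes and target categories are
# placeholders, not real Census codes. In this made-up example, 2000
# codes "3" and "4" collapse into one common category "C".
CODES_2000_TO_COMMON = {"1": "A", "2": "B", "3": "C", "4": "C"}
CODES_2010_TO_COMMON = {"01": "A", "02": "B", "03": "C"}


def harmonize(counts: dict[str, int], mapping: dict[str, str]) -> dict[str, int]:
    """Collapse decade-specific category counts into a common scheme.

    A code missing from `mapping` raises KeyError, so an unmapped
    category is flagged instead of being silently dropped.
    """
    common: dict[str, int] = {}
    for code, n in counts.items():
        target = mapping[code]
        common[target] = common.get(target, 0) + n
    return common


print(harmonize({"1": 120, "2": 45, "3": 10, "4": 5}, CODES_2000_TO_COMMON))
# {'A': 120, 'B': 45, 'C': 15}
print(harmonize({"01": 130, "02": 50, "03": 20}, CODES_2010_TO_COMMON))
# {'A': 130, 'B': 50, 'C': 20}
```

Raising on unmapped codes is a deliberate choice: a category that exists in only one decade should prompt a decision about how to treat it, not quietly disappear from the totals.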

(2) Changes in Tabulations that are Publicly Available- To maintain confidentiality, individual microdata are available only to approved researchers working on approved projects within a restricted Federal Statistical Research Data Center (FSRDC). For the general public, data are released as a series of tabulations (summed totals or averages) by block, block group, tract, or other level of geography. A table that the Census Bureau releases for a particular variable in one decade (such as the number of households in a block group that are Hispanic and in poverty) is not guaranteed to be available in another decade.

(3) Changes in the Geographic Space included in a Block/Block Group/Tract- The United States Census Bureau divides the geographic space of the United States into a hierarchy of polygons, each representing a specific geographic area. Blocks are small geographic areas bounded by visible features, such as streets and railroad tracks. Block groups are clusters of blocks containing anywhere from 600 to 3,000 people (in 2010). Tracts are clusters of block groups containing anywhere from 1,200 to 8,000 people (in 2010). Counties are clusters of tracts, and states are clusters of counties. However, since block groups and tracts are delineated based on the number of people living within them, their boundaries can change significantly in areas with large population changes. For example, say you are looking at a relatively remote area of rural Arizona in 2000. This space might be made up of one tract and a few block groups and blocks. Then you look at the same area in 2010, but a number of retirement homes have been built and it is now a populated suburban area. This space might now be made up of four tracts and numerous block groups and blocks. Since Census blocks, block groups, and tracts do not consistently represent the same geographic space from one decade to the next, we face further challenges when comparing community data.
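Within a single census, this nesting is built into the geographic identifiers. In the 2010 Census files, a block's 15-digit GEOID is the state FIPS code (2 digits), the county code (3), the tract code (6), and the block number (4), and the block group is the first digit of the block number, so each parent ID is a prefix of the block ID. The sketch below uses that layout to roll block counts up the hierarchy; the GEOIDs and counts in the example are made up.

```python
from collections import defaultdict


def parse_block_geoid(geoid: str) -> dict[str, str]:
    """Return the IDs of every level that contains a 15-digit block GEOID.

    Layout: state (2) + county (3) + tract (6) + block (4). The block
    group is the first digit of the block number, so each parent ID is
    simply a prefix of the block GEOID.
    """
    if len(geoid) != 15 or not geoid.isdigit():
        raise ValueError(f"expected a 15-digit block GEOID, got {geoid!r}")
    return {
        "state": geoid[:2],
        "county": geoid[:5],
        "tract": geoid[:11],
        "block_group": geoid[:12],
        "block": geoid,
    }


def roll_up(block_counts: dict[str, int], level: str) -> dict[str, int]:
    """Sum block-level counts to a higher level of the same census's hierarchy."""
    totals: dict[str, int] = defaultdict(int)
    for geoid, n in block_counts.items():
        totals[parse_block_geoid(geoid)[level]] += n
    return dict(totals)


# Made-up block GEOIDs and population counts (state 04 is Arizona).
blocks = {"040130101011000": 40, "040130101011001": 25, "040130101012000": 60}
print(roll_up(blocks, "block_group"))  # {'040130101011': 65, '040130101012': 60}
print(roll_up(blocks, "tract"))        # {'04013010101': 125}
```

These prefixes only nest within one census vintage. A tract with a given code in 2000 and a tract with the same code in 2010 need not cover the same land, which is exactly the problem described above.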

To address these concerns, I followed the steps described below.

The Processes of Comparing Census Data Over Time

Comparing Census data over time involved several steps:

  1. Studying Technical Documentation to Determine Feasibility of Variable Comparison
  2. Downloading Bulk Data from Census FTP using Python
  3. Reading Census Data into Stata
  4. Preparing Data for Census Crosswalk
  5. Completing the Crosswalk
  6. Cleaning the Data
  7. Assessing the Crosswalk
  8. Adding CBSA Data

In future posts, I will describe these steps in more detail, provide the documentation necessary to replicate my research, and discuss what I would do differently.
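As a rough illustration of what the crosswalk in step 5 accomplishes, the sketch below reallocates counts from 2000 units to 2010 units using weights, each giving the share of a 2000 unit's count assigned to a 2010 unit (for example, the share of its population living in the part that overlaps that 2010 unit). This is a generic technique rather than a description of this project's procedure; the unit IDs and weights are invented, and real weights would come from a published crosswalk file or a block-level overlay of the two decades' boundaries.

```python
from collections import defaultdict


def apply_crosswalk(
    source_counts: dict[str, float],
    weights: dict[tuple[str, str], float],
) -> dict[str, float]:
    """Reallocate counts from source-year units to target-year units.

    `weights` maps (source_id, target_id) to the share of the source
    unit's count assigned to target_id. Each source unit's shares must
    sum to 1 so that no people are created or lost.
    """
    # Validate that every source unit is fully allocated (within rounding).
    share_totals: dict[str, float] = defaultdict(float)
    for (src, _tgt), w in weights.items():
        share_totals[src] += w
    for src in source_counts:
        total = share_totals.get(src, 0.0)
        if abs(total - 1.0) > 1e-6:
            raise ValueError(f"weights for {src!r} sum to {total}, not 1")

    # Add each source unit's weighted count to its target units.
    target: dict[str, float] = defaultdict(float)
    for (src, tgt), w in weights.items():
        target[tgt] += source_counts.get(src, 0.0) * w
    return dict(target)


# Hypothetical case mirroring the Arizona example above: one 2000 tract
# ("T2000") is split across four 2010 tracts with invented weights.
counts_2000 = {"T2000": 1000.0}
weights = {
    ("T2000", "T2010_A"): 0.4,
    ("T2000", "T2010_B"): 0.3,
    ("T2000", "T2010_C"): 0.2,
    ("T2000", "T2010_D"): 0.1,
}
print(apply_crosswalk(counts_2000, weights))
```

Proportional allocation like this suits counts such as people or households. Medians and averages, such as median household income, cannot simply be split by weights; they would need to be rebuilt from allocated counts (for example, counts of households by income bracket).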

    Author

    Kate Willyard is a political and economic sociologist interested in human organization and the environment.
