When developing comparable population estimates across space and time, we must be aware that pulling data from multiple datasets covering different periods can create consistency problems. I discuss some of these issues below.
Common Issues When Comparing Census Population Data Over Time
(1) Changes in Questions Asked- A question asked in one decade might be worded differently in another decade, or might not be asked at all. Furthermore, the answer categories, coding scheme, and so on might differ between decades. These differences make it difficult to compare data over time.
(2) Changes in Tabulations that are Publicly Available- To maintain confidentiality, individual microdata are available only to approved researchers working on approved projects within a restricted Federal Statistical Research Data Center (FSRDC). The general public instead has access to a series of tabulations (summed totals or averages) at various levels of geography, such as the block, block group, or tract. A table that the Census Bureau releases in one decade (for example, the number of households in each block group that are Hispanic and in poverty) is not guaranteed to be available in another decade.
(3) Changes in the Geographic Space Included in a Block/Block Group/Tract- The United States Census divides the geographic space of the United States into a hierarchy of polygons. Blocks are small geographic areas bounded by visible features, such as streets and railroad tracks. Block groups are clusters of blocks, each containing roughly 600 to 3,000 people (in 2010). Tracts are clusters of block groups, each containing roughly 1,200 to 8,000 people (in 2010). Counties are clusters of tracts, and states are clusters of counties. However, because blocks, block groups, and tracts are drawn based on the number of people living in an area, their boundaries can change substantially where the population changes substantially. For example, suppose you are looking at a relatively remote area of rural Arizona in 2000. This area might consist of a single tract containing a few block groups and blocks. By 2010, a number of retirement homes have been built there, and it has become a populated suburban area. The same space might now be divided into four tracts and numerous block groups and blocks. Because Census blocks, block groups, and tracts do not consistently represent the same geographic space, comparing community data over time becomes even more difficult.
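Within a single decade, the nesting described in (3) is encoded in each block's 15-digit GEOID: a 2-digit state code, a 3-digit county code, a 6-digit tract code, and a 4-digit block number, where the first digit of the block number identifies the block group. The minimal Python sketch below splits a block GEOID into those parts and rebuilds the parent tract and block group IDs. The class and function names are hypothetical, and the ID used is made up for illustration. Keep in mind that these codes describe nesting only within one decade's geography: the same tract code in 2000 and 2010 need not cover exactly the same area.

```python
from typing import NamedTuple


class BlockGeo(NamedTuple):
    """Nested geographic codes contained in one 15-digit block GEOID."""
    state: str        # 2 digits
    county: str       # 3 digits
    tract: str        # 6 digits
    block_group: str  # 1 digit (first digit of the block number)
    block: str        # 4 digits

    @property
    def tract_geoid(self) -> str:
        # State + county + tract: the 11-character tract GEOID.
        return self.state + self.county + self.tract

    @property
    def block_group_geoid(self) -> str:
        # Tract GEOID + block group digit: the 12-character block group GEOID.
        return self.tract_geoid + self.block_group


def parse_block_geoid(geoid: str) -> BlockGeo:
    """Split a 15-digit block GEOID into its state/county/tract/block parts."""
    if len(geoid) != 15 or not geoid.isdigit():
        raise ValueError(f"expected a 15-digit block GEOID, got {geoid!r}")
    block = geoid[11:]
    return BlockGeo(
        state=geoid[:2],
        county=geoid[2:5],
        tract=geoid[5:11],
        block_group=block[0],
        block=block,
    )


# Illustrative, made-up ID: state 04, county 013, tract 999900, block 1002.
g = parse_block_geoid("040139999001002")
print(g.tract_geoid)        # 04013999900
print(g.block_group_geoid)  # 040139999001
```

Because each parent ID is simply a prefix of the block GEOID, aggregating blocks up to block groups or tracts within one decade is straightforward. Comparing those units across decades is the part that requires a crosswalk.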
To address these concerns, I followed the process outlined below.
The Processes of Comparing Census Data Over Time
- Studying Technical Documentation to Determine Feasibility of Variable Comparison
- Downloading Bulk Data from Census FTP using Python
- Reading Census Data into Stata
- Preparing Data for Census Crosswalk
- Completing the Crosswalk
- Cleaning the Data
- Assessing the Crosswalk
- Adding CBSA Data
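The core idea of a crosswalk can be shown with a minimal Python sketch. It assumes a crosswalk table in which each row links a 2000 tract to a 2010 tract, with a weight giving the share of the 2000 tract's population assumed to fall in that 2010 tract. Each 2000 tract's count is apportioned by those weights and summed on the 2010 side. Everything here is hypothetical: the tract IDs, populations, weights, and function names are made up, this is not the Stata workflow used in this project, and in practice the weights would come from a published crosswalk or relationship file rather than being typed in by hand. The two checks at the end (weights summing to 1, totals preserved) are one simple way to start assessing a crosswalk.

```python
from collections import defaultdict

# Hypothetical 2000 tract populations (made-up IDs and numbers).
pop_2000 = {"A": 4000, "B": 2500}

# Hypothetical crosswalk rows: (tract_2000, tract_2010, weight).
# Tract A splits into four 2010 tracts, as in the Arizona example;
# tract B keeps the same boundaries.
crosswalk = [
    ("A", "A1", 0.5),
    ("A", "A2", 0.25),
    ("A", "A3", 0.125),
    ("A", "A4", 0.125),
    ("B", "B1", 1.0),
]


def apply_crosswalk(values, rows):
    """Apportion source-geography counts to target geographies by weight."""
    out = defaultdict(float)
    for src, dst, weight in rows:
        out[dst] += values[src] * weight
    return dict(out)


def bad_weight_sums(rows, tol=1e-9):
    """Return source units whose weights do not sum to 1 (should be empty)."""
    totals = defaultdict(float)
    for src, _, weight in rows:
        totals[src] += weight
    return {src: t for src, t in totals.items() if abs(t - 1.0) > tol}


est_2010 = apply_crosswalk(pop_2000, crosswalk)
print(est_2010)
# {'A1': 2000.0, 'A2': 1000.0, 'A3': 500.0, 'A4': 500.0, 'B1': 2500.0}
print(bad_weight_sums(crosswalk))  # {}
print(sum(est_2010.values()) == sum(pop_2000.values()))  # True
```

Weighting by population share assumes people are distributed within each source tract the way the weights say; when a tract splits unevenly, that assumption is where most crosswalk error comes from.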
In future posts, I will describe these steps in more detail, provide the documentation necessary to replicate my research, and discuss what I would do differently.