I started by importing the Census 2000 to Census 2010 Block Crosswalks into Stata, my preferred statistical program. The crosswalk came as a series of .txt files, one per state (they were downloaded from the Census FTP site during Step Two). In Stata, I wrote a loop that imported and saved each state file. I kept the data as separate state files to minimize memory problems when running the crosswalk. You can find the code I used to manage the crosswalk (06-managecrosswalk_20180801.do) by clicking here.
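For readers who do not use Stata, the per-state import loop can be sketched in Python. This is a minimal sketch, not the .do file itself. It assumes each state file is comma-delimited with a header row and follows a hypothetical name pattern, crosswalk_<FIPS>.txt.

```python
import csv
from pathlib import Path


def import_state_files(src_dir, out_dir, state_fips):
    """Read one delimited text file per state and save each as its own CSV.

    One output file per state keeps only a single state's rows in memory at a
    time. The file-name pattern below is hypothetical.
    """
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for fips in state_fips:
        src = src_dir / f"crosswalk_{fips}.txt"  # hypothetical file name
        if not src.exists():
            continue  # skip states whose file was not downloaded
        with src.open(newline="") as f:
            # DictReader returns every value as text, so leading zeros survive.
            reader = csv.DictReader(f)
            fieldnames = reader.fieldnames
            rows = list(reader)
        dest = out_dir / f"crosswalk_{fips}.csv"
        with dest.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        saved.append(dest)
    return saved
```

A call such as import_state_files("raw", "clean", ["01", "02", "04"]) would write one CSV per state file it finds and skip any state whose file is missing.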
Still within the loop, after importing each file, I labeled each variable and made sure each identifier was in the correct format: state codes are two-digit character strings, county codes are three-digit, tract codes are six-digit, block group codes are one-digit, and block codes are three-digit. However, I made a mistake and created block codes as four-digit codes that included the block group number.
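Fixed-width character codes matter because a numeric import silently drops leading zeros (state "01" becomes 1). The sketch below, written in Python rather than Stata, zero-pads each code to the widths listed above and raises an error when a value is too long. The field names are hypothetical. With the three-digit block width, the length check would flag a four-digit block code like the one described above.

```python
# Character widths for each geographic code, as described in the text.
# "block" uses the three-digit width (block group digit stored separately).
CODE_WIDTHS = {"state": 2, "county": 3, "tract": 6, "blockgroup": 1, "block": 3}


def format_codes(record):
    """Return a copy of record with each code as a zero-padded string."""
    formatted = dict(record)
    for field, width in CODE_WIDTHS.items():
        value = str(record[field]).strip()
        # Reject non-numeric codes and codes longer than the expected width.
        if not value.isdigit() or len(value) > width:
            raise ValueError(f"{field}={value!r} does not fit in {width} digits")
        formatted[field] = value.zfill(width)  # restore leading zeros
    return formatted
```

For example, format_codes({"state": 1, "county": 3, "tract": 20100, "blockgroup": 1, "block": 12}) returns the codes "01", "003", "020100", "1", and "012".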
Next, I created unique block, block group, tract, and county identifiers for 2000 and 2010. Unique Census block identifiers should be fifteen digits, but mine were sixteen digits because the block group number appeared twice. In other words, my block identifier was built from the two-digit state code, followed by the three-digit county code, the six-digit tract code, the one-digit block group code, and then the four-digit block code, which itself began with the block group code. Twelve-digit block group identifiers were created from the two-digit state code, the three-digit county code, the six-digit tract code, and the one-digit block group code. Eleven-digit tract identifiers were created from the state, county, and tract codes. Finally, five-digit county identifiers were created from the two-digit state code followed by the three-digit county code.
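The nesting of these identifiers can be sketched as plain string concatenation. The Python below is illustrative, not the author's Stata code. It assumes the inputs are already zero-padded strings, and the helper names are hypothetical. The second function shows one way to drop the duplicated block group digit from a sixteen-digit identifier like the one described above.

```python
def make_geoids(state, county, tract, blockgroup, block):
    """Concatenate zero-padded codes (2/3/6/1/3 digits) into nested IDs."""
    county_id = state + county             # 5 digits
    tract_id = county_id + tract           # 11 digits
    blockgroup_id = tract_id + blockgroup  # 12 digits
    block_id = blockgroup_id + block       # 15 digits
    return {"county": county_id, "tract": tract_id,
            "blockgroup": blockgroup_id, "block": block_id}


def repair_sixteen_digit_block_id(block_id16):
    """Drop the repeated block group digit from a 16-digit block ID.

    Characters 0-10 are state+county+tract, character 11 is the block group,
    and characters 12-15 are a four-digit block code that repeats it.
    """
    if len(block_id16) != 16 or block_id16[11] != block_id16[12]:
        raise ValueError("expected 16 digits with a repeated block group digit")
    return block_id16[:11] + block_id16[12:]
```

For example, make_geoids("01", "001", "020100", "1", "012")["block"] is "010010201001012", and repairing "0100102010011012" yields the same fifteen-digit ID.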
After that, I made Census 2000 to Census 2010 block-level geographic weights by dividing the area of land where the two blocks intersect by the area of the 2000 block. Then I made Census 2010 to Census 2000 block-level weights by dividing the same intersection area by the area of the 2010 block. Next, for each block, I counted the number of times it appeared in the crosswalk in 2000 and in 2010. Using those counts, I identified the relationship type between each 2000 block and 2010 block: no relationship, one-to-one, many-to-one, one-to-many, or many-to-many. Finally, I saved the file for each state and then combined them into a complete national crosswalk.
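The weight and relationship-type logic can be sketched as follows. This is a Python illustration under stated assumptions, not the author's Stata code. The field names (block2000, block2010, area2000, area2010, area_int) are hypothetical. A row whose counterpart block ID is empty is assumed to mean "no relationship." The type is assigned per pair of blocks from how often each block appears among linked rows.

```python
from collections import Counter


def add_weights_and_types(rows):
    """Add 2000->2010 and 2010->2000 area weights and a relationship type.

    rows: dicts with hypothetical keys block2000, block2010, area2000,
    area2010, and area_int (area where the two blocks intersect).
    """
    linked = [r for r in rows if r["block2000"] and r["block2010"]]
    n2000 = Counter(r["block2000"] for r in linked)  # rows per 2000 block
    n2010 = Counter(r["block2010"] for r in linked)  # rows per 2010 block
    out = []
    for r in rows:
        r = dict(r)
        # Share of each block's area that falls in the intersection.
        r["w_00_10"] = r["area_int"] / r["area2000"] if r["area2000"] else None
        r["w_10_00"] = r["area_int"] / r["area2010"] if r["area2010"] else None
        if not (r["block2000"] and r["block2010"]):
            r["rel_type"] = "none"  # assumption: missing counterpart ID
        else:
            many_2000 = n2000[r["block2000"]] > 1  # 2000 block split up
            many_2010 = n2010[r["block2010"]] > 1  # 2010 block merged from several
            r["rel_type"] = {
                (False, False): "one-to-one",
                (True, False): "one-to-many",
                (False, True): "many-to-one",
                (True, True): "many-to-many",
            }[(many_2000, many_2010)]
        out.append(r)
    return out
```

A useful check on the result is that, for any 2000 block with complete coverage, its w_00_10 weights sum to 1.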
I replicated this process to create weights at the block group, tract, and county levels. However, when reading the code, you will notice one important difference. To create the crosswalks for geographies larger than the block, I first created a dataset of information that was constant across time. Next, I created a dataset of information for 2000 and a dataset of information for 2010. In these, I kept only the unique Census identifier for the level of analysis, the area of the block, and the area of land where the two blocks intersected, and then deleted duplicates. Then, for each unique Census identifier, I summed the areas to get the correct areas for the unit of analysis, and merged the three datasets (constant, Census 2000, and Census 2010) on the unique Census identifier before creating the geographic weights.
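The reason for deleting duplicates before summing is that each block appears once for every block it intersects, so summing raw rows would count a block's area several times. The Python sketch below illustrates the aggregation idea and does not reproduce the author's Stata steps. It uses the same hypothetical field names as a block-level crosswalk (block2000, block2010, area2000, area2010, area_int) and takes the higher geography from the leading digits of a fifteen-digit block ID: 5 for county, 11 for tract, 12 for block group.

```python
from collections import defaultdict


def aggregate_crosswalk(rows, id_len):
    """Aggregate block-level crosswalk rows to a larger geography.

    id_len: 5 (county), 11 (tract), or 12 (block group) leading digits of a
    15-digit block ID. Field names are hypothetical.
    """
    # De-duplicate blocks first: a block repeats once per intersecting block.
    blocks2000 = {r["block2000"]: r["area2000"] for r in rows}
    blocks2010 = {r["block2010"]: r["area2010"] for r in rows}
    area2000 = defaultdict(float)
    for block, area in blocks2000.items():
        area2000[block[:id_len]] += area
    area2010 = defaultdict(float)
    for block, area in blocks2010.items():
        area2010[block[:id_len]] += area
    # Intersection areas are summed for each pair of larger geographies.
    int_area = defaultdict(float)
    for r in rows:
        int_area[(r["block2000"][:id_len], r["block2010"][:id_len])] += r["area_int"]
    crosswalk = []
    for (g00, g10), area in int_area.items():
        crosswalk.append({
            "geo2000": g00, "geo2010": g10, "area_int": area,
            "w_00_10": area / area2000[g00] if area2000[g00] else None,
            "w_10_00": area / area2010[g10] if area2010[g10] else None,
        })
    return crosswalk
```

Without the de-duplication step, a 2000 block that intersects two 2010 blocks would have its area added to its tract twice, and the tract weights would no longer sum to 1.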
Finally, I prepared the Census data that was extracted in Step Three. You can find the code I used to finish preparing the Census data for the crosswalk (07-prepcensusdata_20180807.do) by clicking here. In short, for each geographic level (county, tract, block group, and block) and each dataset (Census 2000 Summary File 1, Census 2000 Summary File 3, Census 2010 Summary File 1, and American Community Survey 2008-2012), I kept the variables I plan to use and labeled them accordingly. Then I made files for each geographic level for the 2000 Census data by merging the Census 2000 Summary File 1 and Summary File 3 datasets on the unique Census identifier, and created state-level and national-level datasets. State-level datasets were saved using their Census state FIPS code. I finished by making files for each geographic level for the 2010 Census data by merging the Census 2010 Summary File 1 and American Community Survey 2008-2012 datasets on the unique Census identifier, and creating state-level and national-level datasets.
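The merge-then-split step can be sketched in Python as a one-to-one merge on the identifier followed by grouping on its first two characters, which are the state FIPS code. This is an illustrative sketch, not the author's Stata code. The "geoid" key and the variable names in the example are hypothetical, and keeping geographies found in only one file is an assumption, since the text does not say how unmatched records were handled.

```python
from collections import defaultdict


def merge_by_geoid(left, right):
    """One-to-one merge of two lists of dicts on the 'geoid' key.

    Keeps geographies found in either file. If both files share a variable
    name, the value from `right` wins. Raises on a duplicated geoid.
    """
    merged = {}
    for table in (left, right):
        seen = set()
        for row in table:
            gid = row["geoid"]
            if gid in seen:
                raise ValueError(f"duplicate geoid {gid}")
            seen.add(gid)
            merged.setdefault(gid, {}).update(row)
    return [merged[g] for g in sorted(merged)]


def split_by_state(rows):
    """Group merged rows by state FIPS code (first two characters of geoid)."""
    by_state = defaultdict(list)
    for row in rows:
        by_state[row["geoid"][:2]].append(row)
    return dict(by_state)
```

The national dataset is simply the merged list, and each entry of split_by_state(merged) could then be written to a file named after its FIPS code.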