As we’ve noted in our basic overview of ZIP Codes, they are identical to the U.S. Census Bureau’s ZIP Code Tabulation Areas or other geographies. We therefore use crosswalk files to convert ZIP Codes to these other identifiers.
zippeR provides an interface for accessing the former
UDS Mapper project’s ZIP to ZCTA crosswalk files](http://web.archive.org/web/20231218141557/https://udsmapper.org/zip-code-to-zcta-crosswalk/).
Crosswalk files are critical because not all ZIP codes are in the exact
same ZCTA. The UDS files are available from 2010 through 2021 in a
standardized format:
> zi_load_crosswalk(year = 2020)
# A tibble: 41,096 × 6
ZIP PO_NAME STATE ZIP_TYPE ZCTA zip_join_type
<chr> <chr> <chr> <chr> <chr> <chr>
1 00501 Holtsville NY Post Office or large volume customer 11742 Spatial join to ZCTA
2 00544 Holtsville NY Post Office or large volume customer 11742 Spatial join to ZCTA
3 00601 Adjuntas PR ZIP Code Area 00601 ZIP Matches ZCTA
4 00602 Aguada PR ZIP Code Area 00602 ZIP Matches ZCTA
5 00603 Aguadilla PR ZIP Code Area 00603 ZIP Matches ZCTA
6 00604 Aguadilla PR Post Office or large volume customer 00603 Spatial join to ZCTA
7 00605 Aguadilla PR Post Office or large volume customer 00603 Spatial join to ZCTA
8 00606 Maricao PR ZIP Code Area 00606 ZIP Matches ZCTA
9 00610 Anasco PR ZIP Code Area 00610 ZIP Matches ZCTA
10 00611 Angeles PR Post Office or large volume customer 00641 Spatial join to ZCTA
# … with 41,086 more rowsAs with the three-digit ZCTA geometry, users should evaluate these data carefully before using them to ensure they are fit for purpose. In particular, they should note that ZIPs that do not have corresponding ZCTAs (such as Armed Forces mailing ZIPs and those in some overseas territories) are not included. Users should also remember that individuals may live in a different ZCTA from their mailing address when that address is a Post Office or some other large volume customer.
They can be used with zi_crosswalk() to convert given
ZIP codes to ZCTAs:
> zips <- data.frame(id = c(1:3), ZIP = c("63139", "63108", "00501"))
> zi_crosswalk(zips, input_var = ZIP, zip_source = "UDS", year = 2021)
# A tibble: 3 × 3
id ZIP source_zcta
<int> <chr> <chr>
1 1 63139 63139
2 2 63108 63108
3 3 00501 11742If "UDS" is given for zip_source with a
year between 2009 and 2022, zi_crosswalk()
will automatically download the corresponding UDS crosswalk file. A
custom crosswalk can also be supplied as a data frame for
zip_source in lieu of using the UDS data, including a
crosswalk created from zi_load_crosswalk() using HUD data.
In that case, source_var and source_result
should be specified to correctly match input variable names.
Note: The legacy parameter names
input_zipanddictare still accepted for backwards compatibility but are deprecated in favor ofinput_varandzip_source/year.
The U.S. Housing and Urban Development (HUD) Department provides ZIP
code to Census geography crosswalks that can be used to convert ZIP
codes to Census Tracts, counties, and other geographies. These data are
available through the HUD User website
(https://www.huduser.gov/portal/datasets/usps_crosswalk.html).
Unlike the UDS files, ZIP Code Tabulation Areas are not one of the
geographies including. If HUD data are used, be aware of ZIP Codes
mapping into multiple Census Tracts, counties, etc. Many users may want
to pick a “most likely” county (or other Census geometry) based on the
proportion of commercial or residential customers.
To use the HUD data, users must first obtain an API key from the HUD
User website
(https://www.huduser.gov/portal/dataset/uspszip-api.html).
Once you have an API key, they can use zi_load_crosswalk()
to download the data either by passing the key directly to the function
or by storing the key in their .Rprofile
under the object name hud_key:
The key can also be passed to zi_load_crosswalk directly
with the key argument:
> zi_load_crosswalk(zip_source = "HUD", year = 2023, qtr = 1, target = "COUNTY",
+ query = c("63138", "63139"))
# A tibble: 3 × 8
ZIP GEOID RES_RATIO BUS_RATIO OTH_RATIO TOT_RATIO CITY STATE
<chr> <chr> <dbl> <dbl> <int> <dbl> <chr> <chr>
1 63138 29189 0.999 0.988 1 0.999 SAINT LOUIS MO
2 63138 29510 0.000518 0.0124 0 0.000956 SAINT LOUIS MO
3 63139 29510 1 1 1 1 SAINT LOUIS MO Queries can be either a single ZIP Code, a vector of ZIP Codes, a
state abbreviation, or the word "ALL" to download the
entire crosswalk file. Using states or "ALL" is available
from the 1st quarter of 2021 onwards. The target argument
can be set to “COUNTY”, “TRACT”, “CBSA”, “CBSADIV”, “CD”, or
“COUNTYSUB”. The year and qtr arguments
specify the year and quarter of the data to download.
Note that the above query finds that the ZIP Code 63138
straddles two counties, but the vast majority of both residential and
commercial customers are in St. Louis City (GEOID is
29510). If you were building a crosswalk file from these,
you might want to select St. Louis City as the “most likely” county for
ZIP Code 63138. The
Since using the HUD data requires a number of analytic choices, it
cannot be accessed directly through zi_crosswalk().
Instead, you should construct the desired crosswalk file yourself and
then pass it to zi_crosswalk() as a custom dictionary. The
zi_prep_hud() function can help you prepare the HUD data
for use in joins:
# access to HUD ZIP Code to County crosswalk for all ZIP Codes in Missouri
mo <- zi_load_crosswalk(zip_source = "HUD", year = 2023, qtr = 1,
target = "COUNTY", query = "MO")
# prep data
mo <- zi_prep_hud(mo, by = "residential")The resulting output contains one row of data for each ZIP Code matched with the county that has the highest proportion of residential ZIP Codes. Users can also construct a crosswalk using commercial addresses or total addresses. When used with multiple states, if the ZIP Code straddles two states, two records will be returned.