Mirror of https://github.com/DOI-DO/j40-cejst-2.git, synced 2025-02-22 09:41:26 -08:00
Documentation for Updating Datasets
Parent: 763572da12
Commit: 0fab6f6868
5 changed files with 2109 additions and 36 deletions
@@ -2,3 +2,5 @@ dirs:
- .
ignorePatterns:
- pattern: '^http://localhost.*$'
excludedFiles:
- ./DATASETS.md
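The hunk above adds `excludedFiles` alongside the existing `ignorePatterns`. The minimal Python sketch below illustrates roughly what those two settings are meant to do, applied to file paths and URLs. It is not the link checker's own implementation, and the config file's name is not shown in this view. The in-memory `CONFIG` dict and the helpers `is_file_excluded` and `is_link_ignored` are hypothetical. The sketch assumes that excluded files are matched as exact paths after normalization and that ignore patterns are matched as regular expressions.

```python
import os
import re

# Hypothetical in-memory copy of the settings shown in the hunk above
# (the real tool reads them from its own config file).
CONFIG = {
    "dirs": ["."],
    "ignorePatterns": [{"pattern": r"^http://localhost.*$"}],
    "excludedFiles": ["./DATASETS.md"],
}


def is_file_excluded(path: str, config: dict = CONFIG) -> bool:
    """Return True if `path` matches an excludedFiles entry.

    Assumption: entries are compared as exact paths after normalization,
    so "./DATASETS.md" and "DATASETS.md" are treated as the same file.
    """
    target = os.path.normpath(path)
    return any(os.path.normpath(entry) == target for entry in config.get("excludedFiles", []))


def is_link_ignored(url: str, config: dict = CONFIG) -> bool:
    """Return True if `url` matches any ignorePatterns regular expression."""
    return any(re.search(entry["pattern"], url) for entry in config.get("ignorePatterns", []))


if __name__ == "__main__":
    print(is_file_excluded("./DATASETS.md"))  # True
    print(is_link_ignored("http://localhost:3000/about"))  # True
    print(is_link_ignored("https://hazards.fema.gov/nri/data-resources"))  # False
```

The sketch also shows that `'^http://localhost.*$'` matches only `http://` URLs. Under this reading, an `https://localhost` link would still be checked.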
70
DATASETS.md
@@ -1,38 +1,38 @@
# Justice40 Datasets

Below is a table of all datasets that feed the CEJST application, including access links and contacts.
Below is a table of all datasets that feed the CEJST application, including access links, contacts, and update information.

| **Indicator Group** | **Indicator** | **Description** | **Notes** | **Publisher** | **Year(s)** | **Source** | **Geography** | **Geographies available** | **Can be updated to 2020 Census Tracts?** | **Contact** | **Current Data Download** |
| ------------------------- | -------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------- | -------------------------------------- | ---------------------------------------------------- | ------------------------- | ------------------------------------------------------------------------------------------------------ | --------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Climate Change** | Expected Agriculture Loss Rate | Economic loss to agricultural value resulting from natural hazards each year | | Federal Emergency Management Agency | 2014-2021 | National Risk Index | US & District of Columbia | 2010 Census Tract | Data to be released End of March 2023 | Casey Zuzak, NHRAP Senior Risk Analyst. (Casey.Zuzak@fema.dhs.gov); Karen Villatoro (karen.villatoro@fema.dhs.gov); Jesse Rozelle (Jesse.Rozelle@fema.dhs.gov); Sean McNabb (Sean.McNabb@fema.dhs.gov); Charles Carson (charles.carson@fema.dhs.gov) | https://hazards.fema.gov/nri/data-resources |
| **Climate Change** | Expected Building Loss Rate | Economic loss to building value resulting from natural hazards each year | | Federal Emergency Management Agency | 2014-2021 | National Risk Index | US & District of Columbia | 2010 Census Tract | Data to be released End of March 2023 | Casey Zuzak, NHRAP Senior Risk Analyst. (Casey.Zuzak@fema.dhs.gov); Karen Villatoro (karen.villatoro@fema.dhs.gov); Jesse Rozelle (Jesse.Rozelle@fema.dhs.gov); Sean McNabb (Sean.McNabb@fema.dhs.gov); Charles Carson (charles.carson@fema.dhs.gov) | https://hazards.fema.gov/nri/data-resources |
| **Climate Change** | Expected Population Loss Rate | fatalities and injuries resulting from natural hazards each year | this burden only applies for census tracts with populations greater than 20 people. | Federal Emergency Management Agency | 2014-2021 | National Risk Index | US & District of Columbia | 2010 Census Tract | Data to be released End of March 2023 | Casey Zuzak, NHRAP Senior Risk Analyst. (Casey.Zuzak@fema.dhs.gov); Karen Villatoro (karen.villatoro@fema.dhs.gov); Jesse Rozelle (Jesse.Rozelle@fema.dhs.gov); Sean McNabb (Sean.McNabb@fema.dhs.gov); Charles Carson (charles.carson@fema.dhs.gov) | https://hazards.fema.gov/nri/data-resources |
| **Climate Change** | Projected Flood Risk | projected risk to properties from floods caused by tides, rain, riverine flooding, and storm surges within 30 years | these were emailed to J40 initially | First Street Foundation | projecting 2022-2052. Released in 2020 | | 50 states, DC, PR | 2010 Census Tract (we think, but documentation does not say) | Updated data is available | Ed Kearns, Chief Data Officer of First Street Foundation. (ed@firststreet.org) | https://aws.amazon.com/marketplace/pp/prodview-r36lzzzjacd32?sr=0-1&ref_=beagle&applicationId=AWSMPContessa#overview |
| **Climate Change** | Projected Wildfire Risk | projected risk to properties from wildfire, based on fire fuels, weather, humans, and fire movement, within 30 years | these were emailed to J40 initially | First Street Foundation | projecting 2022-2052. Released in 2020 | | 50 states, DC, PR | 2010 Census Tract (we think, but documentation does not say) | Updated data is available | Ed Kearns, Chief Data Officer of First Street Foundation. (ed@firststreet.org) | https://aws.amazon.com/marketplace/pp/prodview-r36lzzzjacd32?sr=0-1&ref_=beagle&applicationId=AWSMPContessa#overview |
| **Energy** | Energy Cost | Average annual energy costs divided by household income | | DOE | 2018 | LEAD Tool | 50 states, DC, PR | Census 2010 | Yes, in March 2023 | Aaron Vimont, developer at National Renewable Energy Laboratory. (aaron.vimont@nrel.gov); Tony Reames (tony.reames@hq.doe.gov) | https://data.openei.org/submissions/573; https://www.energy.gov/scep/slsc/lead-tool |
| **Energy** | PM2.5 in the air | level of inhalable particles, 2.5 micrometers or smaller | | Environmental Protection Agency (EPA) Office of Air and Radiation (OAR) | 2017 | EJ Screen | 50 states, DC + Islands | Census 2010 & Census 2020 | Yes | | https://gaftp.epa.gov/ejscreen/ |
| **Health** | Asthma | Share of people who have been told they have asthma | New Data Source: https://chronicdata.cdc.gov/500-Cities-Places/PLACES-Local-Data-for-Better-Health-Census-Tract-D/cwsq-ngmh | CDC | 2016-2019 | PLACES data | 50 States + DC | Census 2010 Tracts, Census 2010 & 2020 Counties | Not until December 2024 | T.J. Pierce (pwc2@cdc.gov); Sharunda Buchanan (sdb4@cdc.gov); Andrew Dent (aed5@cdc.gov); Angela Werner (myo6@cdc.gov) | https://chronicdata.cdc.gov/500-Cities-Places/PLACES-Census-Tract-Data-GIS-Friendly-Format-2021-/mb5y-ytti |
| **Health** | Diabetes | Share of people ages 18+ who have diabetes other than diabetes during pregnancy | New Data Source: https://chronicdata.cdc.gov/500-Cities-Places/PLACES-Local-Data-for-Better-Health-Census-Tract-D/cwsq-ngmh | CDC | 2016-2019 | PLACES data | 50 States + DC | Census 2010 Tracts, Census 2010 & 2020 Counties | Not until December 2024 | T.J. Pierce (pwc2@cdc.gov); Sharunda Buchanan (sdb4@cdc.gov); Andrew Dent (aed5@cdc.gov); Angela Werner (myo6@cdc.gov) | https://chronicdata.cdc.gov/500-Cities-Places/PLACES-Census-Tract-Data-GIS-Friendly-Format-2021-/mb5y-ytti |
| **Health** | Heart Disease | Share of people ages 18+ who have been told they have heart disease | New Data Source: https://chronicdata.cdc.gov/500-Cities-Places/PLACES-Local-Data-for-Better-Health-Census-Tract-D/cwsq-ngmh | CDC | 2016-2019 | PLACES data | 50 States + DC | Census 2010 Tracts, Census 2010 & 2020 Counties | Not until December 2024 | T.J. Pierce (pwc2@cdc.gov); Sharunda Buchanan (sdb4@cdc.gov); Andrew Dent (aed5@cdc.gov); Angela Werner (myo6@cdc.gov) | https://chronicdata.cdc.gov/500-Cities-Places/PLACES-Census-Tract-Data-GIS-Friendly-Format-2021-/mb5y-ytti |
| **Health** | Low life expectancy | Average number of years a person can expect to live | | CDC | 2010-2015 | US Small Area Life Expectancy Estimates Project | 50 States + DC | Census 2010 | 2025 | Elizabeth Arias (efa3@cdc.gov) | https://www.cdc.gov/nchs/nvss/usaleep/usaleep.html#life-expectancy |
| **Housing** | Historic Underinvestment | Census tracts that experienced historic underinvestment based on redlining maps created by the federal government’s Home Owners’ Loan Corporation (HOLC) between 1935 and 1940. | | National Community Reinvestment Coalition (NCRC) | | Home Owners Loan Corporation | 50 States + DC | Census 2010 & 2020 | Yes | | https://www.openicpsr.org/openicpsr/project/141121/version/V2/view |
| **Housing** | Housing Cost | Share of households making less than 80% of the Area Median Income (AMI) and spending more than 30% of income on housing | maybe could be found in ACS | Department of Housing and Urban Development (HUD) | 2014-2018 | Comprehensive housing affordability strategy dataset | 50 States + DC + PR | Census 2010 | Early Summer 2023 | Blair Russell, Office of Policy Development and Research; HUD (Blair.D.Russell@hud.gov) | https://www.huduser.gov/portal/datasets/cp.html#2006-2019_data |
| **Housing** | Lack of Green Space | Amount of land, not including crop land, that is covered with artificial materials like concrete or pavement | | Multi-Resolution Land Characteristics Consortium | 2019 | National Land Cover Database (USGS) | 48 States + DC | Possibly not bound to geographies because it's raster data. TPL imputed to census 2010 for us, I think. | Maybe? Use same data but preprocess to Census 2020 | | Was provided by the Trust for Public Land (TPL), but you can also get it here as image data: https://www.sciencebase.gov/catalog/item/5f21cef582cef313ed940043 |
| **Housing** | Lack of Indoor Plumbing | Share of homes without indoor kitchens or plumbing | maybe could be found in ACS | Department of Housing and Urban Development (HUD) | 2014-2018 | Comprehensive housing affordability strategy dataset | 50 States + DC + PR | Census 2010 | Early Summer 2023 | Blair Russell, Office of Policy Development and Research; HUD (Blair.D.Russell@hud.gov) | https://www.huduser.gov/portal/datasets/cp.html#2006-2019_data |
| **Housing** | Lead paint | Share of homes that are likely to have lead paint | Share of homes built before 1960, which indicates potential lead paint exposure. Tracts with extremely high home values (i.e. median home values above the 90th percentile) that are less likely to face health risks from lead paint exposure are not included. | US Census | 2015-2019 | American Community Survey | 50 States + DC + PR | Census 2010, Census 2020 | Yes, can update to ACS 2017-2021 | |
| **Legacy Pollution** | Abandoned Mine Land | Presence of one or more abandoned mine lands within the tract | | Department of the Interior, Office of Surface Mining Reclamation and Enforcement | 2017 | Abandoned Mine Land Inventory System | 50 States + DC | Point Data | Yes, points can be mapped to any geography | | https://www.osmre.gov/programs/e-amlis |
| **Legacy Pollution** | Formerly used Defense Site | Presence of one or more formerly used defense sites within the tract | | US Army Corps of Engineers | 2019 | Formerly Used Defense Sites | 50 States + DC | Point Data | Yes, points can be mapped to any geography | | https://www.usace.army.mil/Missions/Environmental/Formerly-Used-Defense-Sites/ |
| **Legacy Pollution** | Proximity to Hazardous Waste Facilities | count of hazardous waste facilities within 5 km | | EPA | 2020 | EJ Screen | 50 states, DC + Islands | Census 2010 & Census 2020 | Yes | | https://gaftp.epa.gov/ejscreen/ |
| **Legacy Pollution** | Proximity to Risk Management Plan Facilities | count of risk management plan facilities within 5 kilometers | | EPA | 2020 | EJ Screen | 50 states, DC + Islands | Census 2010 & Census 2020 | Yes | | https://gaftp.epa.gov/ejscreen/ |
| **Legacy Pollution** | Proximity to Superfund Sites | count of proposed or listed superfund or national priorities list sites within 5 km | | EPA | 2021 | EJ Screen | 50 states, DC + Islands | Census 2010 & Census 2020 | Yes | | https://gaftp.epa.gov/ejscreen/ |
| **Transportation** | Diesel particulate matter exposure | amount of diesel exhaust in the air | | EPA | 2014 | EJ Screen | 50 states, DC + Islands | Census 2010 & Census 2020 | Yes | | https://gaftp.epa.gov/ejscreen/ |
| **Transportation** | transportation barriers | average of relative cost and time spent on transportation | | DOT | 2022 | Transportation Access Disadvantage | 50 States + DC | Census 2020 | Yes | | https://www.transportation.gov/equity-Justice40#:~:text=Transportation%20access%20disadvantage%20identifies%20communities%20and%20places%20that%20spend%20more%2C%20and%20take%20longer%2C%20to%20get%20where%20they%20need%20to%20go.%20(4) |
| **Transportation** | traffic proximity and volume | count of vehicles at major roads within 500 meters | | DOT (via EPA) | 2017 | EJ Screen | 50 states, DC + Islands | Census 2010 & Census 2020 | Yes | | https://gaftp.epa.gov/ejscreen/ |
| **Water and Wastewater** | underground storage tanks and releases | formula of the density of leaking underground storage tanks and number of all active underground storage tanks within 1500 feet of the census tract boundaries | | EPA /UST Finder | 2021 | EJ Screen | 50 states, DC + Islands | Census 2010 & Census 2020 | Yes | | https://gaftp.epa.gov/ejscreen/ |
| **Water and Wastewater** | wastewater discharge | modeled toxic concentrations at parts of streams within 500 meters | | EPA Risk Screening Environmental Indicators | 2020 | EJ Screen | 50 states, DC + Islands | Census 2010 & Census 2020 | Yes | | https://gaftp.epa.gov/ejscreen/ |
| **Workforce Development** | Linguistic isolation | Share of households where no one over age 14 speaks English very well | | US Census | 2015-2019 | American Community Survey | 50 States + DC + PR | Census 2010, Census 2020 | Yes, can update to ACS 2017-2021 | |
| **Workforce Development** | low median income | comparison of median income in the tract to the median incomes in the area | | US Census | 2015-2019 | American Community Survey | 50 States + DC + PR | Census 2010, Census 2020 | Yes, can update to ACS 2017-2021 | |
| **Workforce Development** | poverty | share of people in households where income is at or below 100% of the federal poverty level | | US Census | 2015-2019 | American Community Survey | 50 States + DC + PR | Census 2010, Census 2020 | Yes, can update to ACS 2017-2021 | |
| **Workforce Development** | Unemployment | number of unemployed people as a share of the labor force | | US Census | 2015-2019 | American Community Survey | 50 States + DC + PR | Census 2010, Census 2020 | Yes, can update to ACS 2017-2021 | |
| **Workforce Development** | High school Education | Percent of people ages 25 or older whose education is less than a high school diploma | | US Census | 2015-2019 | American Community Survey | 50 States + DC + PR | Census 2010, Census 2020 | Yes, can update to ACS 2017-2021 | |
| **Multiple Factors** | Low income | People in households where income is less than or equal to twice the federal poverty level, not including students enrolled in higher education | | US Census | 2015-2019 | American Community Survey | 50 States + DC + PR | Census 2010, Census 2020 | Yes, can update to ACS 2017-2021 | |
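Two rows in the table above encode tract-level rules rather than raw values. First, Expected Population Loss Rate only applies to tracts with more than 20 people. Second, Lead paint uses the share of homes built before 1960 but leaves out tracts whose median home value is above the 90th percentile. The sketch below shows one way to express those two rules in Python. The `TractRecord` fields and helper names are hypothetical. The percentile convention is also an assumption: the fraction of tracts with a value at or below this one, computed over whatever list is passed in. The pipeline's actual ETL may compute percentiles differently or over a different comparison group.

```python
from dataclasses import dataclass
from typing import List, Optional

# Thresholds taken from the table text; the constant names are hypothetical.
POPULATION_THRESHOLD = 20             # burden applies only above 20 people
HOME_VALUE_PERCENTILE_CUTOFF = 0.90   # exclude tracts above the 90th percentile


@dataclass
class TractRecord:
    """Hypothetical tract-level inputs; field names are illustrative only."""
    geoid: str
    population: int
    expected_population_loss_rate: float
    homes_total: int
    homes_built_before_1960: int
    median_home_value: float


def population_loss_rate(tract: TractRecord) -> Optional[float]:
    """Return the loss rate only where the burden applies (population > 20)."""
    if tract.population > POPULATION_THRESHOLD:
        return tract.expected_population_loss_rate
    return None


def percentile_rank(value: float, values: List[float]) -> float:
    """Fraction of `values` at or below `value` (one of several conventions)."""
    return sum(1 for v in values if v <= value) / len(values)


def lead_paint_share(tract: TractRecord, all_median_values: List[float]) -> Optional[float]:
    """Share of homes built before 1960, or None if the tract is excluded."""
    rank = percentile_rank(tract.median_home_value, all_median_values)
    if rank > HOME_VALUE_PERCENTILE_CUTOFF:
        return None  # very high home values: left out of the lead paint indicator
    if tract.homes_total == 0:
        return None  # no housing units, so the share is undefined
    return tract.homes_built_before_1960 / tract.homes_total
```

Take ten tracts whose median home values run from 100,000 to 1,000,000 in steps of 100,000. Under this convention, only the 1,000,000 tract has a rank above 0.90, so only that tract is dropped from the lead paint indicator.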
| **Indicator Group** | **Indicator** | **Description** | **Notes** | **Publisher** | **Year(s)** | **Source** | **Geography** | **Geographies available** | **Can be updated to 2020 Census Tracts?** | **How to update** | **Contact** | **Current Data Download** | **Updated Data Download** |
| ------------------------- | -------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------- | -------------------------------------- | ---------------------------------------------------- | ------------------------- | ------------------------------------------------------------------------------------------------------ | --------------------------------------------------- | -------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Climate Change** | Expected Agriculture Loss Rate | Economic loss to agricultural value resulting from natural hazards each year | Field names are set in [datasets.yml](data/data-pipeline/data_pipeline/etl/score/config/datasets.yml). | Federal Emergency Management Agency | 2014-2021 | National Risk Index | US & District of Columbia | 2010 Census Tract | Yes | Update init function with new NRI_Table_CensusTracts.zip references in S3 & originating hazards.fema.gov download URL. | Casey Zuzak, NHRAP Senior Risk Analyst. (Casey.Zuzak@fema.dhs.gov); Karen Villatoro (karen.villatoro@fema.dhs.gov); Jesse Rozelle (Jesse.Rozelle@fema.dhs.gov); Sean McNabb (Sean.McNabb@fema.dhs.gov); Charles Carson (charles.carson@fema.dhs.gov) | https://hazards.fema.gov/nri/data-resources | https://hazards.fema.gov/nri/data-resources |
| **Climate Change** | Expected Building Loss Rate | Economic loss to building value resulting from natural hazards each year | Field names are set in [datasets.yml](data/data-pipeline/data_pipeline/etl/score/config/datasets.yml). | Federal Emergency Management Agency | 2014-2021 | National Risk Index | US & District of Columbia | 2010 Census Tract | Yes | Update init function with new NRI_Table_CensusTracts.zip references in S3 & originating hazards.fema.gov download URL. | Casey Zuzak, NHRAP Senior Risk Analyst. (Casey.Zuzak@fema.dhs.gov); Karen Villatoro (karen.villatoro@fema.dhs.gov); Jesse Rozelle (Jesse.Rozelle@fema.dhs.gov); Sean McNabb (Sean.McNabb@fema.dhs.gov); Charles Carson (charles.carson@fema.dhs.gov) | https://hazards.fema.gov/nri/data-resources | https://hazards.fema.gov/nri/data-resources |
| **Climate Change** | Expected Population Loss Rate | fatalities and injuries resulting from natural hazards each year | This burden only applies for census tracts with populations greater than 20 people. Field names are set in [datasets.yml](data/data-pipeline/data_pipeline/etl/score/config/datasets.yml). | Federal Emergency Management Agency | 2014-2021 | National Risk Index | US & District of Columbia | 2010 Census Tract | Yes | Update init function with new NRI_Table_CensusTracts.zip references in S3 & originating hazards.fema.gov download URL. | Casey Zuzak, NHRAP Senior Risk Analyst. (Casey.Zuzak@fema.dhs.gov); Karen Villatoro (karen.villatoro@fema.dhs.gov); Jesse Rozelle (Jesse.Rozelle@fema.dhs.gov); Sean McNabb (Sean.McNabb@fema.dhs.gov); Charles Carson (charles.carson@fema.dhs.gov) | https://hazards.fema.gov/nri/data-resources | https://hazards.fema.gov/nri/data-resources |
| **Climate Change** | Projected Flood Risk | projected risk to properties from floods caused by tides, rain, riverine flooding, and storm surges within 30 years | these were emailed to J40 initially. | First Street Foundation | projecting 2022-2052. Released in 2020 | | 50 states, DC, PR | 2010 Census Tract (we think, but documentation does not say) | Updated data is available | Request updated fsf_flood.zip from FSF, or potentially prepare an equivalent file programmatically using their API. Stage the resulting assets in S3 and update the init reference. | Ed Kearns, Chief Data Officer of First Street Foundation. (ed@firststreet.org) | https://aws.amazon.com/marketplace/pp/prodview-r36lzzzjacd32?sr=0-1&ref_=beagle&applicationId=AWSMPContessa#overview | Can request again through email, or try to use proprietary [API](https://docs.google.com/spreadsheets/d/1_MWAVl6IHvWupuPtzdvo8Mu2Z7o3IM-FsWreweS9Mpc/edit?gid=0#gid=0). |
| **Climate Change** | Projected Wildfire Risk | projected risk to properties from wildfire, based on fire fuels, weather, humans, and fire movement, within 30 years | these were emailed to J40 initially | First Street Foundation | projecting 2022-2052. Released in 2020 | | 50 states, DC, PR | 2010 Census Tract (we think, but documentation does not say) | Updated data is available | Request updated fsf_flood.zip from FSF, or potentially prepare an equivalent file programmatically using their API. Stage the resulting assets in S3 and update the init reference. | Ed Kearns, Chief Data Officer of First Street Foundation. (ed@firststreet.org) | https://aws.amazon.com/marketplace/pp/prodview-r36lzzzjacd32?sr=0-1&ref_=beagle&applicationId=AWSMPContessa#overview | Can request again through email, or try to use proprietary [API](https://docs.google.com/spreadsheets/d/1_MWAVl6IHvWupuPtzdvo8Mu2Z7o3IM-FsWreweS9Mpc/edit?gid=0#gid=0). |
| **Energy** | Energy Cost | Average annual energy costs divided by household income | | DOE | 2018 | LEAD Tool | 50 states, DC, PR | Census 2010 | Yes, in March 2023 | To-Do | Aaron Vimont, developer at National Renewable Energy Laboratory. (aaron.vimont@nrel.gov); Tony Reames (tony.reames@hq.doe.gov) | https://data.openei.org/submissions/573; https://www.energy.gov/scep/slsc/lead-tool | To-Do |
| **Energy** | PM2.5 in the air | level of inhalable particles, 2.5 micrometers or smaller | The "LOWINCPCT" column gets renamed to "Poverty (Less than 200% of federal poverty line)" in L93. We should check whether and how that column is used, since the logic we worked on lives in the census_acs ETL. | Environmental Protection Agency (EPA) Office of Air and Radiation (OAR) | 2017 | EJ Screen | 50 states, DC + Islands | Census 2010 & Census 2020 | Yes | Update init function with new EJSCREEN_202X_USPR_Tracts.csv.zip file reference. There isn't an S3 reference specified in the function, so we could potentially stage the file in S3 to match the pattern used by most of the other ETL files. | | https://gaftp.epa.gov/ejscreen/ | https://gaftp.epa.gov/ejscreen/ |
| **Health** | Asthma | Share of people who have been told they have asthma | The updated PLACES dataset is available. | CDC | 2016-2019 | PLACES data | 50 States + DC | Census 2010 Tracts, Census 2010 & 2020 Counties | Yes | Update init function with new PLACES__Local_Data_for_Better_Health__Census_Tract_Data_202X_release.csv references, including (1) the updated file staged in S3 and (2) the originating chronicdata.cdc.gov download URL. | T.J. Pierce (pwc2@cdc.gov); Sharunda Buchanan (sdb4@cdc.gov); Andrew Dent (aed5@cdc.gov); Angela Werner (myo6@cdc.gov) | https://chronicdata.cdc.gov/500-Cities-Places/PLACES-Census-Tract-Data-GIS-Friendly-Format-2021-/mb5y-ytti | https://chronicdata.cdc.gov/500-Cities-Places/PLACES-Local-Data-for-Better-Health-Census-Tract-D/cwsq-ngmh |
| **Health** | Diabetes | Share of people ages 18+ who have diabetes other than diabetes during pregnancy | The updated PLACES dataset is available. | CDC | 2016-2019 | PLACES data | 50 States + DC | Census 2010 Tracts, Census 2010 & 2020 Counties | Yes | Update init function with new PLACES__Local_Data_for_Better_Health__Census_Tract_Data_202X_release.csv references, including (1) the updated file staged in S3 and (2) the originating chronicdata.cdc.gov download URL. | T.J. Pierce (pwc2@cdc.gov); Sharunda Buchanan (sdb4@cdc.gov); Andrew Dent (aed5@cdc.gov); Angela Werner (myo6@cdc.gov) | https://chronicdata.cdc.gov/500-Cities-Places/PLACES-Census-Tract-Data-GIS-Friendly-Format-2021-/mb5y-ytti | https://chronicdata.cdc.gov/500-Cities-Places/PLACES-Local-Data-for-Better-Health-Census-Tract-D/cwsq-ngmh |
| **Health** | Heart Disease | Share of people ages 18+ who have been told they have heart disease | The updated PLACES dataset is available. | CDC | 2016-2019 | PLACES data | 50 States + DC | Census 2010 Tracts, Census 2010 & 2020 Counties | Yes | Update init function with new PLACES__Local_Data_for_Better_Health__Census_Tract_Data_202X_release.csv references, including (1) the updated file staged in S3 and (2) the originating chronicdata.cdc.gov download URL. | T.J. Pierce (pwc2@cdc.gov); Sharunda Buchanan (sdb4@cdc.gov); Andrew Dent (aed5@cdc.gov); Angela Werner (myo6@cdc.gov) | https://chronicdata.cdc.gov/500-Cities-Places/PLACES-Census-Tract-Data-GIS-Friendly-Format-2021-/mb5y-ytti | https://chronicdata.cdc.gov/500-Cities-Places/PLACES-Local-Data-for-Better-Health-Census-Tract-D/cwsq-ngmh |
| **Health** | Low life expectancy | Average number of years a person can expect to live | | CDC | 2010-2015 | US Small Area Life Expectancy Estimates Project | 50 States + DC | Census 2010 | Update coming 2025 | Update init function once data becomes available. | Elizabeth Arias (efa3@cdc.gov) | https://www.cdc.gov/nchs/nvss/usaleep/usaleep.html#life-expectancy | To-Do |
| **Housing** | Historic Underinvestment | Census tracts that experienced historic underinvestment based on redlining maps created by the federal government’s Home Owners’ Loan Corporation (HOLC) between 1935 and 1940. | | National Community Reinvestment Coalition (NCRC) | | Home Owners Loan Corporation | 50 States + DC | Census 2010 & 2020 | Yes | Update init function with new HRS_2020.xlsx reference in S3 | | https://www.openicpsr.org/openicpsr/project/141121/version/V2/view | https://www.openicpsr.org/openicpsr/project/141121/version/V2/view |
| **Housing** | Housing Cost | Share of households making less than 80% of the Area Median Income (AMI) and spending more than 30% of income on housing | Maybe could be found in ACS? Also: There is a note about [suppressed fields](https://www.huduser.gov/portal/datasets/cp.html) in the updated datasets relative to pre-2018 data. The impacted categories come from tables that do not appear to be used in the ETL; therefore, no additional changes in the pipeline should be necessary. | Department of Housing and Urban Development (HUD) | 2014-2018 | Comprehensive housing affordability strategy dataset | 50 States + DC + PR | Census 2010 | Yes | Update init function with new 140.csv zipped file references, including (1) the updated file staged in S3 and (2) the originating huduser.gov download URL. | Blair Russell, Office of Policy Development and Research; HUD (Blair.D.Russell@hud.gov) | https://www.huduser.gov/portal/datasets/cp.html#2006-2019_data | https://www.huduser.gov/portal/datasets/cp.html#data_2006-2021 |
| **Housing** | Lack of Green Space | Amount of land, not including crop land, that is covered with artificial materials like concrete or pavement | | Multi-Resolution Land Characteristics Consortium | 2019 | National Land Cover Database (USGS) | 48 States + DC | Possibly not bound to geographies because it's raster data. TPL imputed to census 2010 for us, I think. | Maybe? Use same data but preprocess to Census 2020 | To-Do | Was provided by the Trust for Public Land (TPL), but you can also get it here as image data: https://www.sciencebase.gov/catalog/item/5f21cef582cef313ed940043 | |
| **Housing** | Lack of Indoor Plumbing | Share of homes without indoor kitchens or plumbing | Maybe could be found in ACS? Also: There is a note about [suppressed fields](https://www.huduser.gov/portal/datasets/cp.html) in the updated datasets relative to pre-2018 data. The impacted categories come from tables that do not appear to be used in the ETL; therefore, no additional changes in the pipeline should be necessary. | Department of Housing and Urban Development (HUD) | 2014-2018 | Comprehensive housing affordability strategy dataset | 50 States + DC + PR | Census 2010 | Yes | Update init function with new 140.csv zipped file references, including (1) the updated file staged in S3 and (2) the originating huduser.gov download URL. | Blair Russell, Office of Policy Development and Research; HUD (Blair.D.Russell@hud.gov) | https://www.huduser.gov/portal/datasets/cp.html#2006-2019_data | https://www.huduser.gov/portal/datasets/cp.html#data_2006-2021 |
| **Housing** | Lead paint | Share of homes that are likely to have lead paint | Share of homes built before 1960, which indicates potential lead paint exposure. Tracts with extremely high home values (i.e. median home values above the 90th percentile) that are less likely to face health risks from lead paint exposure are not included. | US Census | 2015-2019 | American Community Survey | 50 States + DC + PR | Census 2010, Census 2020 | Yes, can update to ACS 2017-2021. (Need to check whether newest releases will work for us.) | To-Do | | | |
| **Legacy Pollution** | Abandoned Mine Land | Presence of one or more abandoned mine lands within the tract | It looks like the data is queried in the online GUI and then exported. | Department of the Interior, Office of Surface Mining Reclamation and Enforcement | 2017 | Abandoned Mine Land Inventory System | 50 States + DC | Point Data | Yes, points can be mapped to any geography | Update init function with new "eAMLIS export of all data.tsv.zip" file reference in S3. | | https://www.osmre.gov/programs/e-amlis | https://amlis.osmre.gov/ |
| **Legacy Pollution** | Formerly used Defense Site | Presence of one or more formerly used defense sites within the tract | | US Army Corps of Engineers | 2019 | Formerly Used Defense Sites | 50 States + DC | Point Data | Yes, points can be mapped to any geography | To-Do | | https://www.usace.army.mil/Missions/Environmental/Formerly-Used-Defense-Sites/ | To-Do |
| **Legacy Pollution** | Proximity to Hazardous Waste Facilities | count of hazardous waste facilities within 5 km | The "LOWINCPCT" column gets renamed to "Poverty (Less than 200% of federal poverty line)" in L93. We should check whether and how that column is used, since the logic we worked on lives in the census_acs ETL. | EPA | 2020 | EJ Screen | 50 states, DC + Islands | Census 2010 & Census 2020 | Yes | Update init function with new EJSCREEN_202X_USPR_Tracts.csv.zip file reference. There isn't an S3 reference specified in the function, so we could potentially stage the file in S3 to match the pattern used by most of the other ETL files. | | https://gaftp.epa.gov/ejscreen/ | https://gaftp.epa.gov/ejscreen/ |
| **Legacy Pollution** | Proximity to Risk Management Plan Facilities | count of risk management plan facilities within 5 kilometers | The "LOWINCPCT" column gets renamed to "Poverty (Less than 200% of federal poverty line)" in L93. We should check whether and how that column is used, since the logic we worked on lives in the census_acs ETL. | EPA | 2020 | EJ Screen | 50 states, DC + Islands | Census 2010 & Census 2020 | Yes | Update init function with new EJSCREEN_202X_USPR_Tracts.csv.zip file reference. There isn't an S3 reference specified in the function, so we could potentially stage the file in S3 to match the pattern used by most of the other ETL files. | | https://gaftp.epa.gov/ejscreen/ | https://gaftp.epa.gov/ejscreen/ |
| **Legacy Pollution** | Proximity to Superfund Sites | count of proposed or listed superfund or national priorities list sites within 5 km | The "LOWINCPCT" column gets renamed to "Poverty (Less than 200% of federal poverty line)" in L93. We should check whether and how that column is used, since the logic we worked on lives in the census_acs ETL. | EPA | 2021 | EJ Screen | 50 states, DC + Islands | Census 2010 & Census 2020 | Yes | Update init function with new EJSCREEN_202X_USPR_Tracts.csv.zip file reference. There isn't an S3 reference specified in the function, so we could potentially stage the file in S3 to match the pattern used by most of the other ETL files. | | https://gaftp.epa.gov/ejscreen/ | https://gaftp.epa.gov/ejscreen/ |
|
||||
| **Transportation** | Diesel particulate matter exposure | amount of diesel exhaust in the air | The "LOWINCPCT" column gets renamed to "Poverty (Less than 200% of federal poverty line)" in L93. We should look at if/how that gets used since the logic we worked on lives in the census_acs ETL. | EPA | 2014 | EJ Screen | 50 states, DC + Islands | Census 2010 & Census 2020 | Yes | Update init function with new EJSCREEN_202X_USPR_Tracts.csv.zip file reference. There isn't a S3 reference specified in the function, so that's something we could potentially stage to pattern match most of the other etl files. | | https://gaftp.epa.gov/ejscreen/ | https://gaftp.epa.gov/ejscreen/ |
|
||||
| **Transportation** | transportation barriers | average of relative cost and time spent on transportation | This is an example of when field naming does NOT come from [field_names.py](data/data-pipeline/data_pipeline/score/field_names.py). | DOT | 2022 | Transportation Access Disadvantage | 50 States + DC | Census 2020 | Yes | Update init function with new Shapefile_and_Metadata.zip file references. | | https://www.transportation.gov/equity-Justice40#:~:text=Transportation%20access%20disadvantage%20identifies%20communities%20and%20places%20that%20spend%20more%2C%20and%20take%20longer%2C%20to%20get%20where%20they%20need%20to%20go.%20(4) | https://www.transportation.gov/foia/foia-electronic-reading-room-category-four |
|
||||
| **Transportation** | traffic proximity and volume | count of vehicles at major roads within 500 meters | The "LOWINCPCT" column gets renamed to "Poverty (Less than 200% of federal poverty line)" in L93. We should look at if/how that gets used since the logic we worked on lives in the census_acs ETL. | DOT (via EPA) | 2017 | EJ Screen | 50 states, DC + Islands | Census 2010 & Census 2020 | Yes | Update init function with new EJSCREEN_202X_USPR_Tracts.csv.zip file reference. There isn't a S3 reference specified in the function, so that's something we could potentially stage to pattern match most of the other etl files. | | https://gaftp.epa.gov/ejscreen/ | https://gaftp.epa.gov/ejscreen/ |
|
||||
| **Water and Wastewater** | underground storage tanks and releases | formula of the density of leaking underground storage tanks and number of all active underground storage tanks within 1500 feet of the census tract boundaries | The "LOWINCPCT" column gets renamed to "Poverty (Less than 200% of federal poverty line)" in L93. We should look at if/how that gets used since the logic we worked on lives in the census_acs ETL. | EPA /UST Finder | 2021 | EJ Screen | 50 states, DC + Islands | Census 2010 & Census 2020 | Yes | Update init function with new EJSCREEN_202X_USPR_Tracts.csv.zip file reference. There isn't a S3 reference specified in the function, so that's something we could potentially stage to pattern match most of the other etl files. | | https://gaftp.epa.gov/ejscreen/ | https://gaftp.epa.gov/ejscreen/ |
|
||||
| **Water and Wastewater** | wastewater discharge | modeled toxic concentrations at parts of streams within 500 meters | The "LOWINCPCT" column gets renamed to "Poverty (Less than 200% of federal poverty line)" in L93. We should look at if/how that gets used since the logic we worked on lives in the census_acs ETL. | EPA Risk Screening Environmental Indicators | 2020 | EJ Screen | 50 states, DC + Islands | Census 2010 & Census 2020 | Yes | Update init function with new EJSCREEN_202X_USPR_Tracts.csv.zip file reference. There isn't a S3 reference specified in the function, so that's something we could potentially stage to pattern match most of the other etl files. | | https://gaftp.epa.gov/ejscreen/ | https://gaftp.epa.gov/ejscreen/ |
|
||||
| **Workforce Development** | Linguistic isolation | Share of households where no one over age 14 speaks English very well | | US Census | 2015-2019 | American Community Survey | 50 States + DC.+ PR | Census 2010, Census 2020 | Yes, can update to ACS 2017-2021. (Need to check whether newest releases will work for us.) | To-Do | | | |
|
||||
| **Workforce Development** | low median income | comparison of median income in the tract to the median incomes in the area | | US Census | 2015-2019 | American Community Survey | 50 States + DC.+ PR | Census 2010, Census 2020 | Yes, can update to ACS 2017-2021. (Need to check whether newest releases will work for us.) | To-Do | | | |
|
||||
| **Workforce Development** | poverty | share of people in households where income is at or below 100% of the federal poverty level | | US Census | 2015-2019 | American Community Survey | 50 States + DC.+ PR | Census 2010, Census 2020 | Yes, can update to ACS 2017-2021. (Need to check whether newest releases will work for us.) | To-Do | | | |
|
||||
| **Workforce Development** | Unemployment | number of unemployed people as a part of the labor force | | US Census | 2015-2019 | American Community Survey | 50 States + DC.+ PR | Census 2010, Census 2020 | Yes, can update to ACS 2017-2021. (Need to check whether newest releases will work for us.) | To-Do | | | |
|
||||
| **Workforce Development** | High school Education | Percent of people ages 25 or older whose high school education is less than a high school diploma | | US Census | 2015-2019 | American Community Survey | 50 States + DC.+ PR | Census 2010, Census 2020 | Yes, can update to ACS 2017-2021. (Need to check whether newest releases will work for us.) | To-Do | | | |
|
||||
| **Multiple Factors** | Low income | People in households where income is less than or equal to twice the federal poverty level, not including students enrolled in higher ed | | US Census | 2015-2019 | American Community Survey | 50 States + DC.+ PR | Census 2010, Census 2020 | Yes, can update to ACS 2017-2021. (Need to check whether newest releases will work for us.) | To-Do | | | |
|
||||

README.md

@@ -26,6 +26,47 @@ The intermediate steps of the data pipeline, the scores, and the final output th

If you want to run the entire application locally, visit [QUICKSTART.md](QUICKSTART.md).

### Updating Data Sources

CEJST version 2.0 uses 2010 Census tracts as the primary unit of analysis and as the external key that links most datasets. Data published after 2020 will generally use 2020 Census tracts, so updating CEJST datasets to newer vintages will usually involve incorporating 2020 Census tracts.

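Both vintages use the same 11-character tract GEOID format (2-digit state FIPS code, 3-digit county FIPS code, 6-digit tract code), so a 2010 ID and a 2020 ID can look identical even where the tract boundary changed; the key alone does not say which vintage it belongs to. The minimal sketch below uses a hypothetical parse_tract_geoid() helper (not part of the pipeline) to show that structure and why GEOIDs should be kept as strings.

```python
from typing import NamedTuple


class TractGeoid(NamedTuple):
    state_fips: str   # 2 digits
    county_fips: str  # 3 digits
    tract_code: str   # 6 digits


def parse_tract_geoid(geoid: str) -> TractGeoid:
    """Split an 11-character Census tract GEOID into its components (hypothetical helper)."""
    # GEOIDs must stay strings: casting to int would drop the leading zero
    # of state codes such as "01", leaving a 10-digit value that no longer joins.
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected an 11-digit tract GEOID, got {geoid!r}")
    return TractGeoid(geoid[:2], geoid[2:5], geoid[5:])


print(parse_tract_geoid("01001020100"))
# TractGeoid(state_fips='01', county_fips='001', tract_code='020100')
```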
Option 1: Keep 2010 boundaries on the map
- Makes sense if we are not updating the American Community Survey (source of tract info and demographics, and of low income for the states and PR)
- Lower-lift option for updating a few individual datasets

Option 2: Update to 2020 boundaries on the map
- Makes sense if we are updating the American Community Survey (source of tract info and demographics, and of low income for the states and PR)
- Higher-lift option, but one that will eventually need to happen

In either case, we need to enable translation across Census tract vintages. The Census Bureau provides a simple relationship file (crosswalk) between 2010 and 2020 tracts.

Crosswalk:
https://www2.census.gov/geo/docs/maps-data/data/rel2020/tract/tab20_tract20_tract10_natl.txt

Explanation of crosswalk:
https://www2.census.gov/geo/pdfs/maps-data/data/rel2020/tract/explanation_tab20_tract20_tract10.pdf

NB: Crosswalks for the territories are stored in separate files.

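A minimal sketch of loading the national crosswalk with pandas, assuming the file is pipe-delimited and uses the GEOID_TRACT_10 and GEOID_TRACT_20 column names that average_tract_translate() renames. The load_tract_crosswalk() helper is hypothetical and the sample rows are made up for illustration. If the territory files share the same layout (an assumption), they could be read the same way and concatenated.

```python
import io

import pandas as pd

CROSSWALK_URL = (
    "https://www2.census.gov/geo/docs/maps-data/data/rel2020/tract/"
    "tab20_tract20_tract10_natl.txt"
)


def load_tract_crosswalk(source=CROSSWALK_URL) -> pd.DataFrame:
    """Read the tract relationship file, keeping only the 2010/2020 GEOID pairs.

    Assumption: the file is pipe-delimited with GEOID_TRACT_10 / GEOID_TRACT_20 columns.
    """
    xwalk = pd.read_csv(
        source,
        sep="|",
        dtype=str,  # keep GEOIDs as text so leading zeros survive
        usecols=["GEOID_TRACT_20", "GEOID_TRACT_10"],
    )
    # rename to the field names the pipeline uses
    return xwalk.rename(
        columns={"GEOID_TRACT_10": "GEOID10_TRACT", "GEOID_TRACT_20": "GEOID20_TRACT"}
    )


# Made-up sample with a subset of columns, read from memory instead of the URL
sample = io.StringIO(
    "OID_TRACT_20|GEOID_TRACT_20|OID_TRACT_10|GEOID_TRACT_10\n"
    "1|01001020100|2|01001020100\n"
)
print(load_tract_crosswalk(sample))
```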
The average_tract_translate() function in [utils.py](data/data-pipeline/data_pipeline/utils.py) can be used to translate data between 2010 and 2020 tract boundaries. For example, if we update to ACS data with 2020 boundaries, we will need to translate all data sources that still use 2010 boundaries. To do this, average_tract_translate() takes each 2020 tract ID, finds all the 2010 tracts mapped to it, and takes the mean of each column across those mapped 2010 tracts. Note that this function only works on numeric columns. The current setup requires the crosswalk to be passed in as an argument; it may be easier to upload a static copy of the crosswalk and read it in at the beginning of the function.

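To make the averaging behavior concrete, here is a standalone toy re-implementation of the steps described above (merge the crosswalk, group by 2020 tract, take the mean). It is a sketch, not the utils.py code itself; the tract IDs and the pm25 column are hypothetical. Two cases matter: when two 2010 tracts merge into one 2020 tract, their values are averaged without any weighting, and when one 2010 tract splits into two 2020 tracts, each piece inherits the full 2010 value.

```python
import pandas as pd


def average_tract_translate_sketch(
    df: pd.DataFrame,
    xwalk: pd.DataFrame,
    tract_in: str = "GEOID10_TRACT",
    tract_out: str = "GEOID20_TRACT",
) -> pd.DataFrame:
    # attach the output tract ID(s) to every input row
    merged = df.merge(xwalk, how="left", on=tract_in)
    # drop the input ID (a string) and average the numeric columns per output tract
    return (
        merged.drop(columns=[tract_in])
        .groupby(tract_out)
        .mean(numeric_only=True)
        .reset_index()
    )


# Hypothetical crosswalk: A1 and A2 merge into X; B splits into Y1 and Y2
xwalk = pd.DataFrame(
    {
        "GEOID10_TRACT": ["A1", "A2", "B", "B"],
        "GEOID20_TRACT": ["X", "X", "Y1", "Y2"],
    }
)
df = pd.DataFrame({"GEOID10_TRACT": ["A1", "A2", "B"], "pm25": [10.0, 20.0, 7.0]})

print(average_tract_translate_sketch(df, xwalk))
# X gets (10 + 20) / 2 = 15.0; Y1 and Y2 each get B's 7.0
```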
Overview of how to update a source:
1) If this is the first source being updated to 2020 geography, add a new bucket in AWS for 2020 data. Stage the new data sources in S3.
2) If this is the first source being updated to 2020 geography, the GEOID variable will need to be split into two variables, one for 2010 and one for 2020. Naming conventions will depend on whether we still want to use 2010 geographies in the map.
3) Look at [DATASETS.md](DATASETS.md) to see specific update instructions for each data source, including URLs for updated data sources.
4) Check that the columns we're using still exist in the new data source. If not, make a plan for methodology changes.
5) Update paths in ETL files.
6) Update the path in the ETL file's else statement, where possible.
7) Update GEOID variable definitions in ETL files.
8) If the updated data source uses different tract boundaries from the ones we want to use on the map, call average_tract_translate() from utils.py at the end of the ETL file.
9) Update [DATASETS.md](DATASETS.md) to reflect the new changes.

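Step 4 can be partly automated by comparing the columns an ETL reads against the columns in the new release. A minimal sketch, assuming the list of required columns is maintained by hand for each ETL; missing_columns() is a hypothetical helper, and "ID" and "PM25" are placeholder column names (only LOWINCPCT comes from DATASETS.md).

```python
from typing import Iterable, List

import pandas as pd


def missing_columns(df: pd.DataFrame, required: Iterable[str]) -> List[str]:
    """Return the required columns that are absent from df, sorted for stable output."""
    return sorted(set(required) - set(df.columns))


# Hypothetical new release that dropped one column we rely on
new_release = pd.DataFrame(columns=["ID", "LOWINCPCT"])
required = ["ID", "LOWINCPCT", "PM25"]

missing = missing_columns(new_release, required)
if missing:
    # a non-empty result means the methodology needs a plan before the update
    print(f"Columns missing from the new release: {missing}")
```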
In some cases, updated data isn't available yet:
- CDC life expectancy at birth by state
- First Street Foundation (acquired through email; we can request it again through email or try to use the proprietary API)

Legacy pollution data from the US Army Corps of Engineers is mapped to Census tracts using geolocated points. Points can be mapped to any geography, but we will need to update our mappings if we want to use 2020 tract boundaries on the map.

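Remapping point data to 2020 tracts amounts to a point-in-polygon lookup against the new tract boundaries. The sketch below is a pure-Python even-odd ray-casting test over hypothetical square "tracts" in planar coordinates, purely to illustrate the idea; real work would presumably use the 2020 tract shapefiles and a GIS library such as geopandas, which also handles projections, multipart polygons, and points on shared boundaries.

```python
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(pt: Point, polygon: Sequence[Point]) -> bool:
    """Even-odd ray casting; points exactly on an edge may land on either side."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # only edges that straddle the horizontal line through pt can cross the ray
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_tract(pt: Point, tracts: Dict[str, Sequence[Point]]) -> Optional[str]:
    """Return the ID of the first tract polygon containing pt, or None."""
    for geoid, polygon in tracts.items():
        if point_in_polygon(pt, polygon):
            return geoid
    return None


# Hypothetical: one 2010 tract (0..2 x 0..2) split into two 2020 tracts
tracts_2020 = {
    "T20_W": [(0, 0), (1, 0), (1, 2), (0, 2)],
    "T20_E": [(1, 0), (2, 0), (2, 2), (1, 2)],
}
for site in [(0.5, 1.0), (1.5, 0.5)]:
    print(site, assign_tract(site, tracts_2020))
```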
### Advanced Guides

If you have software experience or more specific use cases, in-depth documentation of how to work with this project can be found in [INSTALLATION.md](INSTALLATION.md).
@@ -5,6 +5,7 @@ import shutil
import sys
import uuid
import zipfile
import pandas as pd
from pathlib import Path
from typing import List
from typing import Union

@@ -348,6 +349,57 @@ def zip_directory(
    zipf.close()


def average_tract_translate(
    df: pd.DataFrame,
    xwalk: pd.DataFrame,
    tract_year_in: str = "GEOID10_TRACT",
    tract_year_out: str = "GEOID20_TRACT",
) -> pd.DataFrame:
    """
    Minimally tested prototype of an averaging function

    Can be used to translate between 2010 and 2020 tract boundaries.
    For example, if we update to ACS data with 2020 boundaries, we will need to
    translate all data sources that are still using 2010 boundaries. To do this,
    average_tract_translate() will take each 2020 tract ID and find all the 2010
    tracts that are mapped to it, and then take the mean of each column across
    these mapped 2010 tracts.

    Note that this function only works on numeric columns.

    The current set-up requires the crosswalk to be passed in as an argument;
    it may be easier to upload a static copy of the crosswalk
    and read it in at the beginning of the function.

    Crosswalk:
    https://www2.census.gov/geo/docs/maps-data/data/rel2020/tract/tab20_tract20_tract10_natl.txt

    Explanation of crosswalk:
    https://www2.census.gov/geo/pdfs/maps-data/data/rel2020/tract/explanation_tab20_tract20_tract10.pdf

    NB: Crosswalks for territories are stored in separate files.
    """

    # pre-process xwalk
    # could be uploaded as a static copy and read in here
    xwalk = xwalk.rename(
        columns={
            "GEOID_TRACT_10": "GEOID10_TRACT",
            "GEOID_TRACT_20": "GEOID20_TRACT",
        }
    )
    xwalk = xwalk[["GEOID10_TRACT", "GEOID20_TRACT"]]

    # merge xwalk into input data
    merged_df = df.merge(xwalk, how="left", on=tract_year_in)

    # group by average; drop the input tract ID first (it is a string column)
    # and average only numeric columns, which pandas >= 2.0 requires explicitly
    averaged_df = (
        merged_df.drop(columns=[tract_year_in])
        .groupby(tract_year_out)
        .mean(numeric_only=True)
    )

    # reindex (bc input df doesn't have tract ID as index but rather as column)
    return averaged_df.reset_index()


def load_yaml_dict_from_file(
    yaml_file_path: Path,
    schema_class: Union[CSVConfig, ExcelConfig, CodebookConfig],