Issue 844: Add island areas to Definition L (#957)

This ended up being a pretty large task. Here's what this PR does:

1. Pulls Vincent's island areas data into the score ETL. This data comes from the 2010 decennial census, the last census of any kind in the island areas.
2. Grabs a few new fields from the 2010 island areas decennial census.
3. Calculates area median income for island areas.
4. Stops using EJSCREEN as the source of our high school education data and pulls that data directly from the census instead (this was related to this project, so I went ahead and fixed it).
5. Grabs a bunch of data from the 2010 ACS for the states, Puerto Rico, and DC, so that we can create percentiles that compare 2010 island areas decennial census data to 2010 ACS data apples-to-apples (ish). This required creating a new class because the ACS fields all differ between 2010 and 2019, so it wasn't as simple as looping over a year parameter.
6. Creates a combined population field of island areas and mainland so we can use those stats in our comparison tool, and updates the comparison tool accordingly.
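Items 5 and 6 can be illustrated with a small, hypothetical sketch. It is not the project's implementation: the function names `percentile_against_reference` and `combined_population` are made up for illustration, and the percentile is computed as a plain percentile rank (the fraction of reference values less than or equal to the value being ranked). The idea is that an island-area tract value from the 2010 decennial census is ranked against a reference distribution of 2010 ACS tract values, so both sides come from roughly the same year, and that a combined population field simply takes whichever source covers a given tract.

```python
from bisect import bisect_right
from typing import Optional, Sequence


def percentile_against_reference(value: float, reference: Sequence[float]) -> float:
    """Percentile rank of `value` within `reference` (hypothetical helper).

    `reference` stands in for 2010 ACS tract values from the states, Puerto
    Rico, and DC; `value` stands in for a 2010 decennial census value from an
    island-area tract. Returns a number in [0, 1].
    """
    if not reference:
        raise ValueError("reference distribution must not be empty")
    ordered = sorted(reference)
    # bisect_right gives the count of reference values that are <= value.
    return bisect_right(ordered, value) / len(ordered)


def combined_population(
    mainland: Optional[float], island_areas: Optional[float]
) -> Optional[float]:
    """Hypothetical combined population field.

    A tract is assumed to appear in only one source, so prefer the mainland
    value when present and fall back to the island areas value otherwise.
    """
    return mainland if mainland is not None else island_areas


if __name__ == "__main__":
    # Made-up reference values, for illustration only.
    acs_2010_reference = [0.04, 0.06, 0.08, 0.10, 0.12]
    print(percentile_against_reference(0.09, acs_2010_reference))  # 0.6
    print(combined_population(None, 1234.0))  # 1234.0
```

Comparing against a same-year reference distribution, rather than against 2019 ACS values, is what makes the comparison roughly apples-to-apples.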
Lucas Merrill Brown 2021-12-03 15:46:10 -05:00 committed by GitHub
commit 1d101c93d2
15 changed files with 882 additions and 153 deletions


@@ -32,10 +32,15 @@ class ExtractTransformLoad:
     FILES_PATH: Path = settings.APP_ROOT / "files"
     GEOID_FIELD_NAME: str = "GEOID10"
     GEOID_TRACT_FIELD_NAME: str = "GEOID10_TRACT"
-    # TODO: investigate. Census says there are only 217,740 CBGs in the US. This might be from CBGs at different time periods.
+    # TODO: investigate. Census says there are only 217,740 CBGs in the US. This might
+    # be from CBGs at different time periods.
     EXPECTED_MAX_CENSUS_BLOCK_GROUPS: int = 250000
-    # TODO: investigate. Census says there are only 73,057 tracts in the US. This might be from tracts at different time periods.
-    EXPECTED_MAX_CENSUS_TRACTS: int = 74027
+    # TODO: investigate. Census says there are only 74,134 tracts in the US,
+    # Puerto Rico, and island areas. This might be from tracts at different time
+    # periods. https://github.com/usds/justice40-tool/issues/964
+    EXPECTED_MAX_CENSUS_TRACTS: int = 74160
     def __init__(self, config_path: Path) -> None:
         """Inits the class with instance specific variables"""
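The diff raises `EXPECTED_MAX_CENSUS_TRACTS` from 74027 to 74160 now that island-area tracts are included. The diff does not show where this constant is consumed, so the following is only a hypothetical sketch, assuming it serves as an upper bound in a sanity check on the number of unique tract IDs an ETL produces. The function name `validate_tract_count` is made up for illustration.

```python
from typing import Iterable

# Values copied from the diff above.
EXPECTED_MAX_CENSUS_BLOCK_GROUPS: int = 250000
EXPECTED_MAX_CENSUS_TRACTS: int = 74160


def validate_tract_count(tract_ids: Iterable[str]) -> int:
    """Hypothetical check: fail fast if a dataset has more tracts than expected.

    Duplicate IDs are collapsed first, so only distinct tracts are counted.
    Returns the number of unique tract IDs when the check passes.
    """
    unique_count = len(set(tract_ids))
    if unique_count > EXPECTED_MAX_CENSUS_TRACTS:
        raise ValueError(
            f"Found {unique_count} unique tracts, more than the expected "
            f"maximum of {EXPECTED_MAX_CENSUS_TRACTS}."
        )
    return unique_count


if __name__ == "__main__":
    # Two distinct 11-digit tract GEOIDs, one of them repeated.
    print(validate_tract_count(["01001020100", "01001020100", "01001020200"]))  # 2
```

Under this assumption, a check built on the old value of 74027 would start failing once island-area tracts were added to the data, which is consistent with the constant being raised in the same change.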