Add Michigan EJ Screen into data-pipeline's ETL and provide automated scoring and statistics outputs (#1091)

* draft wip

* initial commit

* clear output from notebook

* revert to 65ceb7900f

* make michigan prefix more readable

* standardize Michigan names and move all constants from class into field names module

* include only pertinent columns for scoring comparison tool

* michigan EJSCREEN standardization

* final PR feedback

* added exposition and summary of Michigan EJSCREEN

* fix typo

Co-authored-by: Saran Ahluwalia <ahlusar.ahluwalia@gmail.com>
Saran Ahluwalia 2021-12-31 15:38:52 -05:00 committed by GitHub
parent 24f8eb93c4
commit a4137fdc98
6 changed files with 142 additions and 3 deletions

@ -99,6 +99,11 @@ DATASET_LIST = [
        "module_dir": "tree_equity_score",
        "class_name": "TreeEquityScoreETL",
    },
    {
        "name": "michigan_ejscreen",
        "module_dir": "michigan_ejscreen",
        "class_name": "MichiganEnviroScreenETL",
    },
]
CENSUS_INFO = {
    "name": "census",

@ -0,0 +1,32 @@
# Michigan EJSCREEN
The Michigan EJSCREEN description and publication can be found [here](https://deepblue.lib.umich.edu/bitstream/handle/2027.42/149105/AssessingtheStateofEnvironmentalJusticeinMichigan_344.pdf).
#### Some notes about the input source data column fields:
Two columns from the input data are used, `EJ_Score_Cal_Min` and `Pct_CalMin`, both of which are referenced in the source codebase. To our knowledge, these columns reflect the authors' adoption and quantitative comparison of two different approaches. "Cal" refers to CalEPA's CalEnviroScreen, which omits racial and ethnic data. "Min" refers to the Minnesota Pollution Control Agency's (MPCA) approach, which includes this data. Please see pages 37-39 of the above reference for further details. Briefly, the authors combined the CalEnviroScreen methodology with the MPCA methodology. The scores and percentile rankings in the input data sheet match those in Appendix I of the cited report and in the latest version of the mapping [tool](https://www.arcgis.com/apps/webappviewer/index.html?id=dc4f0647dda34959963488d3f519fd24).
#### Additional information on the adoption of the methodology from CalEnviroScreen and MPCA
Both CalEPA's CalEnviroScreen methodology and the Minnesota Pollution Control Agency's (MPCA) methodology are adopted, both for comparative purposes and to identify areas of concern. The latter, in particular, is used to identify tribal areas. According to the authors, to make permitting decisions, MPCA assesses whether the community, measured at the census tract level, fits at least one of the following criteria:
* The non-white population is at least 50% of the total population.
* "More than 40% of the households have a household income of less than 185% of the federal poverty level (FPL)."
* The facility is within the boundaries of a "tribal community" (MPCA 2015).
Furthermore, the authors state that the MPCA methodology included data on tribal community boundaries, as defined by the US Census Bureau, and data on poverty, race, and ethnicity. However, the authors also note that the MPCA's methodology does not rank any census tracts.
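The three MPCA screening criteria listed above can be sketched in Python as follows. This is a minimal illustration, not code from the report or from this ETL: the `Tract` class, its field names, and the 0-1 scaling of the shares are hypothetical, and the facility-based tribal criterion is simplified to a per-tract flag.

```python
from dataclasses import dataclass


@dataclass
class Tract:
    """Hypothetical per-tract inputs; the field names are illustrative only."""

    geoid: str
    share_nonwhite: float  # share of the population that is non-white, 0-1
    share_households_below_185_fpl: float  # share of households under 185% FPL, 0-1
    in_tribal_community: bool  # simplification of the facility-based criterion


def meets_mpca_criteria(tract: Tract) -> bool:
    """Return True when at least one of the three criteria holds."""
    return (
        tract.share_nonwhite >= 0.50  # "at least 50%"
        or tract.share_households_below_185_fpl > 0.40  # "more than 40%"
        or tract.in_tribal_community
    )
```

Note that the first criterion is inclusive ("at least") while the second is strict ("more than"), which is why the two comparisons differ.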
In addition, although CalEPA does not analyze data on race and ethnicity in CalEnviroScreen, the researchers incorporated race and ethnicity data in their assessment of environmental justice in Michigan. To justify incorporating this data, the team compared the tract rankings with and without it.
The authors calculated a Spearman's rank-order correlation across the 2,741 census tracts within Michigan between two variables: environmental justice scores computed with the CalEPA methodology (1) without racial and ethnic data and (2) with racial and ethnic data. The scores were ranked, and the correlation was then calculated on the ranks. These statistics are not included in the output of this ETL process. Please see Chapter 5 and Chapter 6 for further details.
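For readers unfamiliar with the statistic, the following is a minimal, standard-library sketch of a Spearman rank-order correlation: rank each variable (averaging tied ranks), then take the Pearson correlation of the ranks. It is not the authors' code and is not part of this ETL; the function names are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

A value near 1 would indicate that the tract rankings with and without racial and ethnic data largely agree; a lower value would indicate that including the data reorders the tracts.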
Finally, please see pages 104-106 for the justification for using the upper quartile to identify communities in Michigan with the potential for environmental justice concerns. According to the authors, CalEPA also designates the top 25% of scoring tracts as "disadvantaged communities".
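The upper-quartile rule can be sketched as a simple cutoff on the percentile column. This is a minimal sketch, assuming percentiles are expressed on a 0-1 scale (as the 0.75 threshold in the ETL code implies); the function name, tract IDs, and percentile values below are hypothetical.

```python
from typing import Dict

# Cutoff for the upper quartile; the ETL in this change uses the same value.
PRIORITY_THRESHOLD: float = 0.75


def flag_priority_communities(
    percentiles: Dict[str, float], threshold: float = PRIORITY_THRESHOLD
) -> Dict[str, bool]:
    """Map each tract ID to True when its percentile is at or above the cutoff."""
    return {geoid: pct >= threshold for geoid, pct in percentiles.items()}


# Hypothetical tract IDs and percentile values, for illustration only.
example_percentiles = {
    "26000000001": 0.10,
    "26000000002": 0.75,
    "26000000003": 0.92,
}
example_flags = flag_priority_communities(example_percentiles)
```

The comparison is inclusive, so a tract sitting exactly at the 75th percentile is flagged.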
Sources:
* Minnesota Pollution Control Agency. (2015, December 15). Environmental Justice Framework Report.
Retrieved from https://www.pca.state.mn.us/sites/default/files/p-gen5-05.pdf.
* Faust, J., L. August, K. Bangia, V. Galaviz, J. Leichty, S. Prasad… and L. Zeise. (2017, January). Update to the California Communities Environmental Health Screening Tool CalEnviroScreen 3.0. Retrieved from OEHHA website: https://oehha.ca.gov/media/downloads/calenviroscreen/report/ces3report.pdf

@ -0,0 +1,69 @@
import pandas as pd

from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.utils import get_module_logger
from data_pipeline.score import field_names
from data_pipeline.config import settings

logger = get_module_logger(__name__)


class MichiganEnviroScreenETL(ExtractTransformLoad):
    """Michigan EJ Screen class that ingests dataset represented
    here: https://www.arcgis.com/apps/webappviewer/index.html?id=dc4f0647dda34959963488d3f519fd24

    This class ingests the data presented in "Assessing the State of Environmental
    Justice in Michigan." Please see the README in this module for further details.
    """

    def __init__(self):
        self.MICHIGAN_EJSCREEN_S3_URL = (
            settings.AWS_JUSTICE40_DATASOURCES_URL
            + "/michigan_ejscore_12212021.csv"
        )
        self.CSV_PATH = self.DATA_PATH / "dataset" / "michigan_ejscreen"
        self.MICHIGAN_EJSCREEN_PRIORITY_COMMUNITY_THRESHOLD: float = 0.75
        self.COLUMNS_TO_KEEP = [
            self.GEOID_TRACT_FIELD_NAME,
            field_names.MICHIGAN_EJSCREEN_SCORE_FIELD,
            field_names.MICHIGAN_EJSCREEN_PERCENTILE_FIELD,
            field_names.MICHIGAN_EJSCREEN_PRIORITY_COMMUNITY_FIELD,
        ]
        self.df: pd.DataFrame

    def extract(self) -> None:
        logger.info("Downloading Michigan EJSCREEN Data")
        # Read the tract ID as a string so leading zeros are preserved
        self.df = pd.read_csv(
            filepath_or_buffer=self.MICHIGAN_EJSCREEN_S3_URL,
            dtype={"GEO_ID": "string"},
            low_memory=False,
        )

    def transform(self) -> None:
        logger.info("Transforming Michigan EJSCREEN Data")
        self.df.rename(
            columns={
                "GEO_ID": self.GEOID_TRACT_FIELD_NAME,
                "EJ_Score_Cal_Min": field_names.MICHIGAN_EJSCREEN_SCORE_FIELD,
                "Pct_CalMin": field_names.MICHIGAN_EJSCREEN_PERCENTILE_FIELD,
            },
            inplace=True,
        )

        # Calculate the top quartile of prioritized communities
        # Please see pg. 104 - 109 from source:
        # https://deepblue.lib.umich.edu/bitstream/handle/2027.42/149105/AssessingtheStateofEnvironmentalJusticeinMichigan_344.pdf
        self.df[field_names.MICHIGAN_EJSCREEN_PRIORITY_COMMUNITY_FIELD] = (
            self.df[field_names.MICHIGAN_EJSCREEN_PERCENTILE_FIELD]
            >= self.MICHIGAN_EJSCREEN_PRIORITY_COMMUNITY_THRESHOLD
        )

    def load(self) -> None:
        logger.info("Saving Michigan Environmental Screening Tool to CSV")
        # Write the Michigan tract-level CSV, keeping only the columns listed above
        self.CSV_PATH.mkdir(parents=True, exist_ok=True)
        self.df[self.COLUMNS_TO_KEEP].to_csv(
            self.CSV_PATH / "michigan_ejscreen.csv", index=False
        )

@ -295,6 +295,25 @@
"energy_definition_alternative_draft_df"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fe4a2939",
"metadata": {},
"outputs": [],
"source": [
"# Load Michigan EJSCREEN\n",
"michigan_ejscreen_data_path = (\n",
" DATA_DIR / \"dataset\" / \"michigan_ejscreen\" / \"michigan_ejscreen.csv\"\n",
")\n",
"michigan_ejscreen_df = pd.read_csv(\n",
" michigan_ejscreen_data_path,\n",
" dtype={ExtractTransformLoad.GEOID_TRACT_FIELD_NAME: \"string\"},\n",
")\n",
"\n",
"michigan_ejscreen_df.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
@ -311,6 +330,7 @@
" persistent_poverty_df,\n",
" mapping_inequality_df,\n",
" energy_definition_alternative_draft_df,\n",
" michigan_ejscreen_df\n",
"]\n",
"\n",
"merged_df = functools.reduce(\n",
@ -456,6 +476,14 @@
" priority_communities_field=field_names.ENERGY_RELATED_COMMUNITIES_DEFINITION_ALTERNATIVE,\n",
" other_census_tract_fields_to_keep=[],\n",
" ),\n",
" Index(\n",
" method_name=\"Michigan EJSCREEN\",\n",
" priority_communities_field=field_names.MICHIGAN_EJSCREEN_PRIORITY_COMMUNITY_FIELD,\n",
" other_census_tract_fields_to_keep=[\n",
" field_names.MICHIGAN_EJSCREEN_SCORE_FIELD,\n",
" field_names.MICHIGAN_EJSCREEN_PERCENTILE_FIELD,\n",
" ],\n",
" ), \n",
" ]\n",
" # Insert indices for each of the HOLC factors.\n",
" # Note: since these involve no renaming, we write them using list comprehension.\n",
@ -1298,7 +1326,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
@ -1312,7 +1340,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
"version": "3.6.2"
}
},
"nbformat": 4,

@ -206,13 +206,18 @@ EJSCREEN_AREAS_OF_CONCERN_STATE_90TH_PERCENTILE_COMMUNITIES_FIELD = (
EJSCREEN_AREAS_OF_CONCERN_STATE_95TH_PERCENTILE_COMMUNITIES_FIELD = (
    "EJSCREEN Areas of Concern, State, 95th percentile (communities)"
)
# Mapping inequality data.
HOLC_GRADE_D_TRACT_PERCENT_FIELD: str = "Percent of tract that is HOLC Grade D"
HOLC_GRADE_D_TRACT_20_PERCENT_FIELD: str = "Tract is >20% HOLC Grade D"
HOLC_GRADE_D_TRACT_50_PERCENT_FIELD: str = "Tract is >50% HOLC Grade D"
HOLC_GRADE_D_TRACT_75_PERCENT_FIELD: str = "Tract is >75% HOLC Grade D"
# Michigan Environmental Screening Tool ETL Constants
MICHIGAN_EJSCREEN_SCORE_FIELD: str = "Michigan EJSCREEN Score Field"
MICHIGAN_EJSCREEN_PERCENTILE_FIELD: str = "Michigan EJSCREEN Percentile Field"
MICHIGAN_EJSCREEN_PRIORITY_COMMUNITY_FIELD: str = (
    "Michigan EJSCREEN Priority Community"
)
# Child Opportunity Index data
# Summer days with maximum temperature above 90F.