Issue 105: Configure and run black and other pre-commit hooks (clean branch) (#1962)

* Configure and run `black` and other pre-commit hooks

Co-authored-by: matt bowen <matthew.r.bowen@omb.eop.gov>
commit 6e6223cd5e by Lucas Merrill Brown, 2022-10-04 18:08:47 -04:00, committed by GitHub
162 changed files with 716 additions and 602 deletions

.github/CODEOWNERS
View file

@ -1,2 +1 @@
* @esfoobar-usds @vim-usds @emma-nechamkin @mattbowen-usds

View file

@ -13,4 +13,3 @@ Los mantenedores del proyecto tienen el derecho y la obligación de eliminar, ed
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by opening an issue or by contacting one or more of the project maintainers at justice40open@usds.gov.
This Code of Conduct is adapted from version 1.0.0 of the [Contributor Covenant](http://contributor-covenant.org), available at http://contributor-covenant.org/version/1/0/0/.

View file

@ -43,4 +43,3 @@ Si desea colaborar con alguna parte del código base, bifurque el repositorio si
* At least one authorized reviewer must approve the commit (see [CODEOWNERS](https://github.com/usds/justice40-tool/tree/main/.github/CODEOWNERS) for the most recent list of these reviewers).
* All required status checks must pass.
If there is significant disagreement among team members, a meeting will be held to determine the plan of action for the pull request.

View file

@ -28,4 +28,3 @@ Por estos u otros propósitos y motivos, y sin ninguna expectativa de otra consi
c. Affirmer disclaims responsibility for clearing the rights of other persons that may apply to the Work or any use thereof, including without limitation any person's Copyright and Related Rights in the Work. Further, Affirmer disclaims responsibility for obtaining any necessary consents, permissions, or other rights required for any use of the Work.
d. Affirmer understands and acknowledges that Creative Commons is not a party to this document and has no duty or obligation with respect to this CC0 or use of the Work.

View file

@ -0,0 +1,39 @@
exclude: ^client|\.csv
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.3.0
    hooks:
      - id: end-of-file-fixer
      - id: trailing-whitespace
  - repo: https://github.com/lucasmbrown/mirrors-autoflake
    rev: v1.3
    hooks:
      - id: autoflake
        args:
          [
            "--in-place",
            "--remove-all-unused-imports",
            "--remove-unused-variable",
            "--ignore-init-module-imports",
          ]
  - repo: https://github.com/pycqa/isort
    rev: 5.10.1
    hooks:
      - id: isort
        name: isort (python)
        args:
          [
            "--force-single-line-imports",
            "--profile=black",
            "--line-length=80",
            "--src-path=.:data/data-pipeline"
          ]
  - repo: https://github.com/ambv/black
    rev: 22.8.0
    hooks:
      - id: black
        language_version: python3.9
        args: [--config=./data/data-pipeline/pyproject.toml]
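Most of the Python churn in the rest of this commit falls out mechanically from the three Python hooks configured above: `autoflake` removes unused imports, `isort` with `--force-single-line-imports` rewrites grouped imports as one sorted import per line, and `black` rewraps anything longer than the 80-character limit. A minimal sketch of that combined effect on a generic import block (standard-library names only; this is an illustration, not a file from this repository):

```
# Before the hooks run, a module might begin like this (kept as a comment so
# the sketch stays runnable):
#
#     from collections import OrderedDict, defaultdict, namedtuple
#     import sys, os
#
# After autoflake (OrderedDict is unused, so it is dropped), isort with
# --force-single-line-imports (one alphabetized import per line), and black:
import os
import sys
from collections import defaultdict
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
counts = defaultdict(int)
counts["tracts"] += 1
print(os.sep, sys.version_info.major, counts, Point(0, 1))
```

The `data_pipeline.etl.runner` and `data_pipeline.utils` import rewrites near the top of this commit follow exactly this pattern.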

View file

@ -250,7 +250,59 @@ as `poetry run tox`.
Each run can take a while to build the whole environment. If you'd like to save time,
you can use the previously built environment by running `poetry run tox -e lint`
which will drastically speed up the linting process.
### Configuring pre-commit hooks
<!-- markdown-link-check-disable -->
To promote consistent code style and quality, we use git pre-commit hooks to
automatically lint and reformat our code before every commit we make to the codebase.
Pre-commit hooks are defined in the file [`.pre-commit-config.yaml`](../.pre-commit-config.yaml).
<!-- markdown-link-check-enable -->
1. First, install [`pre-commit`](https://pre-commit.com/) globally:
   ```
   $ brew install pre-commit
   ```
2. While in the `data/data-pipeline` directory, run `pre-commit install` to install
the specific git hooks used in this repository.
Now, any time you commit code to the repository, the hooks will run on all modified files automatically. If you wish,
you can force a re-run on all files with `pre-commit run --all-files`.
#### Conflicts between backend and frontend git hooks
<!-- markdown-link-check-disable -->
In the front-end part of the codebase (the `justice40-tool/client` folder), we use
`Husky` to run pre-commit hooks for the front-end. This is different from the
`pre-commit` framework we use for the backend. The frontend `Husky` hooks are
configured at
[client/.husky](client/.husky).
It is not possible to run both our `Husky` hooks and `pre-commit` hooks on every
commit; either one or the other will run.
<!-- markdown-link-check-enable -->
`Husky` is installed every time you run `npm install`. To use the `Husky` front-end
hooks during front-end development, simply run `npm install`.
However, running `npm install` overwrites the backend hooks set up by `pre-commit`.
To restore the backend hooks after running `npm install`, do the following:
1. Run `pre-commit install` while in the `data/data-pipeline` directory.
2. The terminal should respond with an error message such as:
```
[ERROR] Cowardly refusing to install hooks with `core.hooksPath` set.
hint: `git config --unset-all core.hooksPath`
```
This error is caused by having previously run `npm install`, which used `Husky` to
overwrite the hooks path.
3. Follow the hint and run `git config --unset-all core.hooksPath`.
4. Run `pre-commit install` again.
Now `pre-commit` and the backend hooks should take precedence.
### The Application entrypoint

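Returning to the pre-commit section above: the `core.hooksPath` conflict it describes can also be checked from a script. A minimal, hypothetical sketch (the helper below is made up for illustration and is not part of the repository; it assumes `git` is available on `PATH`):

```
import subprocess


def hooks_path_is_overridden() -> bool:
    """Return True if git's core.hooksPath is set (for example by Husky),
    which is the condition that makes `pre-commit install` refuse to run."""
    result = subprocess.run(
        ["git", "config", "--get", "core.hooksPath"],
        capture_output=True,
        text=True,
        check=False,
    )
    # `git config --get` exits non-zero when the key is unset.
    return result.returncode == 0 and result.stdout.strip() != ""


if __name__ == "__main__":
    if hooks_path_is_overridden():
        print("core.hooksPath is set; run `git config --unset-all core.hooksPath`")
    else:
        print("pre-commit can install its hooks normally")
```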
View file

@ -1,31 +1,27 @@
from subprocess import call
import sys import sys
import click from subprocess import call
import click
from data_pipeline.config import settings from data_pipeline.config import settings
from data_pipeline.etl.runner import ( from data_pipeline.etl.runner import etl_runner
etl_runner, from data_pipeline.etl.runner import score_generate
score_generate, from data_pipeline.etl.runner import score_geo
score_geo, from data_pipeline.etl.runner import score_post
score_post, from data_pipeline.etl.sources.census.etl_utils import check_census_data_source
)
from data_pipeline.etl.sources.census.etl_utils import ( from data_pipeline.etl.sources.census.etl_utils import (
check_census_data_source,
reset_data_directories as census_reset, reset_data_directories as census_reset,
zip_census_data,
) )
from data_pipeline.etl.sources.census.etl_utils import zip_census_data
from data_pipeline.etl.sources.tribal.etl_utils import ( from data_pipeline.etl.sources.tribal.etl_utils import (
reset_data_directories as tribal_reset, reset_data_directories as tribal_reset,
) )
from data_pipeline.tile.generate import generate_tiles from data_pipeline.tile.generate import generate_tiles
from data_pipeline.utils import ( from data_pipeline.utils import check_first_run
data_folder_cleanup, from data_pipeline.utils import data_folder_cleanup
get_module_logger, from data_pipeline.utils import downloadable_cleanup
score_folder_cleanup, from data_pipeline.utils import get_module_logger
downloadable_cleanup, from data_pipeline.utils import score_folder_cleanup
temp_folder_cleanup, from data_pipeline.utils import temp_folder_cleanup
check_first_run,
)
logger = get_module_logger(__name__) logger = get_module_logger(__name__)
@ -36,8 +32,6 @@ dataset_cli_help = "Grab the data from either 'local' for local access or 'aws'
def cli(): def cli():
"""Defines a click group for the commands below""" """Defines a click group for the commands below"""
pass
@cli.command(help="Clean up all census data folders") @cli.command(help="Clean up all census data folders")
def census_cleanup(): def census_cleanup():

View file

@ -12,12 +12,12 @@ To see more: https://buildmedia.readthedocs.org/media/pdf/papermill/latest/paper
To run: To run:
` $ python src/run_tract_comparison.py --template_notebook=TEMPLATE.ipynb --parameter_yaml=PARAMETERS.yaml` ` $ python src/run_tract_comparison.py --template_notebook=TEMPLATE.ipynb --parameter_yaml=PARAMETERS.yaml`
""" """
import os
import datetime
import argparse import argparse
import yaml import datetime
import os
import papermill as pm import papermill as pm
import yaml
def _read_param_file(param_file: str) -> dict: def _read_param_file(param_file: str) -> dict:

View file

@ -16,7 +16,7 @@
"import matplotlib.pyplot as plt\n", "import matplotlib.pyplot as plt\n",
"\n", "\n",
"from data_pipeline.score import field_names\n", "from data_pipeline.score import field_names\n",
"from data_pipeline.comparison_tool.src import utils \n", "from data_pipeline.comparison_tool.src import utils\n",
"\n", "\n",
"pd.options.display.float_format = \"{:,.3f}\".format\n", "pd.options.display.float_format = \"{:,.3f}\".format\n",
"%load_ext lab_black" "%load_ext lab_black"
@ -128,9 +128,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"utils.validate_new_data(\n", "utils.validate_new_data(file_path=COMPARATOR_FILE, score_col=COMPARATOR_COLUMN)"
" file_path=COMPARATOR_FILE, score_col=COMPARATOR_COLUMN\n",
")"
] ]
}, },
{ {
@ -148,20 +146,25 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"comparator_cols = [COMPARATOR_COLUMN] + OTHER_COMPARATOR_COLUMNS if OTHER_COMPARATOR_COLUMNS else [COMPARATOR_COLUMN]\n", "comparator_cols = (\n",
" [COMPARATOR_COLUMN] + OTHER_COMPARATOR_COLUMNS\n",
" if OTHER_COMPARATOR_COLUMNS\n",
" else [COMPARATOR_COLUMN]\n",
")\n",
"\n", "\n",
"#papermill_description=Loading_data\n", "# papermill_description=Loading_data\n",
"joined_df = pd.concat(\n", "joined_df = pd.concat(\n",
" [\n", " [\n",
" utils.read_file(\n", " utils.read_file(\n",
" file_path=SCORE_FILE,\n", " file_path=SCORE_FILE,\n",
" columns=[TOTAL_POPULATION_COLUMN, SCORE_COLUMN] + ADDITIONAL_DEMO_COLUMNS,\n", " columns=[TOTAL_POPULATION_COLUMN, SCORE_COLUMN]\n",
" + ADDITIONAL_DEMO_COLUMNS,\n",
" geoid=GEOID_COLUMN,\n", " geoid=GEOID_COLUMN,\n",
" ),\n", " ),\n",
" utils.read_file(\n", " utils.read_file(\n",
" file_path=COMPARATOR_FILE,\n", " file_path=COMPARATOR_FILE,\n",
" columns=comparator_cols,\n", " columns=comparator_cols,\n",
" geoid=GEOID_COLUMN\n", " geoid=GEOID_COLUMN,\n",
" ),\n", " ),\n",
" utils.read_file(\n", " utils.read_file(\n",
" file_path=DEMOGRAPHIC_FILE,\n", " file_path=DEMOGRAPHIC_FILE,\n",
@ -196,13 +199,13 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"#papermill_description=Summary_stats\n", "# papermill_description=Summary_stats\n",
"population_df = utils.produce_summary_stats(\n", "population_df = utils.produce_summary_stats(\n",
" joined_df=joined_df,\n", " joined_df=joined_df,\n",
" comparator_column=COMPARATOR_COLUMN,\n", " comparator_column=COMPARATOR_COLUMN,\n",
" score_column=SCORE_COLUMN,\n", " score_column=SCORE_COLUMN,\n",
" population_column=TOTAL_POPULATION_COLUMN,\n", " population_column=TOTAL_POPULATION_COLUMN,\n",
" geoid_column=GEOID_COLUMN\n", " geoid_column=GEOID_COLUMN,\n",
")\n", ")\n",
"population_df" "population_df"
] ]
@ -224,18 +227,18 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"#papermill_description=Tract_stats\n", "# papermill_description=Tract_stats\n",
"tract_level_by_identification_df = pd.concat(\n", "tract_level_by_identification_df = pd.concat(\n",
" [\n", " [\n",
" utils.get_demo_series(\n", " utils.get_demo_series(\n",
" grouping_column=COMPARATOR_COLUMN,\n", " grouping_column=COMPARATOR_COLUMN,\n",
" joined_df=joined_df,\n", " joined_df=joined_df,\n",
" demo_columns=ADDITIONAL_DEMO_COLUMNS + DEMOGRAPHIC_COLUMNS\n", " demo_columns=ADDITIONAL_DEMO_COLUMNS + DEMOGRAPHIC_COLUMNS,\n",
" ),\n", " ),\n",
" utils.get_demo_series(\n", " utils.get_demo_series(\n",
" grouping_column=SCORE_COLUMN,\n", " grouping_column=SCORE_COLUMN,\n",
" joined_df=joined_df,\n", " joined_df=joined_df,\n",
" demo_columns=ADDITIONAL_DEMO_COLUMNS + DEMOGRAPHIC_COLUMNS\n", " demo_columns=ADDITIONAL_DEMO_COLUMNS + DEMOGRAPHIC_COLUMNS,\n",
" ),\n", " ),\n",
" ],\n", " ],\n",
" axis=1,\n", " axis=1,\n",
@ -256,17 +259,25 @@
" y=\"Variable\",\n", " y=\"Variable\",\n",
" x=\"Avg in tracts\",\n", " x=\"Avg in tracts\",\n",
" hue=\"Definition\",\n", " hue=\"Definition\",\n",
" data=tract_level_by_identification_df.sort_values(by=COMPARATOR_COLUMN, ascending=False)\n", " data=tract_level_by_identification_df.sort_values(\n",
" by=COMPARATOR_COLUMN, ascending=False\n",
" )\n",
" .stack()\n", " .stack()\n",
" .reset_index()\n", " .reset_index()\n",
" .rename(\n", " .rename(\n",
" columns={\"level_0\": \"Variable\", \"level_1\": \"Definition\", 0: \"Avg in tracts\"}\n", " columns={\n",
" \"level_0\": \"Variable\",\n",
" \"level_1\": \"Definition\",\n",
" 0: \"Avg in tracts\",\n",
" }\n",
" ),\n", " ),\n",
" palette=\"Blues\",\n", " palette=\"Blues\",\n",
")\n", ")\n",
"plt.xlim(0, 1)\n", "plt.xlim(0, 1)\n",
"plt.title(\"Tract level averages by identification strategy\")\n", "plt.title(\"Tract level averages by identification strategy\")\n",
"plt.savefig(os.path.join(OUTPUT_DATA_PATH, \"tract_lvl_avg.jpg\"), bbox_inches='tight')" "plt.savefig(\n",
" os.path.join(OUTPUT_DATA_PATH, \"tract_lvl_avg.jpg\"), bbox_inches=\"tight\"\n",
")"
] ]
}, },
{ {
@ -276,13 +287,13 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"#papermill_description=Tract_stats_grouped\n", "# papermill_description=Tract_stats_grouped\n",
"tract_level_by_grouping_df = utils.get_tract_level_grouping(\n", "tract_level_by_grouping_df = utils.get_tract_level_grouping(\n",
" joined_df=joined_df,\n", " joined_df=joined_df,\n",
" score_column=SCORE_COLUMN,\n", " score_column=SCORE_COLUMN,\n",
" comparator_column=COMPARATOR_COLUMN,\n", " comparator_column=COMPARATOR_COLUMN,\n",
" demo_columns=ADDITIONAL_DEMO_COLUMNS + DEMOGRAPHIC_COLUMNS,\n", " demo_columns=ADDITIONAL_DEMO_COLUMNS + DEMOGRAPHIC_COLUMNS,\n",
" keep_missing_values=KEEP_MISSING_VALUES_FOR_SEGMENTATION\n", " keep_missing_values=KEEP_MISSING_VALUES_FOR_SEGMENTATION,\n",
")\n", ")\n",
"\n", "\n",
"tract_level_by_grouping_formatted_df = utils.format_multi_index_for_excel(\n", "tract_level_by_grouping_formatted_df = utils.format_multi_index_for_excel(\n",
@ -315,7 +326,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"#papermill_description=Population_stats\n", "# papermill_description=Population_stats\n",
"population_weighted_stats_df = pd.concat(\n", "population_weighted_stats_df = pd.concat(\n",
" [\n", " [\n",
" utils.construct_weighted_statistics(\n", " utils.construct_weighted_statistics(\n",
@ -363,7 +374,7 @@
"comparator_and_cejst_proportion_series, states = utils.get_final_summary_info(\n", "comparator_and_cejst_proportion_series, states = utils.get_final_summary_info(\n",
" population=population_df,\n", " population=population_df,\n",
" comparator_file=COMPARATOR_FILE,\n", " comparator_file=COMPARATOR_FILE,\n",
" geoid_col=GEOID_COLUMN\n", " geoid_col=GEOID_COLUMN,\n",
")" ")"
] ]
}, },
@ -393,7 +404,7 @@
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": [ "source": [
"#papermill_description=Writing_excel\n", "# papermill_description=Writing_excel\n",
"utils.write_single_comparison_excel(\n", "utils.write_single_comparison_excel(\n",
" output_excel=OUTPUT_EXCEL,\n", " output_excel=OUTPUT_EXCEL,\n",
" population_df=population_df,\n", " population_df=population_df,\n",
@ -401,7 +412,7 @@
" population_weighted_stats_df=population_weighted_stats_df,\n", " population_weighted_stats_df=population_weighted_stats_df,\n",
" tract_level_by_grouping_formatted_df=tract_level_by_grouping_formatted_df,\n", " tract_level_by_grouping_formatted_df=tract_level_by_grouping_formatted_df,\n",
" comparator_and_cejst_proportion_series=comparator_and_cejst_proportion_series,\n", " comparator_and_cejst_proportion_series=comparator_and_cejst_proportion_series,\n",
" states_text=states_text\n", " states_text=states_text,\n",
")" ")"
] ]
} }

View file

@ -1,9 +1,9 @@
import pathlib import pathlib
import pandas as pd import pandas as pd
import xlsxwriter import xlsxwriter
from data_pipeline.score import field_names
from data_pipeline.etl.sources.census.etl_utils import get_state_information from data_pipeline.etl.sources.census.etl_utils import get_state_information
from data_pipeline.score import field_names
# Some excel parameters # Some excel parameters
DEFAULT_COLUMN_WIDTH = 18 DEFAULT_COLUMN_WIDTH = 18

View file

@ -1,8 +1,7 @@
import pathlib import pathlib
from dynaconf import Dynaconf
import data_pipeline import data_pipeline
from dynaconf import Dynaconf
settings = Dynaconf( settings = Dynaconf(
envvar_prefix="DYNACONF", envvar_prefix="DYNACONF",

View file

@ -427,7 +427,9 @@
} }
], ],
"source": [ "source": [
"for col in [col for col in download_codebook.index.to_list() if \"(percentile)\" in col]:\n", "for col in [\n",
" col for col in download_codebook.index.to_list() if \"(percentile)\" in col\n",
"]:\n",
" print(f\" - column_name: {col}\")\n", " print(f\" - column_name: {col}\")\n",
" if \"Low\" not in col:\n", " if \"Low\" not in col:\n",
" print(\n", " print(\n",

View file

@ -1,6 +1,8 @@
from dataclasses import dataclass, field from dataclasses import dataclass
from dataclasses import field
from enum import Enum from enum import Enum
from typing import List, Optional from typing import List
from typing import Optional
class FieldType(Enum): class FieldType(Enum):

View file

@ -5,18 +5,15 @@ import typing
from typing import Optional from typing import Optional
import pandas as pd import pandas as pd
from data_pipeline.config import settings from data_pipeline.config import settings
from data_pipeline.etl.score.etl_utils import ( from data_pipeline.etl.score.etl_utils import (
compare_to_list_of_expected_state_fips_codes, compare_to_list_of_expected_state_fips_codes,
) )
from data_pipeline.etl.score.schemas.datasets import DatasetsConfig from data_pipeline.etl.score.schemas.datasets import DatasetsConfig
from data_pipeline.utils import ( from data_pipeline.utils import get_module_logger
load_yaml_dict_from_file, from data_pipeline.utils import load_yaml_dict_from_file
unzip_file_from_url, from data_pipeline.utils import remove_all_from_dir
remove_all_from_dir, from data_pipeline.utils import unzip_file_from_url
get_module_logger,
)
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,5 +1,5 @@
import importlib
import concurrent.futures import concurrent.futures
import importlib
import typing import typing
from data_pipeline.etl.score.etl_score import ScoreETL from data_pipeline.etl.score.etl_score import ScoreETL

View file

@ -1,9 +1,8 @@
import datetime
import os import os
from pathlib import Path from pathlib import Path
import datetime
from data_pipeline.config import settings from data_pipeline.config import settings
from data_pipeline.score import field_names from data_pipeline.score import field_names
## note: to keep map porting "right" fields, keeping descriptors the same. ## note: to keep map porting "right" fields, keeping descriptors the same.

View file

@ -1,31 +1,26 @@
import functools import functools
from typing import List
from dataclasses import dataclass from dataclasses import dataclass
from typing import List
import numpy as np import numpy as np
import pandas as pd import pandas as pd
from data_pipeline.etl.base import ExtractTransformLoad from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.score import constants
from data_pipeline.etl.sources.census_acs.etl import CensusACSETL from data_pipeline.etl.sources.census_acs.etl import CensusACSETL
from data_pipeline.etl.sources.national_risk_index.etl import (
NationalRiskIndexETL,
)
from data_pipeline.etl.sources.dot_travel_composite.etl import ( from data_pipeline.etl.sources.dot_travel_composite.etl import (
TravelCompositeETL, TravelCompositeETL,
) )
from data_pipeline.etl.sources.fsf_flood_risk.etl import (
FloodRiskETL,
)
from data_pipeline.etl.sources.eamlis.etl import AbandonedMineETL from data_pipeline.etl.sources.eamlis.etl import AbandonedMineETL
from data_pipeline.etl.sources.fsf_flood_risk.etl import FloodRiskETL
from data_pipeline.etl.sources.fsf_wildfire_risk.etl import WildfireRiskETL
from data_pipeline.etl.sources.national_risk_index.etl import (
NationalRiskIndexETL,
)
from data_pipeline.etl.sources.nlcd_nature_deprived.etl import NatureDeprivedETL
from data_pipeline.etl.sources.tribal_overlap.etl import TribalOverlapETL from data_pipeline.etl.sources.tribal_overlap.etl import TribalOverlapETL
from data_pipeline.etl.sources.us_army_fuds.etl import USArmyFUDS from data_pipeline.etl.sources.us_army_fuds.etl import USArmyFUDS
from data_pipeline.etl.sources.nlcd_nature_deprived.etl import NatureDeprivedETL
from data_pipeline.etl.sources.fsf_wildfire_risk.etl import WildfireRiskETL
from data_pipeline.score.score_runner import ScoreRunner
from data_pipeline.score import field_names from data_pipeline.score import field_names
from data_pipeline.etl.score import constants from data_pipeline.score.score_runner import ScoreRunner
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)
@ -699,7 +694,9 @@ class ScoreETL(ExtractTransformLoad):
self.df = self._backfill_island_demographics(self.df) self.df = self._backfill_island_demographics(self.df)
def load(self) -> None: def load(self) -> None:
logger.info("Saving Score CSV") logger.info(
f"Saving Score CSV to {constants.DATA_SCORE_CSV_FULL_FILE_PATH}."
)
constants.DATA_SCORE_CSV_FULL_DIR.mkdir(parents=True, exist_ok=True) constants.DATA_SCORE_CSV_FULL_DIR.mkdir(parents=True, exist_ok=True)
self.df.to_csv(constants.DATA_SCORE_CSV_FULL_FILE_PATH, index=False) self.df.to_csv(constants.DATA_SCORE_CSV_FULL_FILE_PATH, index=False)

View file

@ -1,24 +1,20 @@
import concurrent.futures import concurrent.futures
import math import math
import os import os
import geopandas as gpd
import numpy as np import numpy as np
import pandas as pd import pandas as pd
import geopandas as gpd from data_pipeline.content.schemas.download_schemas import CSVConfig
from data_pipeline.etl.base import ExtractTransformLoad from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.score import constants from data_pipeline.etl.score import constants
from data_pipeline.etl.sources.census.etl_utils import (
check_census_data_source,
)
from data_pipeline.etl.score.etl_utils import check_score_data_source from data_pipeline.etl.score.etl_utils import check_score_data_source
from data_pipeline.etl.sources.census.etl_utils import check_census_data_source
from data_pipeline.score import field_names from data_pipeline.score import field_names
from data_pipeline.content.schemas.download_schemas import CSVConfig from data_pipeline.utils import get_module_logger
from data_pipeline.utils import ( from data_pipeline.utils import load_dict_from_yaml_object_fields
get_module_logger, from data_pipeline.utils import load_yaml_dict_from_file
zip_files, from data_pipeline.utils import zip_files
load_yaml_dict_from_file,
load_dict_from_yaml_object_fields,
)
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,29 +1,23 @@
from pathlib import Path
import json import json
from numpy import float64 from pathlib import Path
import numpy as np import numpy as np
import pandas as pd import pandas as pd
from data_pipeline.content.schemas.download_schemas import ( from data_pipeline.content.schemas.download_schemas import CodebookConfig
CSVConfig, from data_pipeline.content.schemas.download_schemas import CSVConfig
CodebookConfig, from data_pipeline.content.schemas.download_schemas import ExcelConfig
ExcelConfig,
)
from data_pipeline.etl.base import ExtractTransformLoad from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.score.etl_utils import floor_series, create_codebook from data_pipeline.etl.score.etl_utils import create_codebook
from data_pipeline.utils import ( from data_pipeline.etl.score.etl_utils import floor_series
get_module_logger, from data_pipeline.etl.sources.census.etl_utils import check_census_data_source
zip_files,
load_yaml_dict_from_file,
column_list_from_yaml_object_fields,
load_dict_from_yaml_object_fields,
)
from data_pipeline.score import field_names from data_pipeline.score import field_names
from data_pipeline.utils import column_list_from_yaml_object_fields
from data_pipeline.utils import get_module_logger
from data_pipeline.utils import load_dict_from_yaml_object_fields
from data_pipeline.utils import load_yaml_dict_from_file
from data_pipeline.utils import zip_files
from numpy import float64
from data_pipeline.etl.sources.census.etl_utils import (
check_census_data_source,
)
from . import constants from . import constants
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,24 +1,21 @@
import os import os
import sys import sys
import typing import typing
from pathlib import Path
from collections import namedtuple from collections import namedtuple
from pathlib import Path
import numpy as np import numpy as np
import pandas as pd import pandas as pd
from data_pipeline.config import settings from data_pipeline.config import settings
from data_pipeline.etl.score.constants import ( from data_pipeline.etl.score.constants import TILES_ALASKA_AND_HAWAII_FIPS_CODE
TILES_ISLAND_AREA_FIPS_CODES, from data_pipeline.etl.score.constants import TILES_CONTINENTAL_US_FIPS_CODE
TILES_PUERTO_RICO_FIPS_CODE, from data_pipeline.etl.score.constants import TILES_ISLAND_AREA_FIPS_CODES
TILES_CONTINENTAL_US_FIPS_CODE, from data_pipeline.etl.score.constants import TILES_PUERTO_RICO_FIPS_CODE
TILES_ALASKA_AND_HAWAII_FIPS_CODE,
)
from data_pipeline.etl.sources.census.etl_utils import get_state_fips_codes from data_pipeline.etl.sources.census.etl_utils import get_state_fips_codes
from data_pipeline.utils import (
download_file_from_url,
get_module_logger,
)
from data_pipeline.score import field_names from data_pipeline.score import field_names
from data_pipeline.utils import download_file_from_url
from data_pipeline.utils import get_module_logger
from . import constants from . import constants
logger = get_module_logger(__name__) logger = get_module_logger(__name__)
@ -99,7 +96,7 @@ def floor_series(series: pd.Series, number_of_decimals: int) -> pd.Series:
if series.isin(unacceptable_values).any(): if series.isin(unacceptable_values).any():
series.replace(mapping, regex=False, inplace=True) series.replace(mapping, regex=False, inplace=True)
multiplication_factor = 10 ** number_of_decimals multiplication_factor = 10**number_of_decimals
# In order to safely cast NaNs # In order to safely cast NaNs
# First coerce series to float type: series.astype(float) # First coerce series to float type: series.astype(float)
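The `multiplication_factor` line above shows one of black 22.x's rules: spaces around the power operator `**` are removed when both operands are simple names or literals, and kept when either operand is a larger expression. A small illustrative sketch (not taken from the repository):

```
number_of_decimals = 2
value, offset = 1.5, 0.25

# black hugs ** when both operands are simple (a bare name or a literal)...
multiplication_factor = 10**number_of_decimals
# ...but keeps the spaces when an operand is a more complex expression.
scaled = (value + offset) ** 2

print(multiplication_factor, scaled)
```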

View file

@ -1,6 +1,8 @@
from dataclasses import dataclass, field from dataclasses import dataclass
from dataclasses import field
from enum import Enum from enum import Enum
from typing import List, Optional from typing import List
from typing import Optional
class FieldType(Enum): class FieldType(Enum):

View file

@ -5,7 +5,8 @@ from pathlib import Path
import pandas as pd import pandas as pd
import pytest import pytest
from data_pipeline import config from data_pipeline import config
from data_pipeline.etl.score import etl_score_post, tests from data_pipeline.etl.score import etl_score_post
from data_pipeline.etl.score import tests
from data_pipeline.etl.score.etl_score_post import PostScoreETL from data_pipeline.etl.score.etl_score_post import PostScoreETL

View file

@ -1,11 +1,10 @@
import pandas as pd
import numpy as np import numpy as np
import pandas as pd
import pytest import pytest
from data_pipeline.etl.score.etl_utils import ( from data_pipeline.etl.score.etl_utils import (
floor_series,
compare_to_list_of_expected_state_fips_codes, compare_to_list_of_expected_state_fips_codes,
) )
from data_pipeline.etl.score.etl_utils import floor_series
def test_floor_series(): def test_floor_series():

View file

@ -1,14 +1,11 @@
# pylint: disable=W0212 # pylint: disable=W0212
## Above disables warning about access to underscore-prefixed methods ## Above disables warning about access to underscore-prefixed methods
from importlib import reload from importlib import reload
from pathlib import Path from pathlib import Path
import pandas.api.types as ptypes import pandas.api.types as ptypes
import pandas.testing as pdt import pandas.testing as pdt
from data_pipeline.content.schemas.download_schemas import ( from data_pipeline.content.schemas.download_schemas import CSVConfig
CSVConfig,
)
from data_pipeline.etl.score import constants from data_pipeline.etl.score import constants
from data_pipeline.utils import load_yaml_dict_from_file from data_pipeline.utils import load_yaml_dict_from_file

View file

@ -1,8 +1,7 @@
import pandas as pd import pandas as pd
from data_pipeline.config import settings
from data_pipeline.etl.base import ExtractTransformLoad from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger
from data_pipeline.config import settings
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,13 +1,15 @@
import pathlib import pathlib
from pathlib import Path from pathlib import Path
import pandas as pd
from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel import pandas as pd
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.base import ValidGeoLevel
from data_pipeline.etl.score.etl_utils import ( from data_pipeline.etl.score.etl_utils import (
compare_to_list_of_expected_state_fips_codes, compare_to_list_of_expected_state_fips_codes,
) )
from data_pipeline.score import field_names from data_pipeline.score import field_names
from data_pipeline.utils import get_module_logger, download_file_from_url from data_pipeline.utils import download_file_from_url
from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,9 +1,11 @@
import typing import typing
import pandas as pd
from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel import pandas as pd
from data_pipeline.utils import get_module_logger, download_file_from_url from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.base import ValidGeoLevel
from data_pipeline.score import field_names from data_pipeline.score import field_names
from data_pipeline.utils import download_file_from_url
from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,9 +1,8 @@
import pandas as pd
import numpy as np import numpy as np
import pandas as pd
from data_pipeline.etl.base import ExtractTransformLoad from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.utils import get_module_logger
from data_pipeline.score import field_names from data_pipeline.score import field_names
from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -3,12 +3,12 @@ import json
import subprocess import subprocess
from enum import Enum from enum import Enum
from pathlib import Path from pathlib import Path
import geopandas as gpd import geopandas as gpd
from data_pipeline.etl.base import ExtractTransformLoad from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.utils import get_module_logger, unzip_file_from_url
from data_pipeline.etl.sources.census.etl_utils import get_state_fips_codes from data_pipeline.etl.sources.census.etl_utils import get_state_fips_codes
from data_pipeline.utils import get_module_logger
from data_pipeline.utils import unzip_file_from_url
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -5,13 +5,11 @@ from pathlib import Path
import pandas as pd import pandas as pd
from data_pipeline.config import settings from data_pipeline.config import settings
from data_pipeline.utils import ( from data_pipeline.utils import get_module_logger
get_module_logger, from data_pipeline.utils import remove_all_dirs_from_dir
remove_all_dirs_from_dir, from data_pipeline.utils import remove_files_from_dir
remove_files_from_dir, from data_pipeline.utils import unzip_file_from_url
unzip_file_from_url, from data_pipeline.utils import zip_directory
zip_directory,
)
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,19 +1,19 @@
from collections import namedtuple
import os import os
import pandas as pd from collections import namedtuple
import geopandas as gpd
import geopandas as gpd
import pandas as pd
from data_pipeline.config import settings from data_pipeline.config import settings
from data_pipeline.etl.base import ExtractTransformLoad from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.sources.census_acs.etl_utils import (
retrieve_census_acs_data,
)
from data_pipeline.etl.sources.census_acs.etl_imputations import ( from data_pipeline.etl.sources.census_acs.etl_imputations import (
calculate_income_measures, calculate_income_measures,
) )
from data_pipeline.etl.sources.census_acs.etl_utils import (
from data_pipeline.utils import get_module_logger, unzip_file_from_url retrieve_census_acs_data,
)
from data_pipeline.score import field_names from data_pipeline.score import field_names
from data_pipeline.utils import get_module_logger
from data_pipeline.utils import unzip_file_from_url
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,7 +1,10 @@
from typing import Any, List, NamedTuple, Tuple from typing import Any
import pandas as pd from typing import List
import geopandas as gpd from typing import NamedTuple
from typing import Tuple
import geopandas as gpd
import pandas as pd
from data_pipeline.score import field_names from data_pipeline.score import field_names
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger

View file

@ -1,10 +1,9 @@
import os import os
from pathlib import Path from pathlib import Path
from typing import List from typing import List
import censusdata import censusdata
import pandas as pd import pandas as pd
from data_pipeline.etl.sources.census.etl_utils import get_state_fips_codes from data_pipeline.etl.sources.census.etl_utils import get_state_fips_codes
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger

View file

@ -1,11 +1,10 @@
import pandas as pd import pandas as pd
from data_pipeline.etl.base import ExtractTransformLoad from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.sources.census_acs.etl_utils import ( from data_pipeline.etl.sources.census_acs.etl_utils import (
retrieve_census_acs_data, retrieve_census_acs_data,
) )
from data_pipeline.utils import get_module_logger
from data_pipeline.score import field_names from data_pipeline.score import field_names
from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,13 +1,14 @@
import json import json
from pathlib import Path from pathlib import Path
import numpy as np import numpy as np
import pandas as pd import pandas as pd
import requests import requests
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.utils import get_module_logger
from data_pipeline.config import settings from data_pipeline.config import settings
from data_pipeline.utils import unzip_file_from_url, download_file_from_url from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.utils import download_file_from_url
from data_pipeline.utils import get_module_logger
from data_pipeline.utils import unzip_file_from_url
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,14 +1,13 @@
import json import json
from typing import List from typing import List
import requests
import numpy as np import numpy as np
import pandas as pd import pandas as pd
import requests
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.utils import get_module_logger
from data_pipeline.score import field_names
from data_pipeline.config import settings from data_pipeline.config import settings
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.score import field_names
from data_pipeline.utils import get_module_logger
pd.options.mode.chained_assignment = "raise" pd.options.mode.chained_assignment = "raise"

View file

@ -1,7 +1,8 @@
from pathlib import Path from pathlib import Path
import pandas as pd
from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel import pandas as pd
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.base import ValidGeoLevel
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,8 +1,9 @@
from pathlib import Path from pathlib import Path
import pandas as pd
import pandas as pd
from data_pipeline.config import settings from data_pipeline.config import settings
from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.base import ValidGeoLevel
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,10 +1,9 @@
# pylint: disable=unsubscriptable-object # pylint: disable=unsubscriptable-object
# pylint: disable=unsupported-assignment-operation # pylint: disable=unsupported-assignment-operation
import pandas as pd
import geopandas as gpd import geopandas as gpd
import pandas as pd
from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.base import ValidGeoLevel
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,9 +1,10 @@
from pathlib import Path from pathlib import Path
import geopandas as gpd import geopandas as gpd
import pandas as pd import pandas as pd
from data_pipeline.config import settings from data_pipeline.config import settings
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel from data_pipeline.etl.base import ValidGeoLevel
from data_pipeline.etl.sources.geo_utils import add_tracts_for_geometries from data_pipeline.etl.sources.geo_utils import add_tracts_for_geometries
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger

View file

@ -1,6 +1,6 @@
import pandas as pd import pandas as pd
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel from data_pipeline.etl.base import ValidGeoLevel
from data_pipeline.score import field_names from data_pipeline.score import field_names
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger

View file

@ -1,5 +1,4 @@
import pandas as pd import pandas as pd
from data_pipeline.etl.base import ExtractTransformLoad from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger
@ -58,7 +57,6 @@ class EJSCREENAreasOfConcernETL(ExtractTransformLoad):
# TO DO: As a one off we did all the processing in a separate Notebook # TO DO: As a one off we did all the processing in a separate Notebook
# Can add here later for a future PR # Can add here later for a future PR
pass
def load(self) -> None: def load(self) -> None:
if self.ejscreen_areas_of_concern_data_exists(): if self.ejscreen_areas_of_concern_data_exists():

View file

@ -1,10 +1,11 @@
from pathlib import Path from pathlib import Path
import pandas as pd
import pandas as pd
from data_pipeline.config import settings from data_pipeline.config import settings
from data_pipeline.etl.base import ExtractTransformLoad from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.score import field_names from data_pipeline.score import field_names
from data_pipeline.utils import get_module_logger, unzip_file_from_url from data_pipeline.utils import get_module_logger
from data_pipeline.utils import unzip_file_from_url
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,9 +1,10 @@
from pathlib import Path from pathlib import Path
import pandas as pd
import pandas as pd
from data_pipeline.etl.base import ExtractTransformLoad from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.score import field_names from data_pipeline.score import field_names
from data_pipeline.utils import get_module_logger, unzip_file_from_url from data_pipeline.utils import get_module_logger
from data_pipeline.utils import unzip_file_from_url
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,10 +1,9 @@
# pylint: disable=unsubscriptable-object # pylint: disable=unsubscriptable-object
# pylint: disable=unsupported-assignment-operation # pylint: disable=unsupported-assignment-operation
import pandas as pd import pandas as pd
from data_pipeline.config import settings from data_pipeline.config import settings
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel from data_pipeline.etl.base import ValidGeoLevel
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,10 +1,9 @@
# pylint: disable=unsubscriptable-object # pylint: disable=unsubscriptable-object
# pylint: disable=unsupported-assignment-operation # pylint: disable=unsupported-assignment-operation
import pandas as pd import pandas as pd
from data_pipeline.config import settings from data_pipeline.config import settings
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel from data_pipeline.etl.base import ValidGeoLevel
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,11 +1,12 @@
"""Utililities for turning geographies into tracts, using census data""" """Utililities for turning geographies into tracts, using census data"""
from functools import lru_cache
from pathlib import Path from pathlib import Path
from typing import Optional from typing import Optional
from functools import lru_cache
import geopandas as gpd import geopandas as gpd
from data_pipeline.etl.sources.tribal.etl import TribalETL from data_pipeline.etl.sources.tribal.etl import TribalETL
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger
from .census.etl import CensusETL from .census.etl import CensusETL
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,11 +1,9 @@
import pandas as pd import pandas as pd
from data_pipeline.config import settings from data_pipeline.config import settings
from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.utils import ( from data_pipeline.etl.base import ValidGeoLevel
get_module_logger, from data_pipeline.utils import get_module_logger
unzip_file_from_url, from data_pipeline.utils import unzip_file_from_url
)
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,8 +1,8 @@
import pandas as pd import pandas as pd
from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel
from data_pipeline.utils import get_module_logger
from data_pipeline.config import settings from data_pipeline.config import settings
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.base import ValidGeoLevel
from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,9 +1,9 @@
import pandas as pd import pandas as pd
from pandas.errors import EmptyDataError
from data_pipeline.etl.base import ExtractTransformLoad from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.sources.census.etl_utils import get_state_fips_codes from data_pipeline.etl.sources.census.etl_utils import get_state_fips_codes
from data_pipeline.utils import get_module_logger, unzip_file_from_url from data_pipeline.utils import get_module_logger
from data_pipeline.utils import unzip_file_from_url
from pandas.errors import EmptyDataError
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,5 +1,6 @@
import pandas as pd import pandas as pd
from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.base import ValidGeoLevel
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,9 +1,8 @@
import pandas as pd import pandas as pd
import requests import requests
from data_pipeline.config import settings
from data_pipeline.etl.base import ExtractTransformLoad from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger
from data_pipeline.config import settings
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,10 +1,9 @@
import pandas as pd
import geopandas as gpd import geopandas as gpd
import pandas as pd
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.utils import get_module_logger
from data_pipeline.score import field_names
from data_pipeline.config import settings from data_pipeline.config import settings
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.score import field_names
from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)
@ -96,4 +95,3 @@ class MappingForEJETL(ExtractTransformLoad):
def validate(self) -> None: def validate(self) -> None:
logger.info("Validating Mapping For EJ Data") logger.info("Validating Mapping For EJ Data")
pass

View file

@ -1,10 +1,11 @@
import pathlib import pathlib
import numpy as np import numpy as np
import pandas as pd import pandas as pd
from data_pipeline.etl.base import ExtractTransformLoad from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.score import field_names from data_pipeline.score import field_names
from data_pipeline.utils import download_file_from_url, get_module_logger from data_pipeline.utils import download_file_from_url
from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,11 +1,11 @@
from glob import glob from glob import glob
import geopandas as gpd import geopandas as gpd
import pandas as pd import pandas as pd
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.utils import get_module_logger
from data_pipeline.score import field_names
from data_pipeline.config import settings from data_pipeline.config import settings
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.score import field_names
from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,9 +1,8 @@
import pandas as pd import pandas as pd
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.utils import get_module_logger
from data_pipeline.score import field_names
from data_pipeline.config import settings from data_pipeline.config import settings
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.score import field_names
from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -2,10 +2,9 @@
# but it may be a known bug. https://github.com/PyCQA/pylint/issues/1498 # but it may be a known bug. https://github.com/PyCQA/pylint/issues/1498
# pylint: disable=unsubscriptable-object # pylint: disable=unsubscriptable-object
# pylint: disable=unsupported-assignment-operation # pylint: disable=unsupported-assignment-operation
import pandas as pd import pandas as pd
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel from data_pipeline.etl.base import ValidGeoLevel
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,10 +1,9 @@
# pylint: disable=unsubscriptable-object # pylint: disable=unsubscriptable-object
# pylint: disable=unsupported-assignment-operation # pylint: disable=unsupported-assignment-operation
import pandas as pd import pandas as pd
from data_pipeline.config import settings from data_pipeline.config import settings
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel from data_pipeline.etl.base import ValidGeoLevel
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,12 +1,11 @@
import functools import functools
import pandas as pd
import pandas as pd
from data_pipeline.config import settings from data_pipeline.config import settings
from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.utils import ( from data_pipeline.etl.base import ValidGeoLevel
get_module_logger, from data_pipeline.utils import get_module_logger
unzip_file_from_url, from data_pipeline.utils import unzip_file_from_url
)
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,11 +1,12 @@
from pathlib import Path from pathlib import Path
import geopandas as gpd import geopandas as gpd
import pandas as pd import pandas as pd
from data_pipeline.config import settings from data_pipeline.config import settings
from data_pipeline.etl.base import ExtractTransformLoad from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.score import field_names from data_pipeline.score import field_names
from data_pipeline.utils import get_module_logger, unzip_file_from_url from data_pipeline.utils import get_module_logger
from data_pipeline.utils import unzip_file_from_url
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,11 +1,8 @@
from pathlib import Path from pathlib import Path
from data_pipeline.utils import ( from data_pipeline.utils import get_module_logger
get_module_logger, from data_pipeline.utils import remove_all_from_dir
remove_all_from_dir, from data_pipeline.utils import remove_files_from_dir
remove_files_from_dir,
)
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,12 +1,11 @@
import geopandas as gpd import geopandas as gpd
import numpy as np import numpy as np
import pandas as pd import pandas as pd
from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.sources.geo_utils import ( from data_pipeline.etl.base import ValidGeoLevel
add_tracts_for_geometries, from data_pipeline.etl.sources.geo_utils import add_tracts_for_geometries
get_tribal_geojson, from data_pipeline.etl.sources.geo_utils import get_tract_geojson
get_tract_geojson, from data_pipeline.etl.sources.geo_utils import get_tribal_geojson
)
from data_pipeline.score import field_names from data_pipeline.score import field_names
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger

View file

@ -1,11 +1,13 @@
from pathlib import Path from pathlib import Path
import geopandas as gpd
import pandas as pd
import numpy as np
from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel import geopandas as gpd
from data_pipeline.utils import get_module_logger, download_file_from_url import numpy as np
import pandas as pd
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.base import ValidGeoLevel
from data_pipeline.etl.sources.geo_utils import add_tracts_for_geometries from data_pipeline.etl.sources.geo_utils import add_tracts_for_geometries
from data_pipeline.utils import download_file_from_url
from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -211,7 +211,9 @@
} }
], ],
"source": [ "source": [
"tmp = sns.FacetGrid(data=score_m, col=\"Urban Heuristic Flag\", col_wrap=2, height=7)\n", "tmp = sns.FacetGrid(\n",
" data=score_m, col=\"Urban Heuristic Flag\", col_wrap=2, height=7\n",
")\n",
"tmp.map(\n", "tmp.map(\n",
" sns.distplot,\n", " sns.distplot,\n",
" \"Expected agricultural loss rate (Natural Hazards Risk Index) (percentile)\",\n", " \"Expected agricultural loss rate (Natural Hazards Risk Index) (percentile)\",\n",
@ -250,7 +252,9 @@
")\n", ")\n",
"\n", "\n",
"nri_with_flag[\"total_ag_loss\"] = nri_with_flag.filter(like=\"EALA\").sum(axis=1)\n", "nri_with_flag[\"total_ag_loss\"] = nri_with_flag.filter(like=\"EALA\").sum(axis=1)\n",
"nri_with_flag[\"total_ag_loss_pctile\"] = nri_with_flag[\"total_ag_loss\"].rank(pct=True)\n", "nri_with_flag[\"total_ag_loss_pctile\"] = nri_with_flag[\"total_ag_loss\"].rank(\n",
" pct=True\n",
")\n",
"\n", "\n",
"nri_with_flag.groupby(\"Urban Heuristic Flag\")[\"total_ag_loss_pctile\"].mean()" "nri_with_flag.groupby(\"Urban Heuristic Flag\")[\"total_ag_loss_pctile\"].mean()"
] ]
@ -779,9 +783,9 @@
" \"Greater than or equal to the 90th percentile for expected agriculture loss rate, is low income, and has a low percent of higher ed students?\"\n", " \"Greater than or equal to the 90th percentile for expected agriculture loss rate, is low income, and has a low percent of higher ed students?\"\n",
"].astype(int)\n", "].astype(int)\n",
"\n", "\n",
"score_m_adjusted_tracts = set(score_m[score_m[\"adjusted\"] > 0][\"GEOID10_TRACT\"]).union(\n", "score_m_adjusted_tracts = set(\n",
" all_ag_loss_tracts\n", " score_m[score_m[\"adjusted\"] > 0][\"GEOID10_TRACT\"]\n",
")\n", ").union(all_ag_loss_tracts)\n",
"display(len(set(all_scorem_tracts).difference(score_m_adjusted_tracts)))" "display(len(set(all_scorem_tracts).difference(score_m_adjusted_tracts)))"
] ]
}, },
@ -832,7 +836,11 @@
" left_clip = nri_with_flag[nri_with_flag[\"Urban Heuristic Flag\"] == 0][\n", " left_clip = nri_with_flag[nri_with_flag[\"Urban Heuristic Flag\"] == 0][\n",
" \"AGRIVALUE\"\n", " \"AGRIVALUE\"\n",
" ].quantile(threshold)\n", " ].quantile(threshold)\n",
" print(\"At threshold {:.2f}, minimum value is ${:,.0f}\".format(threshold, left_clip))\n", " print(\n",
" \"At threshold {:.2f}, minimum value is ${:,.0f}\".format(\n",
" threshold, left_clip\n",
" )\n",
" )\n",
" tmp_value = nri_with_flag[\"AGRIVALUE\"].clip(lower=left_clip)\n", " tmp_value = nri_with_flag[\"AGRIVALUE\"].clip(lower=left_clip)\n",
" nri_with_flag[\"total_ag_loss_pctile_{:.2f}\".format(threshold)] = (\n", " nri_with_flag[\"total_ag_loss_pctile_{:.2f}\".format(threshold)] = (\n",
" nri_with_flag[\"total_ag_loss\"] / tmp_value\n", " nri_with_flag[\"total_ag_loss\"] / tmp_value\n",
@ -889,7 +897,9 @@
" .set_index(\"Left clip value\")[[\"Rural\", \"Urban\"]]\n", " .set_index(\"Left clip value\")[[\"Rural\", \"Urban\"]]\n",
" .stack()\n", " .stack()\n",
" .reset_index()\n", " .reset_index()\n",
" .rename(columns={\"level_1\": \"Tract classification\", 0: \"Average percentile\"})\n", " .rename(\n",
" columns={\"level_1\": \"Tract classification\", 0: \"Average percentile\"}\n",
" )\n",
")" ")"
] ]
}, },
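The notebook cells above change only in layout: calls that overflow the configured line length are exploded so each argument or chained operand sits on its own line. The exact limit is not visible in this hunk; the wrapping is consistent with black using a line length in the 79-88 character range, which is an assumption. A small, self-contained sketch of the same kind of rewrite, using a hypothetical function:

def describe(data, label, scale, precision):
    """Return a short formatted summary string."""
    return f"{label}: {sum(data) * scale:.{precision}f}"

# Written on one line, this call would exceed a ~79-character limit, so a
# formatter like black rewraps it with one argument per line:
summary = describe(
    data=[1.0, 2.5, 3.75],
    label="Expected agricultural loss rate (percentile)",
    scale=100.0,
    precision=2,
)
print(summary)  # Expected agricultural loss rate (percentile): 725.00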

View file

@ -21,6 +21,7 @@
"source": [ "source": [
"import os\n", "import os\n",
"import sys\n", "import sys\n",
"\n",
"module_path = os.path.abspath(os.path.join(\"../..\"))\n", "module_path = os.path.abspath(os.path.join(\"../..\"))\n",
"if module_path not in sys.path:\n", "if module_path not in sys.path:\n",
" sys.path.append(module_path)" " sys.path.append(module_path)"
@ -94,9 +95,13 @@
"bia_aian_supplemental_geojson = (\n", "bia_aian_supplemental_geojson = (\n",
" GEOJSON_BASE_PATH / \"bia_national_lar\" / \"BIA_AIAN_Supplemental.json\"\n", " GEOJSON_BASE_PATH / \"bia_national_lar\" / \"BIA_AIAN_Supplemental.json\"\n",
")\n", ")\n",
"bia_tsa_geojson_geojson = GEOJSON_BASE_PATH / \"bia_national_lar\" / \"BIA_TSA.json\"\n", "bia_tsa_geojson_geojson = (\n",
" GEOJSON_BASE_PATH / \"bia_national_lar\" / \"BIA_TSA.json\"\n",
")\n",
"alaska_native_villages_geojson = (\n", "alaska_native_villages_geojson = (\n",
" GEOJSON_BASE_PATH / \"alaska_native_villages\" / \"AlaskaNativeVillages.gdb.geojson\"\n", " GEOJSON_BASE_PATH\n",
" / \"alaska_native_villages\"\n",
" / \"AlaskaNativeVillages.gdb.geojson\"\n",
")" ")"
] ]
}, },
@ -131,7 +136,9 @@
"len(\n", "len(\n",
" sorted(\n", " sorted(\n",
" list(\n", " list(\n",
" bia_national_lar_df.LARName.str.replace(r\"\\(.*\\) \", \"\", regex=True).unique()\n", " bia_national_lar_df.LARName.str.replace(\n",
" r\"\\(.*\\) \", \"\", regex=True\n",
" ).unique()\n",
" )\n", " )\n",
" )\n", " )\n",
")" ")"

View file

@ -45,6 +45,7 @@
"source": [ "source": [
"# Read in the score geojson file\n", "# Read in the score geojson file\n",
"from data_pipeline.etl.score.constants import DATA_SCORE_CSV_TILES_FILE_PATH\n", "from data_pipeline.etl.score.constants import DATA_SCORE_CSV_TILES_FILE_PATH\n",
"\n",
"nation = gpd.read_file(DATA_SCORE_CSV_TILES_FILE_PATH)" "nation = gpd.read_file(DATA_SCORE_CSV_TILES_FILE_PATH)"
] ]
}, },
@ -93,10 +94,14 @@
" random_tile_features = json.loads(f.read())\n", " random_tile_features = json.loads(f.read())\n",
"\n", "\n",
"# Flatten data around the features key:\n", "# Flatten data around the features key:\n",
"flatten_features = pd.json_normalize(random_tile_features, record_path=[\"features\"])\n", "flatten_features = pd.json_normalize(\n",
" random_tile_features, record_path=[\"features\"]\n",
")\n",
"\n", "\n",
"# index into the feature properties, get keys and turn into a sorted list\n", "# index into the feature properties, get keys and turn into a sorted list\n",
"random_tile = sorted(list(flatten_features[\"features\"][0][0][\"properties\"].keys()))" "random_tile = sorted(\n",
" list(flatten_features[\"features\"][0][0][\"properties\"].keys())\n",
")"
] ]
}, },
{ {
@ -291,8 +296,8 @@
} }
], ],
"source": [ "source": [
"nation_HRS_GEO = nation[['GEOID10', 'SF', 'CF', 'HRS_ET', 'AML_ET', 'FUDS_ET']]\n", "nation_HRS_GEO = nation[[\"GEOID10\", \"SF\", \"CF\", \"HRS_ET\", \"AML_ET\", \"FUDS_ET\"]]\n",
"nation_HRS_GEO.loc[nation_HRS_GEO['FUDS_ET'] == '0']" "nation_HRS_GEO.loc[nation_HRS_GEO[\"FUDS_ET\"] == \"0\"]"
] ]
}, },
{ {
@ -321,7 +326,7 @@
} }
], ],
"source": [ "source": [
"nation['HRS_ET'].unique()" "nation[\"HRS_ET\"].unique()"
] ]
} }
], ],

View file

@ -1,9 +1,6 @@
#!/usr/bin/env python #!/usr/bin/env python
# coding: utf-8 # coding: utf-8
# In[ ]: # In[ ]:
import numpy as np import numpy as np
import pandas as pd import pandas as pd
from sklearn.preprocessing import MinMaxScaler from sklearn.preprocessing import MinMaxScaler

View file

@ -18,7 +18,10 @@
" sys.path.append(module_path)\n", " sys.path.append(module_path)\n",
"\n", "\n",
"from data_pipeline.config import settings\n", "from data_pipeline.config import settings\n",
"from data_pipeline.etl.sources.geo_utils import add_tracts_for_geometries, get_tract_geojson\n" "from data_pipeline.etl.sources.geo_utils import (\n",
" add_tracts_for_geometries,\n",
" get_tract_geojson,\n",
")"
] ]
}, },
{ {
@ -655,9 +658,9 @@
} }
], ],
"source": [ "source": [
"adjacent_tracts.groupby(\"ORIGINAL_TRACT\")[[\"included\"]].mean().reset_index().rename(\n", "adjacent_tracts.groupby(\"ORIGINAL_TRACT\")[\n",
" columns={\"ORIGINAL_TRACT\": \"GEOID10_TRACT\"}\n", " [\"included\"]\n",
")" "].mean().reset_index().rename(columns={\"ORIGINAL_TRACT\": \"GEOID10_TRACT\"})"
] ]
}, },
{ {

View file

@ -65,7 +65,8 @@
"tmp_path.mkdir(parents=True, exist_ok=True)\n", "tmp_path.mkdir(parents=True, exist_ok=True)\n",
"\n", "\n",
"eamlis_path_in_s3 = (\n", "eamlis_path_in_s3 = (\n",
" settings.AWS_JUSTICE40_DATASOURCES_URL + \"/eAMLIS export of all data.tsv.zip\"\n", " settings.AWS_JUSTICE40_DATASOURCES_URL\n",
" + \"/eAMLIS export of all data.tsv.zip\"\n",
")\n", ")\n",
"\n", "\n",
"unzip_file_from_url(\n", "unzip_file_from_url(\n",

View file

@ -460,7 +460,9 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"object_ids_to_keep = set(\n", "object_ids_to_keep = set(\n",
" merged_exaple_data[merged_exaple_data[\"_merge\"] == \"both\"].OBJECTID.astype(\"int\")\n", " merged_exaple_data[merged_exaple_data[\"_merge\"] == \"both\"].OBJECTID.astype(\n",
" \"int\"\n",
" )\n",
")\n", ")\n",
"features = []\n", "features = []\n",
"for feature in raw_fuds_geojson[\"features\"]:\n", "for feature in raw_fuds_geojson[\"features\"]:\n",
@ -476,7 +478,11 @@
"outputs": [], "outputs": [],
"source": [ "source": [
"def make_fake_feature(\n", "def make_fake_feature(\n",
" state: str, has_projects: bool, is_eligible: bool, latitude: float, longitude: float\n", " state: str,\n",
" has_projects: bool,\n",
" is_eligible: bool,\n",
" latitude: float,\n",
" longitude: float,\n",
"):\n", "):\n",
" \"\"\"For tracts where we don't have a FUDS, fake one.\"\"\"\n", " \"\"\"For tracts where we don't have a FUDS, fake one.\"\"\"\n",
" make_fake_feature._object_id += 1\n", " make_fake_feature._object_id += 1\n",
@ -537,7 +543,9 @@
"# Create FUDS in CA for each tract that doesn't have a FUDS\n", "# Create FUDS in CA for each tract that doesn't have a FUDS\n",
"for tract_id, point in points.items():\n", "for tract_id, point in points.items():\n",
" for bools in [(True, True), (True, False), (False, False)]:\n", " for bools in [(True, True), (True, False), (False, False)]:\n",
" features.append(make_fake_feature(\"CA\", bools[0], bools[1], point.y, point.x))" " features.append(\n",
" make_fake_feature(\"CA\", bools[0], bools[1], point.y, point.x)\n",
" )"
] ]
}, },
{ {
@ -596,9 +604,9 @@
} }
], ],
"source": [ "source": [
"test_frame_with_tracts_full = test_frame_with_tracts = add_tracts_for_geometries(\n", "test_frame_with_tracts_full = (\n",
" test_frame\n", " test_frame_with_tracts\n",
")" ") = add_tracts_for_geometries(test_frame)"
] ]
}, },
{ {
@ -680,7 +688,9 @@
} }
], ],
"source": [ "source": [
"tracts = test_frame_with_tracts_full[[\"GEOID10_TRACT\", \"geometry\"]].drop_duplicates()\n", "tracts = test_frame_with_tracts_full[\n",
" [\"GEOID10_TRACT\", \"geometry\"]\n",
"].drop_duplicates()\n",
"tracts[\"lat_long\"] = test_frame_with_tracts_full.geometry.apply(\n", "tracts[\"lat_long\"] = test_frame_with_tracts_full.geometry.apply(\n",
" lambda point: (point.x, point.y)\n", " lambda point: (point.x, point.y)\n",
")\n", ")\n",

View file

@ -13,7 +13,7 @@
"import geopandas as gpd\n", "import geopandas as gpd\n",
"\n", "\n",
"# Read in the above json file\n", "# Read in the above json file\n",
"nation=gpd.read_file(\"/Users/vims/Downloads/usa-high-1822-637b.json\")" "nation = gpd.read_file(\"/Users/vims/Downloads/usa-high-1822-637b.json\")"
] ]
}, },
{ {
@ -45,7 +45,7 @@
} }
], ],
"source": [ "source": [
"nation['FUDS_RAW']" "nation[\"FUDS_RAW\"]"
] ]
}, },
{ {
@ -248,7 +248,18 @@
} }
], ],
"source": [ "source": [
"nation_new_ind = nation[['GEOID10', 'SF', 'CF', 'HRS_ET', 'AML_ET', 'AML_RAW','FUDS_ET', 'FUDS_RAW']]\n", "nation_new_ind = nation[\n",
" [\n",
" \"GEOID10\",\n",
" \"SF\",\n",
" \"CF\",\n",
" \"HRS_ET\",\n",
" \"AML_ET\",\n",
" \"AML_RAW\",\n",
" \"FUDS_ET\",\n",
" \"FUDS_RAW\",\n",
" ]\n",
"]\n",
"nation_new_ind" "nation_new_ind"
] ]
}, },
@ -270,7 +281,7 @@
} }
], ],
"source": [ "source": [
"nation_new_ind['HRS_ET'].unique()" "nation_new_ind[\"HRS_ET\"].unique()"
] ]
}, },
{ {
@ -293,7 +304,7 @@
} }
], ],
"source": [ "source": [
"nation_new_ind['HRS_ET'].value_counts()" "nation_new_ind[\"HRS_ET\"].value_counts()"
] ]
}, },
{ {
@ -314,7 +325,7 @@
} }
], ],
"source": [ "source": [
"nation_new_ind['AML_ET'].unique()" "nation_new_ind[\"AML_ET\"].unique()"
] ]
}, },
{ {
@ -337,7 +348,7 @@
} }
], ],
"source": [ "source": [
"nation_new_ind['AML_ET'].value_counts()" "nation_new_ind[\"AML_ET\"].value_counts()"
] ]
}, },
{ {
@ -358,7 +369,7 @@
} }
], ],
"source": [ "source": [
"nation_new_ind['AML_RAW'].unique()" "nation_new_ind[\"AML_RAW\"].unique()"
] ]
}, },
{ {
@ -380,7 +391,7 @@
} }
], ],
"source": [ "source": [
"nation_new_ind['AML_RAW'].value_counts()" "nation_new_ind[\"AML_RAW\"].value_counts()"
] ]
}, },
{ {
@ -401,7 +412,7 @@
} }
], ],
"source": [ "source": [
"nation_new_ind['FUDS_ET'].unique()" "nation_new_ind[\"FUDS_ET\"].unique()"
] ]
}, },
{ {
@ -424,7 +435,7 @@
} }
], ],
"source": [ "source": [
"nation_new_ind['FUDS_ET'].value_counts()" "nation_new_ind[\"FUDS_ET\"].value_counts()"
] ]
}, },
{ {
@ -445,7 +456,7 @@
} }
], ],
"source": [ "source": [
"nation_new_ind['FUDS_RAW'].unique()" "nation_new_ind[\"FUDS_RAW\"].unique()"
] ]
}, },
{ {
@ -468,7 +479,7 @@
} }
], ],
"source": [ "source": [
"nation_new_ind['FUDS_RAW'].value_counts()" "nation_new_ind[\"FUDS_RAW\"].value_counts()"
] ]
} }
], ],

View file

@ -36,8 +36,8 @@
" engine=\"pyogrio\",\n", " engine=\"pyogrio\",\n",
")\n", ")\n",
"end = time.time()\n", "end = time.time()\n",
" \n", "\n",
"print(\"Time taken to execute the function using pyogrio is\", end-begin)" "print(\"Time taken to execute the function using pyogrio is\", end - begin)"
] ]
}, },
{ {
@ -59,11 +59,13 @@
"census_tract_gdf = gpd.read_file(\n", "census_tract_gdf = gpd.read_file(\n",
" CensusETL.NATIONAL_TRACT_JSON_PATH,\n", " CensusETL.NATIONAL_TRACT_JSON_PATH,\n",
" engine=\"fiona\",\n", " engine=\"fiona\",\n",
" include_fields=[\"GEOID10\"]\n", " include_fields=[\"GEOID10\"],\n",
")\n", ")\n",
"end2 = time.time()\n", "end2 = time.time()\n",
" \n", "\n",
"print(\"Time taken to execute the function using include fields is\", end2-begin2)" "print(\n",
" \"Time taken to execute the function using include fields is\", end2 - begin2\n",
")"
] ]
}, },
{ {

View file

@ -1369,7 +1369,9 @@
"\n", "\n",
"results = results.reset_index()\n", "results = results.reset_index()\n",
"\n", "\n",
"results.to_csv(\"~/Downloads/tribal_area_as_a_share_of_tract_area.csv\", index=False)\n", "results.to_csv(\n",
" \"~/Downloads/tribal_area_as_a_share_of_tract_area.csv\", index=False\n",
")\n",
"\n", "\n",
"# Printing results\n", "# Printing results\n",
"print(results)" "print(results)"

View file

@ -1,7 +1,6 @@
import pandas as pd
from data_pipeline.score.score import Score
import data_pipeline.score.field_names as field_names import data_pipeline.score.field_names as field_names
import pandas as pd
from data_pipeline.score.score import Score
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,7 +1,6 @@
import pandas as pd
from data_pipeline.score.score import Score
import data_pipeline.score.field_names as field_names import data_pipeline.score.field_names as field_names
import pandas as pd
from data_pipeline.score.score import Score
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,8 +1,8 @@
from collections import namedtuple from collections import namedtuple
import pandas as pd
from data_pipeline.score.score import Score
import data_pipeline.score.field_names as field_names import data_pipeline.score.field_names as field_names
import pandas as pd
from data_pipeline.score.score import Score
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,7 +1,6 @@
import pandas as pd
from data_pipeline.score.score import Score
import data_pipeline.score.field_names as field_names import data_pipeline.score.field_names as field_names
import pandas as pd
from data_pipeline.score.score import Score
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,7 +1,6 @@
import pandas as pd
from data_pipeline.score.score import Score
import data_pipeline.score.field_names as field_names import data_pipeline.score.field_names as field_names
import pandas as pd
from data_pipeline.score.score import Score
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,7 +1,6 @@
import pandas as pd
from data_pipeline.score.score import Score
import data_pipeline.score.field_names as field_names import data_pipeline.score.field_names as field_names
import pandas as pd
from data_pipeline.score.score import Score
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,7 +1,6 @@
import pandas as pd
from data_pipeline.score.score import Score
import data_pipeline.score.field_names as field_names import data_pipeline.score.field_names as field_names
import pandas as pd
from data_pipeline.score.score import Score
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,7 +1,6 @@
import pandas as pd
from data_pipeline.score.score import Score
import data_pipeline.score.field_names as field_names import data_pipeline.score.field_names as field_names
import pandas as pd
from data_pipeline.score.score import Score
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,7 +1,6 @@
import pandas as pd
from data_pipeline.score.score import Score
import data_pipeline.score.field_names as field_names import data_pipeline.score.field_names as field_names
import pandas as pd
from data_pipeline.score.score import Score
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,8 +1,7 @@
import data_pipeline.score.field_names as field_names
import numpy as np import numpy as np
import pandas as pd import pandas as pd
from data_pipeline.score.score import Score from data_pipeline.score.score import Score
import data_pipeline.score.field_names as field_names
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,11 +1,11 @@
from typing import Tuple from typing import Tuple
import data_pipeline.etl.score.constants as constants
import data_pipeline.score.field_names as field_names
import numpy as np import numpy as np
import pandas as pd import pandas as pd
from data_pipeline.score.score import Score from data_pipeline.score.score import Score
import data_pipeline.score.field_names as field_names
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger
import data_pipeline.etl.score.constants as constants
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,12 +1,12 @@
from typing import Tuple from typing import Tuple
import data_pipeline.etl.score.constants as constants
import data_pipeline.score.field_names as field_names
import numpy as np import numpy as np
import pandas as pd import pandas as pd
from data_pipeline.score.score import Score from data_pipeline.score.score import Score
import data_pipeline.score.field_names as field_names
from data_pipeline.utils import get_module_logger
import data_pipeline.etl.score.constants as constants
from data_pipeline.score.utils import calculate_tract_adjacency_scores from data_pipeline.score.utils import calculate_tract_adjacency_scores
from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,6 +1,5 @@
import pandas as pd import pandas as pd
from data_pipeline.score.score_narwhal import ScoreNarwhal from data_pipeline.score.score_narwhal import ScoreNarwhal
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,12 +1,12 @@
"""Utilities to help generate the score.""" """Utilities to help generate the score."""
import pandas as pd
import geopandas as gpd
import data_pipeline.score.field_names as field_names import data_pipeline.score.field_names as field_names
import geopandas as gpd
import pandas as pd
from data_pipeline.etl.sources.geo_utils import get_tract_geojson
from data_pipeline.utils import get_module_logger
# XXX: @jorge I am torn about the coupling that importing from # XXX: @jorge I am torn about the coupling that importing from
# etl.sources vs keeping the code DRY. Thoughts? # etl.sources vs keeping the code DRY. Thoughts?
from data_pipeline.etl.sources.geo_utils import get_tract_geojson
from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -3,7 +3,6 @@ from pathlib import Path
from shutil import copyfile from shutil import copyfile
import pytest import pytest
from data_pipeline.config import settings from data_pipeline.config import settings
from data_pipeline.etl.base import ExtractTransformLoad from data_pipeline.etl.base import ExtractTransformLoad

View file

@ -1,8 +1,8 @@
import pandas as pd import pandas as pd
import pytest import pytest
from data_pipeline.config import settings from data_pipeline.config import settings
from data_pipeline.score.field_names import GEOID_TRACT_FIELD
from data_pipeline.etl.score import constants from data_pipeline.etl.score import constants
from data_pipeline.score.field_names import GEOID_TRACT_FIELD
@pytest.fixture(scope="session") @pytest.fixture(scope="session")

View file

@ -1,9 +1,11 @@
# flake8: noqa: W0613,W0611,F811 # flake8: noqa: W0613,W0611,F811
from dataclasses import dataclass from dataclasses import dataclass
import pytest import pytest
from data_pipeline.score import field_names from data_pipeline.score import field_names
from data_pipeline.utils import get_module_logger
from data_pipeline.score.score_narwhal import ScoreNarwhal from data_pipeline.score.score_narwhal import ScoreNarwhal
from data_pipeline.utils import get_module_logger
from .fixtures import final_score_df # pylint: disable=unused-import from .fixtures import final_score_df # pylint: disable=unused-import
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -2,36 +2,35 @@
# pylint: disable=unused-import,too-many-arguments # pylint: disable=unused-import,too-many-arguments
from dataclasses import dataclass from dataclasses import dataclass
from typing import List from typing import List
import pytest
import pandas as pd
import numpy as np import numpy as np
import pandas as pd
import pytest
from data_pipeline.etl.score import constants from data_pipeline.etl.score import constants
from data_pipeline.etl.score.constants import TILES_ISLAND_AREA_FIPS_CODES
from data_pipeline.score import field_names from data_pipeline.score import field_names
from data_pipeline.score.field_names import GEOID_TRACT_FIELD from data_pipeline.score.field_names import GEOID_TRACT_FIELD
from data_pipeline.etl.score.constants import TILES_ISLAND_AREA_FIPS_CODES
from .fixtures import (
final_score_df,
ejscreen_df,
hud_housing_df,
census_acs_df,
cdc_places_df,
census_acs_median_incomes_df,
cdc_life_expectancy_df,
doe_energy_burden_df,
national_risk_index_df,
dot_travel_disadvantage_df,
fsf_fire_df,
nature_deprived_df,
eamlis_df,
fuds_df,
geocorr_urban_rural_df,
census_decennial_df,
census_2010_df,
hrs_df,
national_tract_df,
tribal_overlap,
)
from .fixtures import cdc_life_expectancy_df # noqa
from .fixtures import cdc_places_df # noqa
from .fixtures import census_2010_df # noqa
from .fixtures import census_acs_df # noqa
from .fixtures import census_acs_median_incomes_df # noqa
from .fixtures import census_decennial_df # noqa
from .fixtures import doe_energy_burden_df # noqa
from .fixtures import dot_travel_disadvantage_df # noqa
from .fixtures import eamlis_df # noqa
from .fixtures import ejscreen_df # noqa
from .fixtures import final_score_df # noqa
from .fixtures import fsf_fire_df # noqa
from .fixtures import fuds_df # noqa
from .fixtures import geocorr_urban_rural_df # noqa
from .fixtures import hrs_df # noqa
from .fixtures import hud_housing_df # noqa
from .fixtures import national_risk_index_df # noqa
from .fixtures import national_tract_df # noqa
from .fixtures import nature_deprived_df # noqa
from .fixtures import tribal_overlap # noqa
pytestmark = pytest.mark.smoketest pytestmark = pytest.mark.smoketest
UNMATCHED_TRACT_THRESHOLD = 1000 UNMATCHED_TRACT_THRESHOLD = 1000
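The fixture imports above are rewritten one per line, each carrying a `# noqa` marker (the file also keeps its module-level `# pylint: disable=unused-import`): pytest fixtures are imported only so pytest can discover and inject them by name, so linters see the imports as unused. A minimal, self-contained sketch of why the suppression is needed, with hypothetical fixture and test names:

import pytest

@pytest.fixture
def example_score_df():
    # Hypothetical stand-in for a real fixture that loads score output.
    return {"GEOID10_TRACT": ["01001020100"]}

def test_tract_ids_present(example_score_df):
    # pytest injects example_score_df by name. When a fixture lives in another
    # module, the importing test file never references the name directly, so
    # flake8/pylint flag the import as unused unless it is suppressed.
    assert example_score_df["GEOID10_TRACT"] == ["01001020100"]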

View file

@ -1,10 +1,9 @@
# pylint: disable=protected-access # pylint: disable=protected-access
import pandas as pd import pandas as pd
import pytest import pytest
from data_pipeline.config import settings from data_pipeline.config import settings
from data_pipeline.score import field_names
from data_pipeline.etl.score.etl_score import ScoreETL from data_pipeline.etl.score.etl_score import ScoreETL
from data_pipeline.score import field_names
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger
logger = get_module_logger(__name__) logger = get_module_logger(__name__)

View file

@ -1,18 +1,20 @@
# flake8: noqa: W0613,W0611,F811 # flake8: noqa: W0613,W0611,F811
from dataclasses import dataclass from dataclasses import dataclass
from typing import Optional from typing import Optional
import pandas as pd
import geopandas as gpd import geopandas as gpd
import numpy as np import numpy as np
import pandas as pd
import pytest import pytest
from data_pipeline.config import settings from data_pipeline.config import settings
from data_pipeline.etl.score import constants from data_pipeline.etl.score import constants
from data_pipeline.score import field_names from data_pipeline.etl.score.constants import THRESHOLD_COUNT_TO_SHOW_FIELD_NAME
from data_pipeline.etl.score.constants import TILES_SCORE_COLUMNS
from data_pipeline.etl.score.constants import ( from data_pipeline.etl.score.constants import (
TILES_SCORE_COLUMNS,
THRESHOLD_COUNT_TO_SHOW_FIELD_NAME,
USER_INTERFACE_EXPERIENCE_FIELD_NAME, USER_INTERFACE_EXPERIENCE_FIELD_NAME,
) )
from data_pipeline.score import field_names
from .fixtures import final_score_df # pylint: disable=unused-import from .fixtures import final_score_df # pylint: disable=unused-import
pytestmark = pytest.mark.smoketest pytestmark = pytest.mark.smoketest

View file

@ -1,17 +1,17 @@
# pylint: disable=protected-access # pylint: disable=protected-access
# flake8: noqa=F841 # flake8: noqa=F841
from contextlib import contextmanager
from functools import partial
from pathlib import Path from pathlib import Path
from unittest import mock from unittest import mock
from functools import partial
from contextlib import contextmanager
import pytest
import pandas as pd import pandas as pd
import pytest
from data_pipeline.etl.sources.geo_utils import get_tract_geojson
from data_pipeline.score import field_names
from data_pipeline.score.utils import ( from data_pipeline.score.utils import (
calculate_tract_adjacency_scores as original_calculate_tract_adjacency_score, calculate_tract_adjacency_scores as original_calculate_tract_adjacency_score,
) )
from data_pipeline.etl.sources.geo_utils import get_tract_geojson
from data_pipeline.score import field_names
@contextmanager @contextmanager

View file

@ -1,6 +1,7 @@
# pylint: disable=protected-access # pylint: disable=protected-access
import pathlib import pathlib
from unittest import mock from unittest import mock
import requests import requests
from data_pipeline.etl.base import ExtractTransformLoad from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.sources.cdc_life_expectancy.etl import CDCLifeExpectancy from data_pipeline.etl.sources.cdc_life_expectancy.etl import CDCLifeExpectancy

View file

@ -1,6 +1,7 @@
import pathlib import pathlib
from data_pipeline.tests.sources.example.test_etl import TestETL
from data_pipeline.etl.sources.cdc_places.etl import CDCPlacesETL from data_pipeline.etl.sources.cdc_places.etl import CDCPlacesETL
from data_pipeline.tests.sources.example.test_etl import TestETL
class TestCDCPlacesETL(TestETL): class TestCDCPlacesETL(TestETL):

View file

@ -1,9 +1,7 @@
# pylint: disable=protected-access # pylint: disable=protected-access
import pathlib import pathlib
from data_pipeline.etl.sources.doe_energy_burden.etl import ( from data_pipeline.etl.sources.doe_energy_burden.etl import DOEEnergyBurden
DOEEnergyBurden,
)
from data_pipeline.tests.sources.example.test_etl import TestETL from data_pipeline.tests.sources.example.test_etl import TestETL
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger

View file

@ -1,9 +1,10 @@
import pathlib import pathlib
import geopandas as gpd import geopandas as gpd
from data_pipeline.tests.sources.example.test_etl import TestETL
from data_pipeline.etl.sources.dot_travel_composite.etl import ( from data_pipeline.etl.sources.dot_travel_composite.etl import (
TravelCompositeETL, TravelCompositeETL,
) )
from data_pipeline.tests.sources.example.test_etl import TestETL
class TestTravelCompositeETL(TestETL): class TestTravelCompositeETL(TestETL):

View file

@ -1,11 +1,9 @@
# pylint: disable=protected-access # pylint: disable=protected-access
from unittest import mock
import pathlib import pathlib
from data_pipeline.etl.base import ValidGeoLevel from unittest import mock
from data_pipeline.etl.sources.eamlis.etl import ( from data_pipeline.etl.base import ValidGeoLevel
AbandonedMineETL, from data_pipeline.etl.sources.eamlis.etl import AbandonedMineETL
)
from data_pipeline.tests.sources.example.test_etl import TestETL from data_pipeline.tests.sources.example.test_etl import TestETL
from data_pipeline.utils import get_module_logger from data_pipeline.utils import get_module_logger

View file

@ -1,6 +1,7 @@
import pathlib import pathlib
from data_pipeline.tests.sources.example.test_etl import TestETL
from data_pipeline.etl.sources.ejscreen.etl import EJSCREENETL from data_pipeline.etl.sources.ejscreen.etl import EJSCREENETL
from data_pipeline.tests.sources.example.test_etl import TestETL
class TestEJSCREENETL(TestETL): class TestEJSCREENETL(TestETL):

Some files were not shown because too many files have changed in this diff.