Mirror of https://github.com/DOI-DO/j40-cejst-2.git, synced 2025-07-23 08:40:17 -07:00

Issue 105: Configure and run black and other pre-commit hooks (clean branch) (#1962)

* Configure and run `black` and other pre-commit hooks

Co-authored-by: matt bowen <matthew.r.bowen@omb.eop.gov>

This commit is contained in:
parent baa34ec038
commit 6e6223cd5e

162 changed files with 716 additions and 602 deletions
.github/CODEOWNERS (vendored): 1 change

@@ -1,2 +1 @@
* @esfoobar-usds @vim-usds @emma-nechamkin @mattbowen-usds
@@ -13,4 +13,3 @@ Project maintainers have the right and the obligation to remove, edit…

Instances of abuse, harassment, or other unacceptable behavior may be reported by opening an issue or by contacting one or more of the project maintainers at justice40open@usds.gov.

This Code of Conduct is adapted from version 1.0.0 of the [Contributor Covenant](http://contributor-covenant.org), available at http://contributor-covenant.org/version/1/0/0/.

@@ -43,4 +43,3 @@ If you would like to contribute to any part of the codebase, fork the repository…

* At least one authorized reviewer must approve the commit (see [CODEOWNERS](https://github.com/usds/justice40-tool/tree/main/.github/CODEOWNERS) for the most recent list of these reviewers).
* All required status checks must pass.

If there is significant disagreement among team members, a meeting will be held to determine the plan of action for the pull request.

@@ -28,4 +28,3 @@ For these and/or other purposes and motivations, and without any expectation of additional consi…

c. Affirmer disclaims responsibility for clearing rights of other persons that may apply to the Work or any use thereof, including without limitation any person's Copyright and Related Rights in the Work. Further, Affirmer disclaims responsibility for obtaining any necessary consents, permissions or other rights required for any use of the Work.

d. Affirmer understands and acknowledges that Creative Commons is not a party to this document and has no duty or obligation with respect to this CC0 or use of the Work.
data/data-pipeline/.pre-commit-config.yaml (new file, 39 lines)

@@ -0,0 +1,39 @@
exclude: ^client|\.csv
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.3.0
    hooks:
      - id: end-of-file-fixer
      - id: trailing-whitespace

  - repo: https://github.com/lucasmbrown/mirrors-autoflake
    rev: v1.3
    hooks:
      - id: autoflake
        args:
          [
            "--in-place",
            "--remove-all-unused-imports",
            "--remove-unused-variable",
            "--ignore-init-module-imports",
          ]

  - repo: https://github.com/pycqa/isort
    rev: 5.10.1
    hooks:
      - id: isort
        name: isort (python)
        args:
          [
            "--force-single-line-imports",
            "--profile=black",
            "--line-length=80",
            "--src-path=.:data/data-pipeline"
          ]

  - repo: https://github.com/ambv/black
    rev: 22.8.0
    hooks:
      - id: black
        language_version: python3.9
        args: [--config=./data/data-pipeline/pyproject.toml]
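Taken together, these hooks normalize every staged Python file before a commit lands: `autoflake` strips unused imports and unused variables, `isort` rewrites imports one name per line in black-compatible order, and `black` reformats the result against the repository's `pyproject.toml`. As a rough illustration of the combined effect (a hypothetical module, not a file from this repository), code staged like this:

```
import os, sys  # "sys" is unused and will be dropped by autoflake
from typing import List, Optional  # "Optional" is also unused

from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)


def list_dir(path: str) -> List[str]:
    logger.info('listing %s', path)
    return os.listdir(path)
```

would come out of the hook chain looking roughly like:

```
import os
from typing import List

from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)


def list_dir(path: str) -> List[str]:
    logger.info("listing %s", path)
    return os.listdir(path)
```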
@@ -250,7 +250,59 @@ as `poetry run tox`.

Each run can take a while to build the whole environment. If you'd like to save time,
you can use the previously built environment by running `poetry run tox -e lint`
which will drastically speed up the process.
which will drastically speed up the linting process.

### Configuring pre-commit hooks

<!-- markdown-link-check-disable -->
To promote consistent code style and quality, we use git pre-commit hooks to
automatically lint and reformat our code before every commit we make to the codebase.
Pre-commit hooks are defined in the file [`.pre-commit-config.yaml`](../.pre-commit-config.yaml).
<!-- markdown-link-check-enable -->

1. First, install [`pre-commit`](https://pre-commit.com/) globally:

       $ brew install pre-commit

2. While in the `data/data-pipeline` directory, run `pre-commit install` to install
   the specific git hooks used in this repository.

Now, any time you commit code to the repository, the hooks will run on all modified files automatically. If you wish,
you can force a re-run on all files with `pre-commit run --all-files`.

#### Conflicts between backend and frontend git hooks
<!-- markdown-link-check-disable -->
In the front-end part of the codebase (the `justice40-tool/client` folder), we use
`Husky` to run pre-commit hooks for the front-end. This is different from the
`pre-commit` framework we use for the backend. The frontend `Husky` hooks are
configured at [client/.husky](client/.husky).

It is not possible to run both our `Husky` hooks and `pre-commit` hooks on every
commit; either one or the other will run.

<!-- markdown-link-check-enable -->

`Husky` is installed every time you run `npm install`. To use the `Husky` front-end
hooks during front-end development, simply run `npm install`.

However, running `npm install` overwrites the backend hooks set up by `pre-commit`.
To restore the backend hooks after running `npm install`, do the following:

1. Run `pre-commit install` while in the `data/data-pipeline` directory.
2. The terminal should respond with an error message such as:

   ```
   [ERROR] Cowardly refusing to install hooks with `core.hooksPath` set.
   hint: `git config --unset-all core.hooksPath`
   ```

   This error is caused by having previously run `npm install`, which used `Husky` to
   overwrite the hooks path.

3. Follow the hint and run `git config --unset-all core.hooksPath`.
4. Run `pre-commit install` again.

Now `pre-commit` and the backend hooks should take precedence; the sketch below shows a quick way to check which hooks path is active.
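`Husky` works by pointing git's `core.hooksPath` at its own directory, while `pre-commit` installs into the default `.git/hooks`. A minimal diagnostic sketch (a hypothetical helper, not part of the repository):

```
import subprocess

# If core.hooksPath is set, Husky's hooks directory is in effect; if it is
# unset, git falls back to .git/hooks, which is where pre-commit installs.
result = subprocess.run(
    ["git", "config", "--get", "core.hooksPath"],
    capture_output=True,
    text=True,
    check=False,  # git exits non-zero when the key is unset
)
hooks_path = result.stdout.strip()
if hooks_path:
    print(f"core.hooksPath is set to {hooks_path!r} (Husky hooks active)")
else:
    print("core.hooksPath unset: .git/hooks (pre-commit) is active")
```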
### The Application entrypoint
@@ -1,31 +1,27 @@
from subprocess import call
import sys
import click
from subprocess import call

import click
from data_pipeline.config import settings
from data_pipeline.etl.runner import (
    etl_runner,
    score_generate,
    score_geo,
    score_post,
)
from data_pipeline.etl.runner import etl_runner
from data_pipeline.etl.runner import score_generate
from data_pipeline.etl.runner import score_geo
from data_pipeline.etl.runner import score_post
from data_pipeline.etl.sources.census.etl_utils import check_census_data_source
from data_pipeline.etl.sources.census.etl_utils import (
    check_census_data_source,
    reset_data_directories as census_reset,
    zip_census_data,
)
from data_pipeline.etl.sources.census.etl_utils import zip_census_data
from data_pipeline.etl.sources.tribal.etl_utils import (
    reset_data_directories as tribal_reset,
)
from data_pipeline.tile.generate import generate_tiles
from data_pipeline.utils import (
    data_folder_cleanup,
    get_module_logger,
    score_folder_cleanup,
    downloadable_cleanup,
    temp_folder_cleanup,
    check_first_run,
)
from data_pipeline.utils import check_first_run
from data_pipeline.utils import data_folder_cleanup
from data_pipeline.utils import downloadable_cleanup
from data_pipeline.utils import get_module_logger
from data_pipeline.utils import score_folder_cleanup
from data_pipeline.utils import temp_folder_cleanup

logger = get_module_logger(__name__)

@@ -36,8 +32,6 @@ dataset_cli_help = "Grab the data from either 'local' for local access or 'aws'
def cli():
    """Defines a click group for the commands below"""

    pass


@cli.command(help="Clean up all census data folders")
def census_cleanup():
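The pattern above repeats across most Python files in this commit: parenthesized multi-name imports are expanded by isort's `--force-single-line-imports` into one `from ... import ...` statement per name. A plausible motivation (an inference, not stated in the commit message) is merge hygiene: with one name per line, two branches that each add or remove an import touch different lines and merge cleanly. Schematically:

```
# Grouped form: adding or removing a single name edits a shared block,
# which is a frequent source of merge conflicts.
from data_pipeline.etl.runner import (
    etl_runner,
    score_generate,
    score_geo,
)

# Single-line form produced by --force-single-line-imports: each name
# stands alone, so independent edits rarely collide.
from data_pipeline.etl.runner import etl_runner
from data_pipeline.etl.runner import score_generate
from data_pipeline.etl.runner import score_geo
```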
@@ -12,12 +12,12 @@ To see more: https://buildmedia.readthedocs.org/media/pdf/papermill/latest/paper

To run:
` $ python src/run_tract_comparison.py --template_notebook=TEMPLATE.ipynb --parameter_yaml=PARAMETERS.yaml`
"""

import os
import datetime
import argparse
import yaml
import datetime
import os

import papermill as pm
import yaml


def _read_param_file(param_file: str) -> dict:

@@ -16,7 +16,7 @@
"import matplotlib.pyplot as plt\n",
"\n",
"from data_pipeline.score import field_names\n",
"from data_pipeline.comparison_tool.src import utils \n",
"from data_pipeline.comparison_tool.src import utils\n",
"\n",
"pd.options.display.float_format = \"{:,.3f}\".format\n",
"%load_ext lab_black"

@@ -128,9 +128,7 @@
"metadata": {},
"outputs": [],
"source": [
"utils.validate_new_data(\n",
"    file_path=COMPARATOR_FILE, score_col=COMPARATOR_COLUMN\n",
")"
"utils.validate_new_data(file_path=COMPARATOR_FILE, score_col=COMPARATOR_COLUMN)"
]
},
{

@@ -148,20 +146,25 @@
"metadata": {},
"outputs": [],
"source": [
"comparator_cols = [COMPARATOR_COLUMN] + OTHER_COMPARATOR_COLUMNS if OTHER_COMPARATOR_COLUMNS else [COMPARATOR_COLUMN]\n",
"comparator_cols = (\n",
"    [COMPARATOR_COLUMN] + OTHER_COMPARATOR_COLUMNS\n",
"    if OTHER_COMPARATOR_COLUMNS\n",
"    else [COMPARATOR_COLUMN]\n",
")\n",
"\n",
"#papermill_description=Loading_data\n",
"# papermill_description=Loading_data\n",
"joined_df = pd.concat(\n",
"    [\n",
"        utils.read_file(\n",
"            file_path=SCORE_FILE,\n",
"            columns=[TOTAL_POPULATION_COLUMN, SCORE_COLUMN] + ADDITIONAL_DEMO_COLUMNS,\n",
"            columns=[TOTAL_POPULATION_COLUMN, SCORE_COLUMN]\n",
"            + ADDITIONAL_DEMO_COLUMNS,\n",
"            geoid=GEOID_COLUMN,\n",
"        ),\n",
"        utils.read_file(\n",
"            file_path=COMPARATOR_FILE,\n",
"            columns=comparator_cols,\n",
"            geoid=GEOID_COLUMN\n",
"            geoid=GEOID_COLUMN,\n",
"        ),\n",
"        utils.read_file(\n",
"            file_path=DEMOGRAPHIC_FILE,\n",

@@ -196,13 +199,13 @@
"metadata": {},
"outputs": [],
"source": [
"#papermill_description=Summary_stats\n",
"# papermill_description=Summary_stats\n",
"population_df = utils.produce_summary_stats(\n",
"    joined_df=joined_df,\n",
"    comparator_column=COMPARATOR_COLUMN,\n",
"    score_column=SCORE_COLUMN,\n",
"    population_column=TOTAL_POPULATION_COLUMN,\n",
"    geoid_column=GEOID_COLUMN\n",
"    geoid_column=GEOID_COLUMN,\n",
")\n",
"population_df"
]

@@ -224,18 +227,18 @@
"metadata": {},
"outputs": [],
"source": [
"#papermill_description=Tract_stats\n",
"# papermill_description=Tract_stats\n",
"tract_level_by_identification_df = pd.concat(\n",
"    [\n",
"        utils.get_demo_series(\n",
"            grouping_column=COMPARATOR_COLUMN,\n",
"            joined_df=joined_df,\n",
"            demo_columns=ADDITIONAL_DEMO_COLUMNS + DEMOGRAPHIC_COLUMNS\n",
"            demo_columns=ADDITIONAL_DEMO_COLUMNS + DEMOGRAPHIC_COLUMNS,\n",
"        ),\n",
"        utils.get_demo_series(\n",
"            grouping_column=SCORE_COLUMN,\n",
"            joined_df=joined_df,\n",
"            demo_columns=ADDITIONAL_DEMO_COLUMNS + DEMOGRAPHIC_COLUMNS\n",
"            demo_columns=ADDITIONAL_DEMO_COLUMNS + DEMOGRAPHIC_COLUMNS,\n",
"        ),\n",
"    ],\n",
"    axis=1,\n",

@@ -256,17 +259,25 @@
"    y=\"Variable\",\n",
"    x=\"Avg in tracts\",\n",
"    hue=\"Definition\",\n",
"    data=tract_level_by_identification_df.sort_values(by=COMPARATOR_COLUMN, ascending=False)\n",
"    data=tract_level_by_identification_df.sort_values(\n",
"        by=COMPARATOR_COLUMN, ascending=False\n",
"    )\n",
"    .stack()\n",
"    .reset_index()\n",
"    .rename(\n",
"        columns={\"level_0\": \"Variable\", \"level_1\": \"Definition\", 0: \"Avg in tracts\"}\n",
"        columns={\n",
"            \"level_0\": \"Variable\",\n",
"            \"level_1\": \"Definition\",\n",
"            0: \"Avg in tracts\",\n",
"        }\n",
"    ),\n",
"    palette=\"Blues\",\n",
")\n",
"plt.xlim(0, 1)\n",
"plt.title(\"Tract level averages by identification strategy\")\n",
"plt.savefig(os.path.join(OUTPUT_DATA_PATH, \"tract_lvl_avg.jpg\"), bbox_inches='tight')"
"plt.savefig(\n",
"    os.path.join(OUTPUT_DATA_PATH, \"tract_lvl_avg.jpg\"), bbox_inches=\"tight\"\n",
")"
]
},
{

@@ -276,13 +287,13 @@
"metadata": {},
"outputs": [],
"source": [
"#papermill_description=Tract_stats_grouped\n",
"# papermill_description=Tract_stats_grouped\n",
"tract_level_by_grouping_df = utils.get_tract_level_grouping(\n",
"    joined_df=joined_df,\n",
"    score_column=SCORE_COLUMN,\n",
"    comparator_column=COMPARATOR_COLUMN,\n",
"    demo_columns=ADDITIONAL_DEMO_COLUMNS + DEMOGRAPHIC_COLUMNS,\n",
"    keep_missing_values=KEEP_MISSING_VALUES_FOR_SEGMENTATION\n",
"    keep_missing_values=KEEP_MISSING_VALUES_FOR_SEGMENTATION,\n",
")\n",
"\n",
"tract_level_by_grouping_formatted_df = utils.format_multi_index_for_excel(\n",

@@ -315,7 +326,7 @@
"metadata": {},
"outputs": [],
"source": [
"#papermill_description=Population_stats\n",
"# papermill_description=Population_stats\n",
"population_weighted_stats_df = pd.concat(\n",
"    [\n",
"        utils.construct_weighted_statistics(\n",

@@ -363,7 +374,7 @@
"comparator_and_cejst_proportion_series, states = utils.get_final_summary_info(\n",
"    population=population_df,\n",
"    comparator_file=COMPARATOR_FILE,\n",
"    geoid_col=GEOID_COLUMN\n",
"    geoid_col=GEOID_COLUMN,\n",
")"
]
},

@@ -393,7 +404,7 @@
"metadata": {},
"outputs": [],
"source": [
"#papermill_description=Writing_excel\n",
"# papermill_description=Writing_excel\n",
"utils.write_single_comparison_excel(\n",
"    output_excel=OUTPUT_EXCEL,\n",
"    population_df=population_df,\n",

@@ -401,7 +412,7 @@
"    population_weighted_stats_df=population_weighted_stats_df,\n",
"    tract_level_by_grouping_formatted_df=tract_level_by_grouping_formatted_df,\n",
"    comparator_and_cejst_proportion_series=comparator_and_cejst_proportion_series,\n",
"    states_text=states_text\n",
"    states_text=states_text,\n",
")"
]
}
@@ -1,9 +1,9 @@
import pathlib

import pandas as pd
import xlsxwriter

from data_pipeline.score import field_names
from data_pipeline.etl.sources.census.etl_utils import get_state_information
from data_pipeline.score import field_names

# Some excel parameters
DEFAULT_COLUMN_WIDTH = 18

@@ -1,8 +1,7 @@
import pathlib

from dynaconf import Dynaconf

import data_pipeline
from dynaconf import Dynaconf

settings = Dynaconf(
    envvar_prefix="DYNACONF",

@@ -427,7 +427,9 @@
}
],
"source": [
"for col in [col for col in download_codebook.index.to_list() if \"(percentile)\" in col]:\n",
"for col in [\n",
"    col for col in download_codebook.index.to_list() if \"(percentile)\" in col\n",
"]:\n",
"    print(f\"  - column_name: {col}\")\n",
"    if \"Low\" not in col:\n",
"        print(\n",

@@ -1,6 +1,8 @@
from dataclasses import dataclass, field
from dataclasses import dataclass
from dataclasses import field
from enum import Enum
from typing import List, Optional
from typing import List
from typing import Optional


class FieldType(Enum):

@@ -5,18 +5,15 @@ import typing
from typing import Optional

import pandas as pd

from data_pipeline.config import settings
from data_pipeline.etl.score.etl_utils import (
    compare_to_list_of_expected_state_fips_codes,
)
from data_pipeline.etl.score.schemas.datasets import DatasetsConfig
from data_pipeline.utils import (
    load_yaml_dict_from_file,
    unzip_file_from_url,
    remove_all_from_dir,
    get_module_logger,
)
from data_pipeline.utils import get_module_logger
from data_pipeline.utils import load_yaml_dict_from_file
from data_pipeline.utils import remove_all_from_dir
from data_pipeline.utils import unzip_file_from_url

logger = get_module_logger(__name__)

@@ -1,5 +1,5 @@
import importlib
import concurrent.futures
import importlib
import typing

from data_pipeline.etl.score.etl_score import ScoreETL

@@ -1,9 +1,8 @@
import datetime
import os
from pathlib import Path
import datetime

from data_pipeline.config import settings

from data_pipeline.score import field_names

## note: to keep map porting "right" fields, keeping descriptors the same.
@@ -1,31 +1,26 @@
import functools
from typing import List

from dataclasses import dataclass
from typing import List

import numpy as np
import pandas as pd

from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.score import constants
from data_pipeline.etl.sources.census_acs.etl import CensusACSETL
from data_pipeline.etl.sources.national_risk_index.etl import (
    NationalRiskIndexETL,
)
from data_pipeline.etl.sources.dot_travel_composite.etl import (
    TravelCompositeETL,
)
from data_pipeline.etl.sources.fsf_flood_risk.etl import (
    FloodRiskETL,
)
from data_pipeline.etl.sources.eamlis.etl import AbandonedMineETL
from data_pipeline.etl.sources.fsf_flood_risk.etl import FloodRiskETL
from data_pipeline.etl.sources.fsf_wildfire_risk.etl import WildfireRiskETL
from data_pipeline.etl.sources.national_risk_index.etl import (
    NationalRiskIndexETL,
)
from data_pipeline.etl.sources.nlcd_nature_deprived.etl import NatureDeprivedETL
from data_pipeline.etl.sources.tribal_overlap.etl import TribalOverlapETL
from data_pipeline.etl.sources.us_army_fuds.etl import USArmyFUDS
from data_pipeline.etl.sources.nlcd_nature_deprived.etl import NatureDeprivedETL
from data_pipeline.etl.sources.fsf_wildfire_risk.etl import WildfireRiskETL
from data_pipeline.score.score_runner import ScoreRunner
from data_pipeline.score import field_names
from data_pipeline.etl.score import constants

from data_pipeline.score.score_runner import ScoreRunner
from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)

@@ -699,7 +694,9 @@ class ScoreETL(ExtractTransformLoad):
        self.df = self._backfill_island_demographics(self.df)

    def load(self) -> None:
        logger.info("Saving Score CSV")
        logger.info(
            f"Saving Score CSV to {constants.DATA_SCORE_CSV_FULL_FILE_PATH}."
        )
        constants.DATA_SCORE_CSV_FULL_DIR.mkdir(parents=True, exist_ok=True)

        self.df.to_csv(constants.DATA_SCORE_CSV_FULL_FILE_PATH, index=False)
@@ -1,24 +1,20 @@
import concurrent.futures
import math
import os

import geopandas as gpd
import numpy as np
import pandas as pd
import geopandas as gpd

from data_pipeline.content.schemas.download_schemas import CSVConfig
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.score import constants
from data_pipeline.etl.sources.census.etl_utils import (
    check_census_data_source,
)
from data_pipeline.etl.score.etl_utils import check_score_data_source
from data_pipeline.etl.sources.census.etl_utils import check_census_data_source
from data_pipeline.score import field_names
from data_pipeline.content.schemas.download_schemas import CSVConfig
from data_pipeline.utils import (
    get_module_logger,
    zip_files,
    load_yaml_dict_from_file,
    load_dict_from_yaml_object_fields,
)
from data_pipeline.utils import get_module_logger
from data_pipeline.utils import load_dict_from_yaml_object_fields
from data_pipeline.utils import load_yaml_dict_from_file
from data_pipeline.utils import zip_files

logger = get_module_logger(__name__)

@@ -1,29 +1,23 @@
from pathlib import Path
import json
from numpy import float64
from pathlib import Path

import numpy as np
import pandas as pd
from data_pipeline.content.schemas.download_schemas import (
    CSVConfig,
    CodebookConfig,
    ExcelConfig,
)

from data_pipeline.content.schemas.download_schemas import CodebookConfig
from data_pipeline.content.schemas.download_schemas import CSVConfig
from data_pipeline.content.schemas.download_schemas import ExcelConfig
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.score.etl_utils import floor_series, create_codebook
from data_pipeline.utils import (
    get_module_logger,
    zip_files,
    load_yaml_dict_from_file,
    column_list_from_yaml_object_fields,
    load_dict_from_yaml_object_fields,
)
from data_pipeline.etl.score.etl_utils import create_codebook
from data_pipeline.etl.score.etl_utils import floor_series
from data_pipeline.etl.sources.census.etl_utils import check_census_data_source
from data_pipeline.score import field_names
from data_pipeline.utils import column_list_from_yaml_object_fields
from data_pipeline.utils import get_module_logger
from data_pipeline.utils import load_dict_from_yaml_object_fields
from data_pipeline.utils import load_yaml_dict_from_file
from data_pipeline.utils import zip_files
from numpy import float64


from data_pipeline.etl.sources.census.etl_utils import (
    check_census_data_source,
)
from . import constants

logger = get_module_logger(__name__)
@@ -1,24 +1,21 @@
import os
import sys
import typing
from pathlib import Path
from collections import namedtuple
from pathlib import Path

import numpy as np
import pandas as pd

from data_pipeline.config import settings
from data_pipeline.etl.score.constants import (
    TILES_ISLAND_AREA_FIPS_CODES,
    TILES_PUERTO_RICO_FIPS_CODE,
    TILES_CONTINENTAL_US_FIPS_CODE,
    TILES_ALASKA_AND_HAWAII_FIPS_CODE,
)
from data_pipeline.etl.score.constants import TILES_ALASKA_AND_HAWAII_FIPS_CODE
from data_pipeline.etl.score.constants import TILES_CONTINENTAL_US_FIPS_CODE
from data_pipeline.etl.score.constants import TILES_ISLAND_AREA_FIPS_CODES
from data_pipeline.etl.score.constants import TILES_PUERTO_RICO_FIPS_CODE
from data_pipeline.etl.sources.census.etl_utils import get_state_fips_codes
from data_pipeline.utils import (
    download_file_from_url,
    get_module_logger,
)
from data_pipeline.score import field_names
from data_pipeline.utils import download_file_from_url
from data_pipeline.utils import get_module_logger

from . import constants

logger = get_module_logger(__name__)

@@ -99,7 +96,7 @@ def floor_series(series: pd.Series, number_of_decimals: int) -> pd.Series:
    if series.isin(unacceptable_values).any():
        series.replace(mapping, regex=False, inplace=True)

    multiplication_factor = 10 ** number_of_decimals
    multiplication_factor = 10**number_of_decimals

    # In order to safely cast NaNs
    # First coerce series to float type: series.astype(float)
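The `10 ** number_of_decimals` to `10**number_of_decimals` change is black's rule for the power operator: when both operands are simple (names or literals), the surrounding spaces are removed. The scale-floor-rescale idiom the factor feeds is worth seeing whole; a minimal standalone sketch of that technique (not the repository's full `floor_series`, which additionally handles NaNs and sentinel values):

```
import numpy as np
import pandas as pd


def floor_to_decimals(series: pd.Series, number_of_decimals: int) -> pd.Series:
    """Floor each value to a fixed number of decimal places."""
    multiplication_factor = 10**number_of_decimals  # black hugs simple ** operands
    scaled = np.floor(series.astype(float) * multiplication_factor)
    return scaled / multiplication_factor


print(floor_to_decimals(pd.Series([0.129, 0.125, 0.121]), 2).tolist())
# [0.12, 0.12, 0.12]
```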
@@ -1,6 +1,8 @@
from dataclasses import dataclass, field
from dataclasses import dataclass
from dataclasses import field
from enum import Enum
from typing import List, Optional
from typing import List
from typing import Optional


class FieldType(Enum):

@@ -5,7 +5,8 @@ from pathlib import Path
import pandas as pd
import pytest
from data_pipeline import config
from data_pipeline.etl.score import etl_score_post, tests
from data_pipeline.etl.score import etl_score_post
from data_pipeline.etl.score import tests
from data_pipeline.etl.score.etl_score_post import PostScoreETL

@@ -1,11 +1,10 @@
import pandas as pd
import numpy as np
import pandas as pd
import pytest

from data_pipeline.etl.score.etl_utils import (
    floor_series,
    compare_to_list_of_expected_state_fips_codes,
)
from data_pipeline.etl.score.etl_utils import floor_series


def test_floor_series():

@@ -1,14 +1,11 @@
# pylint: disable=W0212
## Above disables warning about access to underscore-prefixed methods

from importlib import reload
from pathlib import Path

import pandas.api.types as ptypes
import pandas.testing as pdt
from data_pipeline.content.schemas.download_schemas import (
    CSVConfig,
)

from data_pipeline.content.schemas.download_schemas import CSVConfig
from data_pipeline.etl.score import constants
from data_pipeline.utils import load_yaml_dict_from_file
@@ -1,8 +1,7 @@
import pandas as pd

from data_pipeline.config import settings
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.utils import get_module_logger
from data_pipeline.config import settings

logger = get_module_logger(__name__)

@@ -1,13 +1,15 @@
import pathlib
from pathlib import Path
import pandas as pd

from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel
import pandas as pd
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.base import ValidGeoLevel
from data_pipeline.etl.score.etl_utils import (
    compare_to_list_of_expected_state_fips_codes,
)
from data_pipeline.score import field_names
from data_pipeline.utils import get_module_logger, download_file_from_url
from data_pipeline.utils import download_file_from_url
from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)

@@ -1,9 +1,11 @@
import typing
import pandas as pd

from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel
from data_pipeline.utils import get_module_logger, download_file_from_url
import pandas as pd
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.base import ValidGeoLevel
from data_pipeline.score import field_names
from data_pipeline.utils import download_file_from_url
from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)

@@ -1,9 +1,8 @@
import pandas as pd
import numpy as np

import pandas as pd
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.utils import get_module_logger
from data_pipeline.score import field_names
from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)

@@ -3,12 +3,12 @@ import json
import subprocess
from enum import Enum
from pathlib import Path

import geopandas as gpd

from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.utils import get_module_logger, unzip_file_from_url

from data_pipeline.etl.sources.census.etl_utils import get_state_fips_codes
from data_pipeline.utils import get_module_logger
from data_pipeline.utils import unzip_file_from_url

logger = get_module_logger(__name__)

@@ -5,13 +5,11 @@ from pathlib import Path

import pandas as pd
from data_pipeline.config import settings
from data_pipeline.utils import (
    get_module_logger,
    remove_all_dirs_from_dir,
    remove_files_from_dir,
    unzip_file_from_url,
    zip_directory,
)
from data_pipeline.utils import get_module_logger
from data_pipeline.utils import remove_all_dirs_from_dir
from data_pipeline.utils import remove_files_from_dir
from data_pipeline.utils import unzip_file_from_url
from data_pipeline.utils import zip_directory

logger = get_module_logger(__name__)

@@ -1,19 +1,19 @@
from collections import namedtuple
import os
import pandas as pd
import geopandas as gpd
from collections import namedtuple

import geopandas as gpd
import pandas as pd
from data_pipeline.config import settings
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.sources.census_acs.etl_utils import (
    retrieve_census_acs_data,
)
from data_pipeline.etl.sources.census_acs.etl_imputations import (
    calculate_income_measures,
)

from data_pipeline.utils import get_module_logger, unzip_file_from_url
from data_pipeline.etl.sources.census_acs.etl_utils import (
    retrieve_census_acs_data,
)
from data_pipeline.score import field_names
from data_pipeline.utils import get_module_logger
from data_pipeline.utils import unzip_file_from_url

logger = get_module_logger(__name__)
@@ -1,7 +1,10 @@
from typing import Any, List, NamedTuple, Tuple
import pandas as pd
import geopandas as gpd
from typing import Any
from typing import List
from typing import NamedTuple
from typing import Tuple

import geopandas as gpd
import pandas as pd
from data_pipeline.score import field_names
from data_pipeline.utils import get_module_logger

@@ -1,10 +1,9 @@
import os
from pathlib import Path
from typing import List

import censusdata
import pandas as pd


from data_pipeline.etl.sources.census.etl_utils import get_state_fips_codes
from data_pipeline.utils import get_module_logger

@@ -1,11 +1,10 @@
import pandas as pd

from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.sources.census_acs.etl_utils import (
    retrieve_census_acs_data,
)
from data_pipeline.utils import get_module_logger
from data_pipeline.score import field_names
from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)

@@ -1,13 +1,14 @@
import json
from pathlib import Path

import numpy as np
import pandas as pd
import requests

from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.utils import get_module_logger
from data_pipeline.config import settings
from data_pipeline.utils import unzip_file_from_url, download_file_from_url
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.utils import download_file_from_url
from data_pipeline.utils import get_module_logger
from data_pipeline.utils import unzip_file_from_url

logger = get_module_logger(__name__)

@@ -1,14 +1,13 @@
import json
from typing import List
import requests

import numpy as np
import pandas as pd

from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.utils import get_module_logger
from data_pipeline.score import field_names
import requests
from data_pipeline.config import settings
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.score import field_names
from data_pipeline.utils import get_module_logger

pd.options.mode.chained_assignment = "raise"
@@ -1,7 +1,8 @@
from pathlib import Path
import pandas as pd

from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel
import pandas as pd
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.base import ValidGeoLevel
from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)

@@ -1,8 +1,9 @@
from pathlib import Path
import pandas as pd

import pandas as pd
from data_pipeline.config import settings
from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.base import ValidGeoLevel
from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)

@@ -1,10 +1,9 @@
# pylint: disable=unsubscriptable-object
# pylint: disable=unsupported-assignment-operation

import pandas as pd
import geopandas as gpd

from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel
import pandas as pd
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.base import ValidGeoLevel
from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)

@@ -1,9 +1,10 @@
from pathlib import Path

import geopandas as gpd
import pandas as pd
from data_pipeline.config import settings

from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.base import ValidGeoLevel
from data_pipeline.etl.sources.geo_utils import add_tracts_for_geometries
from data_pipeline.utils import get_module_logger

@@ -1,6 +1,6 @@
import pandas as pd

from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.base import ValidGeoLevel
from data_pipeline.score import field_names
from data_pipeline.utils import get_module_logger

@@ -1,5 +1,4 @@
import pandas as pd

from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.utils import get_module_logger

@@ -58,7 +57,6 @@ class EJSCREENAreasOfConcernETL(ExtractTransformLoad):

        # TO DO: As a one off we did all the processing in a separate Notebook
        # Can add here later for a future PR
        pass

    def load(self) -> None:
        if self.ejscreen_areas_of_concern_data_exists():

@@ -1,10 +1,11 @@
from pathlib import Path
import pandas as pd

import pandas as pd
from data_pipeline.config import settings
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.score import field_names
from data_pipeline.utils import get_module_logger, unzip_file_from_url
from data_pipeline.utils import get_module_logger
from data_pipeline.utils import unzip_file_from_url

logger = get_module_logger(__name__)

@@ -1,9 +1,10 @@
from pathlib import Path
import pandas as pd

import pandas as pd
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.score import field_names
from data_pipeline.utils import get_module_logger, unzip_file_from_url
from data_pipeline.utils import get_module_logger
from data_pipeline.utils import unzip_file_from_url

logger = get_module_logger(__name__)

@@ -1,10 +1,9 @@
# pylint: disable=unsubscriptable-object
# pylint: disable=unsupported-assignment-operation

import pandas as pd
from data_pipeline.config import settings

from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.base import ValidGeoLevel
from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)

@@ -1,10 +1,9 @@
# pylint: disable=unsubscriptable-object
# pylint: disable=unsupported-assignment-operation

import pandas as pd
from data_pipeline.config import settings

from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.base import ValidGeoLevel
from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)
@@ -1,11 +1,12 @@
"""Utililities for turning geographies into tracts, using census data"""

from functools import lru_cache
from pathlib import Path
from typing import Optional
from functools import lru_cache

import geopandas as gpd
from data_pipeline.etl.sources.tribal.etl import TribalETL
from data_pipeline.utils import get_module_logger

from .census.etl import CensusETL

logger = get_module_logger(__name__)

@@ -1,11 +1,9 @@
import pandas as pd

from data_pipeline.config import settings
from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel
from data_pipeline.utils import (
    get_module_logger,
    unzip_file_from_url,
)
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.base import ValidGeoLevel
from data_pipeline.utils import get_module_logger
from data_pipeline.utils import unzip_file_from_url

logger = get_module_logger(__name__)

@@ -1,8 +1,8 @@
import pandas as pd

from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel
from data_pipeline.utils import get_module_logger
from data_pipeline.config import settings
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.base import ValidGeoLevel
from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)

@@ -1,9 +1,9 @@
import pandas as pd
from pandas.errors import EmptyDataError

from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.sources.census.etl_utils import get_state_fips_codes
from data_pipeline.utils import get_module_logger, unzip_file_from_url
from data_pipeline.utils import get_module_logger
from data_pipeline.utils import unzip_file_from_url
from pandas.errors import EmptyDataError

logger = get_module_logger(__name__)

@@ -1,5 +1,6 @@
import pandas as pd
from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.base import ValidGeoLevel
from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)

@@ -1,9 +1,8 @@
import pandas as pd
import requests

from data_pipeline.config import settings
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.utils import get_module_logger
from data_pipeline.config import settings

logger = get_module_logger(__name__)

@@ -1,10 +1,9 @@
import pandas as pd
import geopandas as gpd

from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.utils import get_module_logger
from data_pipeline.score import field_names
import pandas as pd
from data_pipeline.config import settings
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.score import field_names
from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)

@@ -96,4 +95,3 @@ class MappingForEJETL(ExtractTransformLoad):

    def validate(self) -> None:
        logger.info("Validating Mapping For EJ Data")
        pass

@@ -1,10 +1,11 @@
import pathlib

import numpy as np
import pandas as pd

from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.score import field_names
from data_pipeline.utils import download_file_from_url, get_module_logger
from data_pipeline.utils import download_file_from_url
from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)

@@ -1,11 +1,11 @@
from glob import glob

import geopandas as gpd
import pandas as pd

from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.utils import get_module_logger
from data_pipeline.score import field_names
from data_pipeline.config import settings
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.score import field_names
from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)

@@ -1,9 +1,8 @@
import pandas as pd

from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.utils import get_module_logger
from data_pipeline.score import field_names
from data_pipeline.config import settings
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.score import field_names
from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)
@@ -2,10 +2,9 @@
# but it may be a known bug. https://github.com/PyCQA/pylint/issues/1498
# pylint: disable=unsubscriptable-object
# pylint: disable=unsupported-assignment-operation

import pandas as pd

from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.base import ValidGeoLevel
from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)

@@ -1,10 +1,9 @@
# pylint: disable=unsubscriptable-object
# pylint: disable=unsupported-assignment-operation

import pandas as pd
from data_pipeline.config import settings

from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.base import ValidGeoLevel
from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)

@@ -1,12 +1,11 @@
import functools
import pandas as pd

import pandas as pd
from data_pipeline.config import settings
from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel
from data_pipeline.utils import (
    get_module_logger,
    unzip_file_from_url,
)
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.base import ValidGeoLevel
from data_pipeline.utils import get_module_logger
from data_pipeline.utils import unzip_file_from_url

logger = get_module_logger(__name__)

@@ -1,11 +1,12 @@
from pathlib import Path

import geopandas as gpd
import pandas as pd

from data_pipeline.config import settings
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.score import field_names
from data_pipeline.utils import get_module_logger, unzip_file_from_url
from data_pipeline.utils import get_module_logger
from data_pipeline.utils import unzip_file_from_url

logger = get_module_logger(__name__)

@@ -1,11 +1,8 @@
from pathlib import Path

from data_pipeline.utils import (
    get_module_logger,
    remove_all_from_dir,
    remove_files_from_dir,
)

from data_pipeline.utils import get_module_logger
from data_pipeline.utils import remove_all_from_dir
from data_pipeline.utils import remove_files_from_dir

logger = get_module_logger(__name__)

@@ -1,12 +1,11 @@
import geopandas as gpd
import numpy as np
import pandas as pd
from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel
from data_pipeline.etl.sources.geo_utils import (
    add_tracts_for_geometries,
    get_tribal_geojson,
    get_tract_geojson,
)
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.base import ValidGeoLevel
from data_pipeline.etl.sources.geo_utils import add_tracts_for_geometries
from data_pipeline.etl.sources.geo_utils import get_tract_geojson
from data_pipeline.etl.sources.geo_utils import get_tribal_geojson
from data_pipeline.score import field_names
from data_pipeline.utils import get_module_logger

@@ -1,11 +1,13 @@
from pathlib import Path
import geopandas as gpd
import pandas as pd
import numpy as np

from data_pipeline.etl.base import ExtractTransformLoad, ValidGeoLevel
from data_pipeline.utils import get_module_logger, download_file_from_url
import geopandas as gpd
import numpy as np
import pandas as pd
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.base import ValidGeoLevel
from data_pipeline.etl.sources.geo_utils import add_tracts_for_geometries
from data_pipeline.utils import download_file_from_url
from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)
@@ -211,7 +211,9 @@
}
],
"source": [
"tmp = sns.FacetGrid(data=score_m, col=\"Urban Heuristic Flag\", col_wrap=2, height=7)\n",
"tmp = sns.FacetGrid(\n",
"    data=score_m, col=\"Urban Heuristic Flag\", col_wrap=2, height=7\n",
")\n",
"tmp.map(\n",
"    sns.distplot,\n",
"    \"Expected agricultural loss rate (Natural Hazards Risk Index) (percentile)\",\n",

@@ -250,7 +252,9 @@
")\n",
"\n",
"nri_with_flag[\"total_ag_loss\"] = nri_with_flag.filter(like=\"EALA\").sum(axis=1)\n",
"nri_with_flag[\"total_ag_loss_pctile\"] = nri_with_flag[\"total_ag_loss\"].rank(pct=True)\n",
"nri_with_flag[\"total_ag_loss_pctile\"] = nri_with_flag[\"total_ag_loss\"].rank(\n",
"    pct=True\n",
")\n",
"\n",
"nri_with_flag.groupby(\"Urban Heuristic Flag\")[\"total_ag_loss_pctile\"].mean()"
]

@@ -779,9 +783,9 @@
"    \"Greater than or equal to the 90th percentile for expected agriculture loss rate, is low income, and has a low percent of higher ed students?\"\n",
"].astype(int)\n",
"\n",
"score_m_adjusted_tracts = set(score_m[score_m[\"adjusted\"] > 0][\"GEOID10_TRACT\"]).union(\n",
"    all_ag_loss_tracts\n",
")\n",
"score_m_adjusted_tracts = set(\n",
"    score_m[score_m[\"adjusted\"] > 0][\"GEOID10_TRACT\"]\n",
").union(all_ag_loss_tracts)\n",
"display(len(set(all_scorem_tracts).difference(score_m_adjusted_tracts)))"
]
},

@@ -832,7 +836,11 @@
"    left_clip = nri_with_flag[nri_with_flag[\"Urban Heuristic Flag\"] == 0][\n",
"        \"AGRIVALUE\"\n",
"    ].quantile(threshold)\n",
"    print(\"At threshold {:.2f}, minimum value is ${:,.0f}\".format(threshold, left_clip))\n",
"    print(\n",
"        \"At threshold {:.2f}, minimum value is ${:,.0f}\".format(\n",
"            threshold, left_clip\n",
"        )\n",
"    )\n",
"    tmp_value = nri_with_flag[\"AGRIVALUE\"].clip(lower=left_clip)\n",
"    nri_with_flag[\"total_ag_loss_pctile_{:.2f}\".format(threshold)] = (\n",
"        nri_with_flag[\"total_ag_loss\"] / tmp_value\n",

@@ -889,7 +897,9 @@
"    .set_index(\"Left clip value\")[[\"Rural\", \"Urban\"]]\n",
"    .stack()\n",
"    .reset_index()\n",
"    .rename(columns={\"level_1\": \"Tract classification\", 0: \"Average percentile\"})\n",
"    .rename(\n",
"        columns={\"level_1\": \"Tract classification\", 0: \"Average percentile\"}\n",
"    )\n",
")"
]
},

@@ -21,6 +21,7 @@
"source": [
"import os\n",
"import sys\n",
"\n",
"module_path = os.path.abspath(os.path.join(\"../..\"))\n",
"if module_path not in sys.path:\n",
"    sys.path.append(module_path)"

@@ -94,9 +95,13 @@
"bia_aian_supplemental_geojson = (\n",
"    GEOJSON_BASE_PATH / \"bia_national_lar\" / \"BIA_AIAN_Supplemental.json\"\n",
")\n",
"bia_tsa_geojson_geojson = GEOJSON_BASE_PATH / \"bia_national_lar\" / \"BIA_TSA.json\"\n",
"bia_tsa_geojson_geojson = (\n",
"    GEOJSON_BASE_PATH / \"bia_national_lar\" / \"BIA_TSA.json\"\n",
")\n",
"alaska_native_villages_geojson = (\n",
"    GEOJSON_BASE_PATH / \"alaska_native_villages\" / \"AlaskaNativeVillages.gdb.geojson\"\n",
"    GEOJSON_BASE_PATH\n",
"    / \"alaska_native_villages\"\n",
"    / \"AlaskaNativeVillages.gdb.geojson\"\n",
")"
]
},

@@ -131,7 +136,9 @@
"len(\n",
"    sorted(\n",
"        list(\n",
"            bia_national_lar_df.LARName.str.replace(r\"\\(.*\\) \", \"\", regex=True).unique()\n",
"            bia_national_lar_df.LARName.str.replace(\n",
"                r\"\\(.*\\) \", \"\", regex=True\n",
"            ).unique()\n",
"        )\n",
"    )\n",
")"

@@ -45,6 +45,7 @@
"source": [
"# Read in the score geojson file\n",
"from data_pipeline.etl.score.constants import DATA_SCORE_CSV_TILES_FILE_PATH\n",
"\n",
"nation = gpd.read_file(DATA_SCORE_CSV_TILES_FILE_PATH)"
]
},

@@ -93,10 +94,14 @@
"    random_tile_features = json.loads(f.read())\n",
"\n",
"# Flatten data around the features key:\n",
"flatten_features = pd.json_normalize(random_tile_features, record_path=[\"features\"])\n",
"flatten_features = pd.json_normalize(\n",
"    random_tile_features, record_path=[\"features\"]\n",
")\n",
"\n",
"# index into the feature properties, get keys and turn into a sorted list\n",
"random_tile = sorted(list(flatten_features[\"features\"][0][0][\"properties\"].keys()))"
"random_tile = sorted(\n",
"    list(flatten_features[\"features\"][0][0][\"properties\"].keys())\n",
")"
]
},
{

@@ -291,8 +296,8 @@
}
],
"source": [
"nation_HRS_GEO = nation[['GEOID10', 'SF', 'CF', 'HRS_ET', 'AML_ET', 'FUDS_ET']]\n",
"nation_HRS_GEO.loc[nation_HRS_GEO['FUDS_ET'] == '0']"
"nation_HRS_GEO = nation[[\"GEOID10\", \"SF\", \"CF\", \"HRS_ET\", \"AML_ET\", \"FUDS_ET\"]]\n",
"nation_HRS_GEO.loc[nation_HRS_GEO[\"FUDS_ET\"] == \"0\"]"
]
},
{

@@ -321,7 +326,7 @@
}
],
"source": [
"nation['HRS_ET'].unique()"
"nation[\"HRS_ET\"].unique()"
]
}
],
@ -1,9 +1,6 @@
|
|||
#!/usr/bin/env python
|
||||
# coding: utf-8
|
||||
|
||||
# In[ ]:
|
||||
|
||||
|
||||
import numpy as np
|
||||
import pandas as pd
|
||||
from sklearn.preprocessing import MinMaxScaler
|
||||
|
|
|
@ -18,7 +18,10 @@
|
|||
" sys.path.append(module_path)\n",
|
||||
"\n",
|
||||
"from data_pipeline.config import settings\n",
|
||||
"from data_pipeline.etl.sources.geo_utils import add_tracts_for_geometries, get_tract_geojson\n"
|
||||
"from data_pipeline.etl.sources.geo_utils import (\n",
|
||||
" add_tracts_for_geometries,\n",
|
||||
" get_tract_geojson,\n",
|
||||
")"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
@ -655,9 +658,9 @@
|
|||
}
|
||||
],
|
||||
"source": [
|
||||
"adjacent_tracts.groupby(\"ORIGINAL_TRACT\")[[\"included\"]].mean().reset_index().rename(\n",
|
||||
" columns={\"ORIGINAL_TRACT\": \"GEOID10_TRACT\"}\n",
|
||||
")"
|
||||
"adjacent_tracts.groupby(\"ORIGINAL_TRACT\")[\n",
|
||||
" [\"included\"]\n",
|
||||
"].mean().reset_index().rename(columns={\"ORIGINAL_TRACT\": \"GEOID10_TRACT\"})"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
|
|
@ -65,7 +65,8 @@
|
|||
"tmp_path.mkdir(parents=True, exist_ok=True)\n",
|
||||
"\n",
|
||||
"eamlis_path_in_s3 = (\n",
|
||||
" settings.AWS_JUSTICE40_DATASOURCES_URL + \"/eAMLIS export of all data.tsv.zip\"\n",
|
||||
" settings.AWS_JUSTICE40_DATASOURCES_URL\n",
|
||||
" + \"/eAMLIS export of all data.tsv.zip\"\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"unzip_file_from_url(\n",
|
||||
|
|
|
@ -460,7 +460,9 @@
|
|||
"outputs": [],
|
||||
"source": [
|
||||
"object_ids_to_keep = set(\n",
|
||||
" merged_exaple_data[merged_exaple_data[\"_merge\"] == \"both\"].OBJECTID.astype(\"int\")\n",
|
||||
" merged_exaple_data[merged_exaple_data[\"_merge\"] == \"both\"].OBJECTID.astype(\n",
|
||||
" \"int\"\n",
|
||||
" )\n",
|
||||
")\n",
|
||||
"features = []\n",
|
||||
"for feature in raw_fuds_geojson[\"features\"]:\n",
|
||||
|
@ -476,7 +478,11 @@
|
|||
"outputs": [],
|
||||
"source": [
|
||||
"def make_fake_feature(\n",
|
||||
" state: str, has_projects: bool, is_eligible: bool, latitude: float, longitude: float\n",
|
||||
" state: str,\n",
|
||||
" has_projects: bool,\n",
|
||||
" is_eligible: bool,\n",
|
||||
" latitude: float,\n",
|
||||
" longitude: float,\n",
|
||||
"):\n",
|
||||
" \"\"\"For tracts where we don't have a FUDS, fake one.\"\"\"\n",
|
||||
" make_fake_feature._object_id += 1\n",
|
||||
|
@ -537,7 +543,9 @@
|
|||
"# Create FUDS in CA for each tract that doesn't have a FUDS\n",
|
||||
"for tract_id, point in points.items():\n",
|
||||
" for bools in [(True, True), (True, False), (False, False)]:\n",
|
||||
" features.append(make_fake_feature(\"CA\", bools[0], bools[1], point.y, point.x))"
|
||||
" features.append(\n",
|
||||
" make_fake_feature(\"CA\", bools[0], bools[1], point.y, point.x)\n",
|
||||
" )"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
@@ -596,9 +604,9 @@
}
],
"source": [
"test_frame_with_tracts_full = test_frame_with_tracts = add_tracts_for_geometries(\n",
"    test_frame\n",
")"
"test_frame_with_tracts_full = (\n",
"    test_frame_with_tracts\n",
") = add_tracts_for_geometries(test_frame)"
]
},
{
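One caveat worth spelling out about the cell above: chained assignment binds both names to the same object, so test_frame_with_tracts_full and test_frame_with_tracts are two names for one GeoDataFrame, not copies. A quick demonstration with a plain list:

a = b = []    # one object, two names
a.append(1)
print(b)      # [1] -- a mutation through either name is visible through both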
@@ -680,7 +688,9 @@
}
],
"source": [
"tracts = test_frame_with_tracts_full[[\"GEOID10_TRACT\", \"geometry\"]].drop_duplicates()\n",
"tracts = test_frame_with_tracts_full[\n",
"    [\"GEOID10_TRACT\", \"geometry\"]\n",
"].drop_duplicates()\n",
"tracts[\"lat_long\"] = test_frame_with_tracts_full.geometry.apply(\n",
"    lambda point: (point.x, point.y)\n",
")\n",
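The cell above derives a lat_long column by applying a lambda over shapely point geometries; note the tuple is (point.x, point.y), i.e. longitude first despite the column name. A runnable sketch with two hypothetical points:

import geopandas as gpd
from shapely.geometry import Point

gdf = gpd.GeoDataFrame(geometry=[Point(-122.33, 47.61), Point(-118.24, 34.05)])
gdf["lat_long"] = gdf.geometry.apply(lambda point: (point.x, point.y))
print(gdf["lat_long"].tolist())  # [(-122.33, 47.61), (-118.24, 34.05)]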
@@ -13,7 +13,7 @@
"import geopandas as gpd\n",
"\n",
"# Read in the above json file\n",
"nation=gpd.read_file(\"/Users/vims/Downloads/usa-high-1822-637b.json\")"
"nation = gpd.read_file(\"/Users/vims/Downloads/usa-high-1822-637b.json\")"
]
},
{
@@ -45,7 +45,7 @@
}
],
"source": [
"nation['FUDS_RAW']"
"nation[\"FUDS_RAW\"]"
]
},
{
@@ -248,7 +248,18 @@
}
],
"source": [
"nation_new_ind = nation[['GEOID10', 'SF', 'CF', 'HRS_ET', 'AML_ET', 'AML_RAW','FUDS_ET', 'FUDS_RAW']]\n",
"nation_new_ind = nation[\n",
"    [\n",
"        \"GEOID10\",\n",
"        \"SF\",\n",
"        \"CF\",\n",
"        \"HRS_ET\",\n",
"        \"AML_ET\",\n",
"        \"AML_RAW\",\n",
"        \"FUDS_ET\",\n",
"        \"FUDS_RAW\",\n",
"    ]\n",
"]\n",
"nation_new_ind"
]
},
@@ -270,7 +281,7 @@
}
],
"source": [
"nation_new_ind['HRS_ET'].unique()"
"nation_new_ind[\"HRS_ET\"].unique()"
]
},
{
@@ -293,7 +304,7 @@
}
],
"source": [
"nation_new_ind['HRS_ET'].value_counts()"
"nation_new_ind[\"HRS_ET\"].value_counts()"
]
},
{
@@ -314,7 +325,7 @@
}
],
"source": [
"nation_new_ind['AML_ET'].unique()"
"nation_new_ind[\"AML_ET\"].unique()"
]
},
{
@@ -337,7 +348,7 @@
}
],
"source": [
"nation_new_ind['AML_ET'].value_counts()"
"nation_new_ind[\"AML_ET\"].value_counts()"
]
},
{
@@ -358,7 +369,7 @@
}
],
"source": [
"nation_new_ind['AML_RAW'].unique()"
"nation_new_ind[\"AML_RAW\"].unique()"
]
},
{
@@ -380,7 +391,7 @@
}
],
"source": [
"nation_new_ind['AML_RAW'].value_counts()"
"nation_new_ind[\"AML_RAW\"].value_counts()"
]
},
{
@@ -401,7 +412,7 @@
}
],
"source": [
"nation_new_ind['FUDS_ET'].unique()"
"nation_new_ind[\"FUDS_ET\"].unique()"
]
},
{
@@ -424,7 +435,7 @@
}
],
"source": [
"nation_new_ind['FUDS_ET'].value_counts()"
"nation_new_ind[\"FUDS_ET\"].value_counts()"
]
},
{
@@ -445,7 +456,7 @@
}
],
"source": [
"nation_new_ind['FUDS_RAW'].unique()"
"nation_new_ind[\"FUDS_RAW\"].unique()"
]
},
{
@@ -468,7 +479,7 @@
}
],
"source": [
"nation_new_ind['FUDS_RAW'].value_counts()"
"nation_new_ind[\"FUDS_RAW\"].value_counts()"
]
}
],
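The run of cells above alternates .unique() and .value_counts() over each indicator column: the former lists distinct values (missing ones included), the latter tallies occurrences and drops NaN/None by default. A toy illustration:

import pandas as pd

s = pd.Series(["0", "1", "1", None])
print(s.unique())        # ['0' '1' None]
print(s.value_counts())  # 1 -> 2, 0 -> 1 (None excluded unless dropna=False)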
@@ -36,8 +36,8 @@
"    engine=\"pyogrio\",\n",
")\n",
"end = time.time()\n",
" \n",
"print(\"Time taken to execute the function using pyogrio is\", end-begin)"
"\n",
"print(\"Time taken to execute the function using pyogrio is\", end - begin)"
]
},
{
@@ -59,11 +59,13 @@
"census_tract_gdf = gpd.read_file(\n",
"    CensusETL.NATIONAL_TRACT_JSON_PATH,\n",
"    engine=\"fiona\",\n",
"    include_fields=[\"GEOID10\"]\n",
"    include_fields=[\"GEOID10\"],\n",
")\n",
"end2 = time.time()\n",
" \n",
"print(\"Time taken to execute the function using include fields is\", end2-begin2)"
"\n",
"print(\n",
"    \"Time taken to execute the function using include fields is\", end2 - begin2\n",
")"
]
},
{
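The two cells above benchmark gpd.read_file under different engines. Consolidated into one hedged sketch (assuming CensusETL is importable from data_pipeline.etl.sources.census.etl, as elsewhere in the pipeline; the read calls themselves mirror the notebook):

import time

import geopandas as gpd
from data_pipeline.etl.sources.census.etl import CensusETL

begin = time.time()
gdf = gpd.read_file(CensusETL.NATIONAL_TRACT_JSON_PATH, engine="pyogrio")
print("pyogrio:", time.time() - begin)

begin = time.time()
gdf = gpd.read_file(
    CensusETL.NATIONAL_TRACT_JSON_PATH,
    engine="fiona",
    include_fields=["GEOID10"],  # parse only one attribute column to save time
)
print("fiona + include_fields:", time.time() - begin)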
@@ -1369,7 +1369,9 @@
"\n",
"results = results.reset_index()\n",
"\n",
"results.to_csv(\"~/Downloads/tribal_area_as_a_share_of_tract_area.csv\", index=False)\n",
"results.to_csv(\n",
"    \"~/Downloads/tribal_area_as_a_share_of_tract_area.csv\", index=False\n",
")\n",
"\n",
"# Printing results\n",
"print(results)"
@@ -1,7 +1,6 @@
import pandas as pd

from data_pipeline.score.score import Score
import data_pipeline.score.field_names as field_names
import pandas as pd
from data_pipeline.score.score import Score
from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)
@@ -1,7 +1,6 @@
import pandas as pd

from data_pipeline.score.score import Score
import data_pipeline.score.field_names as field_names
import pandas as pd
from data_pipeline.score.score import Score
from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)
@@ -1,8 +1,8 @@
from collections import namedtuple
import pandas as pd

from data_pipeline.score.score import Score
import data_pipeline.score.field_names as field_names
import pandas as pd
from data_pipeline.score.score import Score
from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)
@@ -1,7 +1,6 @@
import pandas as pd

from data_pipeline.score.score import Score
import data_pipeline.score.field_names as field_names
import pandas as pd
from data_pipeline.score.score import Score
from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)
@@ -1,7 +1,6 @@
import pandas as pd

from data_pipeline.score.score import Score
import data_pipeline.score.field_names as field_names
import pandas as pd
from data_pipeline.score.score import Score
from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)
@@ -1,7 +1,6 @@
import pandas as pd

from data_pipeline.score.score import Score
import data_pipeline.score.field_names as field_names
import pandas as pd
from data_pipeline.score.score import Score
from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)
@@ -1,7 +1,6 @@
import pandas as pd

from data_pipeline.score.score import Score
import data_pipeline.score.field_names as field_names
import pandas as pd
from data_pipeline.score.score import Score
from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)
@@ -1,7 +1,6 @@
import pandas as pd

from data_pipeline.score.score import Score
import data_pipeline.score.field_names as field_names
import pandas as pd
from data_pipeline.score.score import Score
from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)
@@ -1,7 +1,6 @@
import pandas as pd

from data_pipeline.score.score import Score
import data_pipeline.score.field_names as field_names
import pandas as pd
from data_pipeline.score.score import Score
from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)
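The nine near-identical hunks above are isort (--profile=black --force-single-line-imports) normalizing each score module's header: plain import statements come first, alphabetized, then from-imports, with the stray blank line removed. Schematically, for any one of these files:

# Before isort:
#     import pandas as pd
#
#     from data_pipeline.score.score import Score
#     import data_pipeline.score.field_names as field_names
#     from data_pipeline.utils import get_module_logger
# After isort:
import data_pipeline.score.field_names as field_names
import pandas as pd
from data_pipeline.score.score import Score
from data_pipeline.utils import get_module_logger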
@@ -1,8 +1,7 @@
import data_pipeline.score.field_names as field_names
import numpy as np
import pandas as pd

from data_pipeline.score.score import Score
import data_pipeline.score.field_names as field_names
from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)
@@ -1,11 +1,11 @@
from typing import Tuple

import data_pipeline.etl.score.constants as constants
import data_pipeline.score.field_names as field_names
import numpy as np
import pandas as pd

from data_pipeline.score.score import Score
import data_pipeline.score.field_names as field_names
from data_pipeline.utils import get_module_logger
import data_pipeline.etl.score.constants as constants

logger = get_module_logger(__name__)
@@ -1,12 +1,12 @@
from typing import Tuple

import data_pipeline.etl.score.constants as constants
import data_pipeline.score.field_names as field_names
import numpy as np
import pandas as pd

from data_pipeline.score.score import Score
import data_pipeline.score.field_names as field_names
from data_pipeline.utils import get_module_logger
import data_pipeline.etl.score.constants as constants
from data_pipeline.score.utils import calculate_tract_adjacency_scores
from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)
@@ -1,6 +1,5 @@
import pandas as pd
from data_pipeline.score.score_narwhal import ScoreNarwhal

from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)
@@ -1,12 +1,12 @@
"""Utilities to help generate the score."""
import pandas as pd
import geopandas as gpd
import data_pipeline.score.field_names as field_names
import geopandas as gpd
import pandas as pd
from data_pipeline.etl.sources.geo_utils import get_tract_geojson
from data_pipeline.utils import get_module_logger

# XXX: @jorge I am torn about the coupling that importing from
# etl.sources vs keeping the code DRY. Thoughts?
from data_pipeline.etl.sources.geo_utils import get_tract_geojson
from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)
@@ -3,7 +3,6 @@ from pathlib import Path
from shutil import copyfile

import pytest

from data_pipeline.config import settings
from data_pipeline.etl.base import ExtractTransformLoad
@@ -1,8 +1,8 @@
import pandas as pd
import pytest
from data_pipeline.config import settings
from data_pipeline.score.field_names import GEOID_TRACT_FIELD
from data_pipeline.etl.score import constants
from data_pipeline.score.field_names import GEOID_TRACT_FIELD


@pytest.fixture(scope="session")
@@ -1,9 +1,11 @@
# flake8: noqa: W0613,W0611,F811
from dataclasses import dataclass

import pytest
from data_pipeline.score import field_names
from data_pipeline.utils import get_module_logger
from data_pipeline.score.score_narwhal import ScoreNarwhal
from data_pipeline.utils import get_module_logger

from .fixtures import final_score_df  # pylint: disable=unused-import

logger = get_module_logger(__name__)
@@ -2,36 +2,35 @@
# pylint: disable=unused-import,too-many-arguments
from dataclasses import dataclass
from typing import List
import pytest
import pandas as pd

import numpy as np
import pandas as pd
import pytest
from data_pipeline.etl.score import constants
from data_pipeline.etl.score.constants import TILES_ISLAND_AREA_FIPS_CODES
from data_pipeline.score import field_names
from data_pipeline.score.field_names import GEOID_TRACT_FIELD
from data_pipeline.etl.score.constants import TILES_ISLAND_AREA_FIPS_CODES
from .fixtures import (
    final_score_df,
    ejscreen_df,
    hud_housing_df,
    census_acs_df,
    cdc_places_df,
    census_acs_median_incomes_df,
    cdc_life_expectancy_df,
    doe_energy_burden_df,
    national_risk_index_df,
    dot_travel_disadvantage_df,
    fsf_fire_df,
    nature_deprived_df,
    eamlis_df,
    fuds_df,
    geocorr_urban_rural_df,
    census_decennial_df,
    census_2010_df,
    hrs_df,
    national_tract_df,
    tribal_overlap,
)

from .fixtures import cdc_life_expectancy_df  # noqa
from .fixtures import cdc_places_df  # noqa
from .fixtures import census_2010_df  # noqa
from .fixtures import census_acs_df  # noqa
from .fixtures import census_acs_median_incomes_df  # noqa
from .fixtures import census_decennial_df  # noqa
from .fixtures import doe_energy_burden_df  # noqa
from .fixtures import dot_travel_disadvantage_df  # noqa
from .fixtures import eamlis_df  # noqa
from .fixtures import ejscreen_df  # noqa
from .fixtures import final_score_df  # noqa
from .fixtures import fsf_fire_df  # noqa
from .fixtures import fuds_df  # noqa
from .fixtures import geocorr_urban_rural_df  # noqa
from .fixtures import hrs_df  # noqa
from .fixtures import hud_housing_df  # noqa
from .fixtures import national_risk_index_df  # noqa
from .fixtures import national_tract_df  # noqa
from .fixtures import nature_deprived_df  # noqa
from .fixtures import tribal_overlap  # noqa

pytestmark = pytest.mark.smoketest
UNMATCHED_TRACT_THRESHOLD = 1000
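Worth noting on the hunk above: force-single-line splits the twenty-name parenthesized fixture import into one import per fixture, each tagged # noqa, because pytest injects fixtures by argument name and every one of these imports therefore looks unused to a linter. In miniature (the test function is hypothetical):

from .fixtures import final_score_df  # noqa


def test_final_score_has_rows(final_score_df):
    # pytest resolves the fixture by name; the import above keeps it available here.
    assert len(final_score_df) > 0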
@@ -1,10 +1,9 @@
# pylint: disable=protected-access

import pandas as pd
import pytest
from data_pipeline.config import settings
from data_pipeline.score import field_names
from data_pipeline.etl.score.etl_score import ScoreETL
from data_pipeline.score import field_names
from data_pipeline.utils import get_module_logger

logger = get_module_logger(__name__)
@@ -1,18 +1,20 @@
# flake8: noqa: W0613,W0611,F811
from dataclasses import dataclass
from typing import Optional
import pandas as pd

import geopandas as gpd
import numpy as np
import pandas as pd
import pytest
from data_pipeline.config import settings
from data_pipeline.etl.score import constants
from data_pipeline.score import field_names
from data_pipeline.etl.score.constants import THRESHOLD_COUNT_TO_SHOW_FIELD_NAME
from data_pipeline.etl.score.constants import TILES_SCORE_COLUMNS
from data_pipeline.etl.score.constants import (
    TILES_SCORE_COLUMNS,
    THRESHOLD_COUNT_TO_SHOW_FIELD_NAME,
    USER_INTERFACE_EXPERIENCE_FIELD_NAME,
)
from data_pipeline.score import field_names

from .fixtures import final_score_df  # pylint: disable=unused-import

pytestmark = pytest.mark.smoketest
@@ -1,17 +1,17 @@
# pylint: disable=protected-access
# flake8: noqa=F841
from contextlib import contextmanager
from functools import partial
from pathlib import Path
from unittest import mock
from functools import partial
from contextlib import contextmanager

import pytest
import pandas as pd
import pytest
from data_pipeline.etl.sources.geo_utils import get_tract_geojson
from data_pipeline.score import field_names
from data_pipeline.score.utils import (
    calculate_tract_adjacency_scores as original_calculate_tract_adjacency_score,
)
from data_pipeline.etl.sources.geo_utils import get_tract_geojson
from data_pipeline.score import field_names


@contextmanager
@@ -1,6 +1,7 @@
# pylint: disable=protected-access
import pathlib
from unittest import mock

import requests
from data_pipeline.etl.base import ExtractTransformLoad
from data_pipeline.etl.sources.cdc_life_expectancy.etl import CDCLifeExpectancy
@@ -1,6 +1,7 @@
import pathlib
from data_pipeline.tests.sources.example.test_etl import TestETL

from data_pipeline.etl.sources.cdc_places.etl import CDCPlacesETL
from data_pipeline.tests.sources.example.test_etl import TestETL


class TestCDCPlacesETL(TestETL):
@@ -1,9 +1,7 @@
# pylint: disable=protected-access
import pathlib

from data_pipeline.etl.sources.doe_energy_burden.etl import (
    DOEEnergyBurden,
)
from data_pipeline.etl.sources.doe_energy_burden.etl import DOEEnergyBurden
from data_pipeline.tests.sources.example.test_etl import TestETL
from data_pipeline.utils import get_module_logger
@@ -1,9 +1,10 @@
import pathlib

import geopandas as gpd
from data_pipeline.tests.sources.example.test_etl import TestETL
from data_pipeline.etl.sources.dot_travel_composite.etl import (
    TravelCompositeETL,
)
from data_pipeline.tests.sources.example.test_etl import TestETL


class TestTravelCompositeETL(TestETL):
@@ -1,11 +1,9 @@
# pylint: disable=protected-access
from unittest import mock
import pathlib
from data_pipeline.etl.base import ValidGeoLevel
from unittest import mock

from data_pipeline.etl.sources.eamlis.etl import (
    AbandonedMineETL,
)
from data_pipeline.etl.base import ValidGeoLevel
from data_pipeline.etl.sources.eamlis.etl import AbandonedMineETL
from data_pipeline.tests.sources.example.test_etl import TestETL
from data_pipeline.utils import get_module_logger
@@ -1,6 +1,7 @@
import pathlib
from data_pipeline.tests.sources.example.test_etl import TestETL

from data_pipeline.etl.sources.ejscreen.etl import EJSCREENETL
from data_pipeline.tests.sources.example.test_etl import TestETL


class TestEJSCREENETL(TestETL):
Some files were not shown because too many files have changed in this diff.