Display score L on map (#849)

* updates to first docker run

* tile constants

* frontend changes

* updating pickles instructions

* pickles
Jorge Escobar 2021-11-05 16:26:14 -04:00, committed by GitHub
commit 053dde0d40
12 changed files with 91 additions and 16 deletions


@@ -305,12 +305,86 @@ In a bit more detail:
 #### Updating Pickles
-If you update the input or output to various methods, it is necessary to create new pickles so that data is validated correctly. To do this:
-1. Drop a breakpoint just before the dataframe will otherwise be written to / read from disk. If you're using VSCode, use one of the named run targets within `data-pipeline` such as `Score Full Run`, and put a breakpoint in the margin just before the actionable step. More on using breakpoints in VSCode [here](https://code.visualstudio.com/docs/editor/debugging#_breakpoints). If you are not using VSCode, you can put the line `breakpoint()` in your code and it will stop where you have placed the line in whatever calling context you are using.
-1. In your editor/terminal, run `df.to_pickle("data_pipeline/etl/score/tests/snapshots/YOUR_OUT_PATH_HERE.pkl", protocol=4)` to write the pickle to the appropriate location on disk.
-1. Be sure to do this for all inputs/outputs that have changed as a result of your modification. It is often necessary to do this several times for cascading operations.
-1. To inspect your pickle, open a python interpreter, then run `pickle.load(open("data_pipeline/etl/score/tests/snapshots/YOUR_OUT_PATH_HERE.pkl", "rb"))` to get file contents.
+If you update the score in any way, it is necessary to create new pickles so that data is validated correctly.
+
+It starts with `data_pipeline/etl/score/tests/sample_data/score_data_initial.csv`, which is the first two rows of the `score/full/usa.csv`.
+
+To update this file, run a full score generation and then update the file as follows:
+
+```
+import pickle
+from pathlib import Path
+import pandas as pd
+
+data_path = Path.cwd()
+
+# score data expected
+score_csv_path = data_path / "data_pipeline" / "data" / "score" / "csv" / "full" / "usa.csv"
+score_initial_df = pd.read_csv(score_csv_path, dtype={"GEOID10": "string"}, low_memory=False)[:2]
+score_initial_df.to_csv(data_path / "data_pipeline" / "etl" / "score" / "tests" / "sample_data" / "score_data_initial.csv", index=False)
+```
+
+We have four pickle files that correspond to expected files:
+
+- `score_data_expected.pkl`: Initial score without counties
+- `score_transformed_expected.pkl`: Intermediate score with `etl._extract_score` and `etl._transform_score` applied. There's no file for this intermediate process, so we need to capture the pickle mid-process.
+- `tile_data_expected.pkl`: Score with columns to be baked into tiles
+- `downloadable_data_expected.pkl`: Downloadable csv
+
+To update the pickles, let's go one by one:
+
+For the `score_transformed_expected.pkl`, put a breakpoint on [this line](https://github.com/usds/justice40-tool/blob/main/data/data-pipeline/data_pipeline/etl/score/tests/test_score_post.py#L58), before the `pdt.assert_frame_equal`, and run:
+`pytest data_pipeline/etl/score/tests/test_score_post.py::test_transform_score`
+
+Once on the breakpoint, capture the df to a pickle as follows:
+
+```
+import pickle
+from pathlib import Path
+
+data_path = Path.cwd()
+score_transformed_actual.to_pickle(data_path / "data_pipeline" / "etl" / "score" / "tests" / "snapshots" / "score_transformed_expected.pkl", protocol=4)
+```
+
+Then take out the breakpoint and re-run the test: `pytest data_pipeline/etl/score/tests/test_score_post.py::test_transform_score`
+
+For the `score_data_expected.pkl`, put a breakpoint on [this line](https://github.com/usds/justice40-tool/blob/main/data/data-pipeline/data_pipeline/etl/score/tests/test_score_post.py#L78), before the `pdt.assert_frame_equal`, and run:
+`pytest data_pipeline/etl/score/tests/test_score_post.py::test_create_score_data`
+
+Once on the breakpoint, capture the df to a pickle as follows:
+
+```
+import pickle
+from pathlib import Path
+
+data_path = Path.cwd()
+score_data_actual.to_pickle(data_path / "data_pipeline" / "etl" / "score" / "tests" / "snapshots" / "score_data_expected.pkl", protocol=4)
+```
+
+Then take out the breakpoint and re-run the test: `pytest data_pipeline/etl/score/tests/test_score_post.py::test_create_score_data`
+
+For the `tile_data_expected.pkl`, put a breakpoint on [this line](https://github.com/usds/justice40-tool/blob/main/data/data-pipeline/data_pipeline/etl/score/tests/test_score_post.py#L86), before the `pdt.assert_frame_equal`, and run:
+`pytest data_pipeline/etl/score/tests/test_score_post.py::test_create_tile_data`
+
+Once on the breakpoint, capture the df to a pickle as follows:
+
+```
+import pickle
+from pathlib import Path
+
+data_path = Path.cwd()
+output_tiles_df_actual.to_pickle(data_path / "data_pipeline" / "etl" / "score" / "tests" / "snapshots" / "tile_data_expected.pkl", protocol=4)
+```
+
+Then take out the breakpoint and re-run the test: `pytest data_pipeline/etl/score/tests/test_score_post.py::test_create_tile_data`
+
+For the `downloadable_data_expected.pkl`, put a breakpoint on [this line](https://github.com/usds/justice40-tool/blob/main/data/data-pipeline/data_pipeline/etl/score/tests/test_score_post.py#L98), before the `pdt.assert_frame_equal`, and run:
+`pytest data_pipeline/etl/score/tests/test_score_post.py::test_create_downloadable_data`
+
+Once on the breakpoint, capture the df to a pickle as follows:
+
+```
+import pickle
+from pathlib import Path
+
+data_path = Path.cwd()
+output_downloadable_df_actual.to_pickle(data_path / "data_pipeline" / "etl" / "score" / "tests" / "snapshots" / "downloadable_data_expected.pkl", protocol=4)
+```
+
+Then take out the breakpoint and re-run the test: `pytest data_pipeline/etl/score/tests/test_score_post.py::test_create_downloadable_data`
 #### Future Enhancements
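A quick way to sanity-check any of these snapshots, before or after regenerating them, is to read the pickle straight back into pandas. This is a minimal sketch, not part of the repo's tooling, assuming it is run from the `data/data-pipeline` directory:

```
import pandas as pd
from pathlib import Path

# load an existing snapshot and eyeball its shape and columns
snapshot_path = (
    Path.cwd() / "data_pipeline" / "etl" / "score" / "tests" / "snapshots" / "tile_data_expected.pkl"
)
df = pd.read_pickle(snapshot_path)
print(df.shape)
print(list(df.columns)[:10])
```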


@@ -262,6 +262,7 @@ def data_full_run(check: bool, data_source: str):
     score_generate()

     logger.info("*** Running Post Score scripts")
+    downloadable_cleanup()
     score_post(data_source)

     logger.info("*** Combining Score with Census Geojson")


@@ -83,9 +83,6 @@ def score_generate() -> None:
     score_gen.transform()
     score_gen.load()

-    # Post Score Processing
-    score_post()
-

 def score_post(data_source: str = "local") -> None:
     """Posts the score files to the local directory


@@ -69,6 +69,8 @@ TILES_SCORE_COLUMNS = [
     "Score E (top 25th percentile)",
     "Score G (communities)",
     "Score G",
+    "Definition L (communities)",
+    "Definition L (percentile)",
     "Poverty (Less than 200% of federal poverty line) (percentile)",
     "Percent individuals age 25 or over with less than high school degree (percentile)",
     "Linguistic isolation (percent) (percentile)",
@@ -95,6 +97,7 @@ TILES_SCORE_FLOAT_COLUMNS = [
     "Score D (top 25th percentile)",
     "Score E (percentile)",
     "Score E (top 25th percentile)",
+    "Definition L (percentile)",
     "Poverty (Less than 200% of federal poverty line)",
     "Percent individuals age 25 or over with less than high school degree",
     "Linguistic isolation (percent)",


@@ -31,8 +31,8 @@ class GeoScoreETL(ExtractTransformLoad):
             self.DATA_PATH / "census" / "geojson" / "us.json"
         )

-        self.TARGET_SCORE_NAME = "Score G"
-        self.TARGET_SCORE_RENAME_TO = "G_SCORE"
+        self.TARGET_SCORE_NAME = "Definition L (percentile)"
+        self.TARGET_SCORE_RENAME_TO = "L_SCORE"

         self.NUMBER_OF_BUCKETS = 10
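The geo ETL now keys the map off the Definition L percentile instead of Score G, renames it to the shorter `L_SCORE` for the tiles, and splits tracts across `NUMBER_OF_BUCKETS` ranges. A rough sketch of that rename-and-bucket step, assuming simple equal-width binning (the real ETL may bin differently):

```
import pandas as pd

TARGET_SCORE_NAME = "Definition L (percentile)"
TARGET_SCORE_RENAME_TO = "L_SCORE"
NUMBER_OF_BUCKETS = 10

# toy percentile values standing in for the joined score/geojson frame
df = pd.DataFrame({TARGET_SCORE_NAME: [0.05, 0.33, 0.58, 0.99]})

df = df.rename(columns={TARGET_SCORE_NAME: TARGET_SCORE_RENAME_TO})
# assign each tract a bucket id 0-9 (equal-width bins are an assumption)
df["bucket"] = pd.cut(df[TARGET_SCORE_RENAME_TO], bins=NUMBER_OF_BUCKETS, labels=False)
print(df)
```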

File diff suppressed because one or more lines are too long


@@ -107,7 +107,7 @@ def check_census_data_source(

     # check if census data is found locally
     if not os.path.isfile(census_data_path / "geojson" / "us.json"):
         logger.info(
-            "No local census data found. Please use '-d aws` to fetch from AWS"
+            "No local census data found. Please use '-s aws` to fetch from AWS"
         )
         sys.exit()
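The only substantive change here is the flag named in the hint: the message now points at `-s aws`, which appears to match the data-source switch threaded through `data_full_run` and `score_post` above, where the old `-d` hint was stale.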