Generate Geo-aware scores for all zoom levels (#391)

* generate Geo-aware scores for all zoom levels

* usa-high progress

* testing dissolve

* checkpoint

* changing type

* removing breakpoint

* validation notebooks

* quick update

* score validation

* fixes for county merge

* code completed

commit b404fdcc43
Author: Jorge Escobar
Date: 2021-07-28 16:07:28 -04:00 (committed by GitHub)
14 changed files with 3023 additions and 270 deletions

@@ -44,7 +44,7 @@ TODO add mermaid diagram
 #### Step 0: Set up your environment
-1. After cloning the project locally, change to this directory: `cd score`
+1. After cloning the project locally, change to this directory: `cd data/data-pipeline`
 1. Choose whether you'd like to run this application using Docker or install the dependencies locally so you can contribute to the project.
    - **With Docker:** Follow these [installation instructions](https://docs.docker.com/get-docker/) and skip down to the [Running with Docker section](#running-with-docker) for more information.
    - **For Local Development:** Skip down to the [Local Development section](#local-development) for more detailed installation instructions.
@@ -53,7 +53,7 @@ TODO add mermaid diagram
 #### Step 1: Run the ETL script for each data source
 1. Call the `etl-run` command using the application manager `application.py`. **NOTE:** This may take several minutes to execute.
-   - With Docker: `docker run --rm -it j40_score /bin/sh -c "python3 application.py etl-run"`
+   - With Docker: `docker run --rm -it j40_data_pipeline /bin/sh -c "python3 application.py etl-run"`
    - With Poetry: `poetry run python application.py etl-run`
 1. The `etl-run` command will execute the corresponding ETL script for each data source in `etl/sources/`. For example, `etl/sources/ejscreen/etl.py` is the ETL script for EJSCREEN data.
 1. Each ETL script will extract the data from its original source, then format the data into `.csv` files that get stored in the relevant folder in `data/dataset/`. For example, HUD Housing data is stored in `data/dataset/hud_housing/usa.csv` (a sketch of this shape follows below).
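
The per-source ETL scripts differ in detail, but each follows roughly the same extract/transform/load shape described above. The following is a minimal illustrative sketch of that shape only; the function name, source URL, and column names are assumptions for illustration, not the repository's actual code.

```python
# Minimal sketch of the shape of a script in etl/sources/<source>/etl.py.
# The URL, function name, and GEOID column below are illustrative assumptions.
from pathlib import Path

import pandas as pd

DATASET_DIR = Path("data/dataset")  # output root described in the README

def run_hud_housing_etl() -> None:
    # Extract: download the raw table from its original source (hypothetical URL).
    raw = pd.read_csv("https://example.gov/hud_housing/raw.csv")

    # Transform: keep the Census Block Group GEOID as a zero-padded 12-digit
    # string so it can serve as the join key in Step 2.
    raw["GEOID10"] = raw["GEOID10"].astype(str).str.zfill(12)

    # Load: write the formatted CSV where the score step expects to find it.
    out_dir = DATASET_DIR / "hud_housing"
    out_dir.mkdir(parents=True, exist_ok=True)
    raw.to_csv(out_dir / "usa.csv", index=False)
```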
@@ -64,7 +64,7 @@ _For example: `poetry run python application.py etl-run ejscreen` would only run
 #### Step 2: Calculate the Justice40 score experiments
 1. Call the `score-run` command using the application manager `application.py`. **NOTE:** This may take several minutes to execute.
-   - With Docker: `docker run --rm -it j40_score /bin/sh -c "python3 application.py score-run"`
+   - With Docker: `docker run --rm -it j40_data_pipeline /bin/sh -c "python3 application.py score-run"`
    - With Poetry: `poetry run python application.py score-run`
 1. The `score-run` command will execute the `etl/score/etl.py` script, which loads the data from each of the source files added to the `data/dataset/` directory by the ETL scripts in Step 1.
 1. These data sets are merged into a single dataframe using their Census Block Group GEOID as a common key (see the merge sketch below), and the data in each of the columns is standardized in two ways:
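
As a rough illustration of that merge, the sketch below outer-joins every per-source CSV on the shared GEOID key. The file layout, glob pattern, and column name are assumptions for illustration, not the pipeline's actual schema.

```python
# Illustrative sketch of merging the Step 1 outputs on the Census Block Group
# GEOID; the glob pattern and column name are assumptions for illustration.
from functools import reduce
from pathlib import Path

import pandas as pd

GEOID_COLUMN = "GEOID10"  # assumed name of the Census Block Group key

def merge_datasets(dataset_dir: str = "data/dataset") -> pd.DataFrame:
    # Read each source's CSV, keeping the GEOID as a string so leading
    # zeros in the key are preserved.
    frames = [
        pd.read_csv(path, dtype={GEOID_COLUMN: str})
        for path in sorted(Path(dataset_dir).glob("*/usa.csv"))
    ]
    # Outer-join so block groups missing from one source are still retained.
    return reduce(
        lambda left, right: left.merge(right, on=GEOID_COLUMN, how="outer"),
        frames,
    )
```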