Initial refactor for Score ETL (#618)

* WIP refactor
* Extract score calculations into their own methods
* Do all initial df prep in single method
* Fix error in docs for running etl for single dataset
* WIP understanding HUD and linguistic iso data
* Add comments from initial group review on PR

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
2025-09-29 17:13:17 -07:00 · 2021-09-10 10:34:34 -04:00
commit ac62933d16
parent 470c474367
4 changed files with 200 additions and 141 deletions
--- a/data/data-pipeline/README.md
+++ b/data/data-pipeline/README.md
@@ -94,7 +94,7 @@ TODO add mermaid diagram
 3. Each ETL script will extract the data from its original source, then format the data into `.csv` files that get stored in the relevant folder in `data_pipeline/data/dataset/`. For example, HUD Housing data is stored in `data_pipeline/data/dataset/hud_housing/usa.csv`

 _**NOTE:** You have the option to pass the name of a specific data source to the `etl-run` command using the `-d` flag, which will limit the execution of the ETL process to that specific data source._
-_For example: `poetry run etl -- -d ejscreen` would only run the ETL process for EJSCREEN data._
+_For example: `poetry run etl -d ejscreen` would only run the ETL process for EJSCREEN data._

 #### Step 3: Calculate the Justice40 score experiments