Initial refactor for Score ETL (#618)

* WIP refactor

* Exract score calculations into their own methods

* do all initial df prep in single method

* Fix error in docs for running etl for single dataset

* WIP understanding HUD and linguistic iso data

* Add comments from initial group review on PR

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
This commit is contained in:
Shelby Switzer 2021-09-10 10:34:34 -04:00 committed by GitHub
commit ac62933d16
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
4 changed files with 200 additions and 141 deletions

View file

@ -94,7 +94,7 @@ TODO add mermaid diagram
3. Each ETL script will extract the data from its original source, then format the data into `.csv` files that get stored in the relevant folder in `data_pipeline/data/dataset/`. For example, HUD Housing data is stored in `data_pipeline/data/dataset/hud_housing/usa.csv`
_**NOTE:** You have the option to pass the name of a specific data source to the `etl-run` command using the `-d` flag, which will limit the execution of the ETL process to that specific data source._
_For example: `poetry run etl -- -d ejscreen` would only run the ETL process for EJSCREEN data._
_For example: `poetry run etl -d ejscreen` would only run the ETL process for EJSCREEN data._
#### Step 3: Calculate the Justice40 score experiments