* Fixes #341 -
As a J40 developer, I want to write Unit Tests for the ETL files,
so that tests are run on each commit
* Location bug
* Adding Load tests
* Fixing XLSX filename
* Adding downloadable zip test
* updating pickle
* Fixing pylint warnings
* Update readme to correct some typos and reorganize test content structure
* Removing unused schemas file, adding details to readme around pickles, per PR feedback
* Update test to pass with Score D added to score file; update path in readme
* fix requirements.txt after merge
* fix poetry.lock after merge
Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
* WIP refactor
* Extract score calculations into their own methods
* do all initial df prep in single method
* Fix error in docs for running etl for single dataset
* WIP understanding HUD and linguistic iso data
* Add comments from initial group review on PR
Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
* Initial draft for data provenance
We want to make the data usable/available at every step of our data
pipeline. This starts the addition to the README that spells out the data
provenance and where each version of the data lives as it goes through our
pipeline.
* Update README with placeholders for next steps in data provenance
* Add coming soon placeholders for remaining data locations
Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
* Fixes #456 - Our data directory should adopt standard python package structure
* a few missed references
* updating readme
* updating requirements
* Running Black
* Fixes for flake8
* updating pylint
* Minor documentation updates, plus CalEnviroScreen S3 URL fix
* Update score comparison docs and code
* Add steps for running the comparison tool
* Update HUD recap ETL to ensure GEOID is imported as a string (if it is
imported as an integer by default, the leading "0" is stripped from
many IDs)
* Add note about execution time
* Move step from paragraph to list
* Update output dir in README for comp tool
Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
* initial checkin
* gitignore and docker-compose update
* readme update and error on hud
* encoding issue
* one more small README change
* data roadmap restructure
* pyproject sort
* small update to score output folders
* checkpoint
* couple of last fixes