* WIP
* Create ScoreCalculator
This calculates all the factors for score L for now (with placeholder
formulae because this is a WIP). I think ideally we'll want to
refactor all the score code to be extracted into this or similar
classes.
* Add factor logic for score L
Updated factor logic to match score L factors methodology.
Still need to get the Score L field itself working.
Cleanup needed: Pull field names into constants file, extract all score
calculation into score calculator
* Update thresholds and get score L calc working
* Update header name for consistency and update comparison tool
* Initial move of score to score calculator
* WIP big refactor
* Continued WIP on score refactor
* WIP score refactor
* Get to a working score-run
* Refactor to pass df to score init
This makes it easier to pass df around within a class with multiple
methods that require df.
* Updates from Black
* Updates from linting
* Use named imports instead of wildcard; log more
* Additional refactors
* move more field names to field_names constants file
* import constants without a relative path (would break docker)
* run linting
* raise error if add_columns is not implemented in a child class
* Refactor dict to namedtuple in score c
* Update L to use all percentile field
* change high school ed field in L back
Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
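A minimal sketch of the score refactor described above (df passed to the score's init, add_columns required in child classes, namedtuple instead of dict). All names here (Score, ExampleScore, FieldThreshold, the field names) are hypothetical, and a plain list of dicts stands in for the real dataframe so the example runs without extra dependencies:

```python
from collections import namedtuple

# Hypothetical record type: a namedtuple gives named, immutable fields
# where a plain dict was used before.
FieldThreshold = namedtuple("FieldThreshold", ["field_name", "threshold"])


class Score:
    """Illustrative base class; not the project's actual implementation."""

    def __init__(self, df):
        # Keep df on the instance so every method can reach it without
        # it being passed around as an argument.
        self.df = df

    def add_columns(self):
        # Child classes must override this method.
        raise NotImplementedError(
            f"{type(self).__name__} must implement add_columns()"
        )


class ExampleScore(Score):
    # Hypothetical field name and cutoff, for illustration only.
    FACTOR = FieldThreshold("example_percentile", 0.9)

    def add_columns(self):
        for row in self.df:
            row["example_score"] = row[self.FACTOR.field_name] >= self.FACTOR.threshold
        return self.df
```

Calling add_columns() on the base class itself raises NotImplementedError, which is what surfaces a child class that forgot to implement it.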
* Adds dev dependencies to requirements.txt and re-runs black on codebase
* Adds test and code for national risk index etl, still in progress
* Removes test_data from .gitignore
* Adds test data to national_risk_index tests
* Creates tests and ETL class for NRI data
* Adds tests for load() and transform() methods of NationalRiskIndexETL
* Updates README.md with info about the NRI dataset
* Adds to-dos
* Moves tests and test data into a tests/ dir in national_risk_index
* Moves tmp_dir for tests into data/tmp/tests/
* Promotes fixtures to conftest and relocates national_risk_index tests:
The relocation of national_risk_index tests is necessary because tests
can only use fixtures specified in conftests within the same package
* Fixes issue with df.equals() in test_transform()
* Files reformatted by black
* Commit changes to other files after re-running black
* Fixes unused import that caused lint checks to fail
* Moves tests/ directory to app root for data_pipeline
* Adds new methods to ExtractTransformLoad base class:
- __init__() Initializes class attributes
- _get_census_fips_codes() Loads a dataframe with the fips codes for
census block group and tract
- validate_init() Checks that the class was initialized correctly
- validate_output() Checks that the output was loaded correctly
* Adds test for ExtractTransformLoad.__init__() and base.py
* Fixes failing flake8 test
* Changes geo_col to geoid_col and changes is_dataset to is_census in yaml
* Adds test for validate_output()
* Adds remaining tests
* Removes is_dataset from init method
* Makes CENSUS_CSV a class attribute instead of a class global:
This ensures that CENSUS_CSV is only set when the ETL class is for a
non-census dataset and removes the need to overwrite the value in
mock_etl fixture
* Re-formats files with black and fixes broken tox tests
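A rough sketch of the CENSUS_CSV change described above, under stated assumptions: the class below is a simplified stand-in, not the real ExtractTransformLoad, and the path, the subclass name, and the exact checks in validate_init() are hypothetical. It only illustrates why a class attribute is easier to override (in a subclass or a test fixture) than a module-level global:

```python
from pathlib import Path
from typing import Optional


class ExtractTransformLoad:
    """Simplified stand-in for the ETL base class (not the real one)."""

    # Class attribute: subclasses and test fixtures can override it without
    # touching a module-level global. None means "no census file needed".
    CENSUS_CSV: Optional[Path] = None

    def __init__(self, geoid_col: str, is_census: bool = False) -> None:
        self.geoid_col = geoid_col
        self.is_census = is_census

    def validate_init(self) -> None:
        """Raise if the instance was not set up consistently."""
        if not self.geoid_col:
            raise ValueError("geoid_col must be a non-empty column name")
        if not self.is_census and self.CENSUS_CSV is None:
            raise ValueError("non-census ETLs need CENSUS_CSV to be set")


class ExampleNonCensusETL(ExtractTransformLoad):
    # Hypothetical path, set only for the non-census dataset.
    CENSUS_CSV = Path("data/census/csv/us.csv")
```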
* Change downloadable file names
* Remove constants because we're dynamically creating these
* Update to "communities" for the descriptor word based on team convo
* Add timestamp in 2020-09-20-0930 format because I personally think
this is the best ^.^
* Add a CLI command to run ETL Score Post so that we don't have to
run the score generation just to get new downloadable files.
* Also make sure the old downloadable files are cleaned up on the
run of this command.
* Remove unused library, thanks pylint!
Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
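For reference, the 2020-09-20-0930 timestamp format corresponds to the strftime pattern "%Y-%m-%d-%H%M". A small sketch, assuming a hypothetical helper name and file-name layout (the real downloadable names are built elsewhere in the pipeline):

```python
from datetime import datetime


def downloadable_file_name(prefix: str, when: datetime, extension: str) -> str:
    """Build a file name such as communities-2020-09-20-0930.zip.

    The helper name and the prefix/extension layout are illustrative
    assumptions; only the timestamp format comes from the notes above.
    """
    timestamp = when.strftime("%Y-%m-%d-%H%M")
    return f"{prefix}-{timestamp}.{extension}"
```

The format sorts chronologically as plain text and contains no spaces.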
* Add pytest to tox run in CI/CD
* Try fixing tox dependencies for pytest
* update poetry to get ci/cd passing
* Run poetry export with --dev flag to include dev dependencies such as pytest
* WIP updating test fixtures to include PDF
* Remove dev dependencies from reqs and add pytest to envlist to make build faster
* passing score_post tests
* Add pytest tox (#729)
* Fix failing pytest
* Fixes failing tox tests and updates requirements.txt to include dev deps
* pickle protocol 4
Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
Co-authored-by: Jorge Escobar <jorge.e.escobar@omb.eop.gov>
Co-authored-by: Billy Daly <williamdaly422@gmail.com>
Co-authored-by: Jorge Escobar <83969469+esfoobar-usds@users.noreply.github.com>
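On "pickle protocol 4": a short sketch of pinning the protocol when writing a pickle. Protocol 4 is readable by Python 3.4 and later, whereas protocol 5 requires Python 3.8+. That the test fixtures were pinned for cross-version compatibility is an assumption; the function names and the sample object are hypothetical.

```python
import pickle


def dump_fixture(obj, path):
    """Write obj to path using pickle protocol 4 (hypothetical helper)."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle, protocol=4)


def load_fixture(path):
    """Read a pickled fixture back; the protocol is detected automatically."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```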
* Update downloadable zip file
* Don't use spaces in the name, as per #620
* Add the score D columns, as per #596
* fix paths and directories in etl_score_post
While the tests seemed to be passing, running poetry run score raised
an error because we were creating a directory called <name>.csv
instead of creating the parent directory.
Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
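The class of bug described above, sketched with pathlib. This is an illustration rather than the actual etl_score_post code, and write_csv is a hypothetical helper: calling mkdir() on the file path itself creates a directory named <name>.csv, so the fix is to create the file's parent instead.

```python
from pathlib import Path


def write_csv(path: Path, content: str) -> None:
    """Write content to path, creating missing parent directories first."""
    # Buggy variant: path.mkdir(parents=True, exist_ok=True)
    # That would create a directory literally named "<name>.csv".
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content)
```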
* Fixes #341 -
As a J40 developer, I want to write Unit Tests for the ETL files,
so that tests are run on each commit
* Location bug
* Adding Load tests
* Fixing XLSX filename
* Adding downloadable zip test
* updating pickle
* Fixing pylint warnings
* Update readme to correct some typos and reorganize test content structure
* Removing unused schemas file, adding details to readme around pickles, per PR feedback
* Update test to pass with Score D added to score file; update path in readme
* fix requirements.txt after merge
* fix poetry.lock after merge
Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
* WIP refactor
* Extract score calculations into their own methods
* do all initial df prep in single method
* Fix error in docs for running etl for single dataset
* WIP understanding HUD and linguistic iso data
* Add comments from initial group review on PR
Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
* Revert "dockerize front end (#558)"
This reverts commit 89c23faf7a.
* dockerize frontend
- adds score server and website docker compose
- creates docker ignore
- adds .env.* for dev, prod and local
- adds dockerfile for website
- adds env to gatsby-config
- adds hostaddress to develop / start script
- adds instructions in README for running docker
- replaces fixed URLs with ones based on env vars
- creates a score server dockerfile
* updates README to change map tiles source
* adds env DATA_SOURCE:development to deploy GHA
* capitalize readme
* initial docker
* adds concurrency to be able to run yarn install
* adds 0.0.0.0 to allow docker access
* adds web service
* adds env variables
* updates root path
* adds volumes
* adds docker to readme
* adds score server client docker
* docker updates after convo
* speeds up build and removes env vars
* adds client as volume
* updates to docker setup
* checkpoint
* updates the docker file
* adds .env.* files
* replaces serve with http-server for cors
Co-authored-by: Jorge Escobar <jorge.e.escobar@omb.eop.gov>
* Initial draft for data provenance
We want to make the data usable/available at every step of our data
pipeline. This starts the addition to the README that spells out the
data provenance and where each version of the data lives as it goes
through our pipeline.
* Update README with placeholders for next steps in data provenance
* Add coming soon placeholders for remaining data locations
Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>