* Create ScoreCalculator
This calculates all the factors for score L for now (with placeholder
formulae, because this is a WIP). Ideally we'll want to refactor all
the score code to be extracted into this or similar classes; a rough
sketch follows below.
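A minimal sketch of the shape described above, assuming a pandas dataframe of factor inputs; the class, method, and column names are illustrative, not the actual implementation:

```python
import pandas as pd


class ScoreCalculator:
    """Illustrative sketch only; names and formulae are hypothetical."""

    def __init__(self, df: pd.DataFrame):
        self.df = df

    def add_factor_a(self) -> pd.DataFrame:
        # Placeholder formula, mirroring the WIP state described above
        self.df["Factor A"] = self.df["poverty_percentile"] > 0.90
        return self.df
```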
* Add factor logic for score L
Updated factor logic to match the score L factors methodology.
Still need to get the Score L field itself working.
Cleanup needed: pull field names into a constants file (example below)
and extract all score calculation into the score calculator.
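A hypothetical example of what such a constants module might hold (file and constant names are assumptions):

```python
# field_names.py -- hypothetical constants module; centralizing column
# names avoids typo-prone string literals scattered across the ETL code.
GEOID_FIELD = "GEOID10"
POVERTY_FIELD = "Poverty (Less than 200% of federal poverty line)"
```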
Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
* Adds dev dependencies to requirements.txt and re-runs black on codebase
* Adds test and code for national risk index etl, still in progress
* Removes test_data from .gitignore
* Adds test data to national_risk_index tests
* Creates tests and ETL class for NRI data
* Adds tests for load() and transform() methods of NationalRiskIndexETL
* Updates README.md with info about the NRI dataset
* Adds to-dos
* Moves tests and test data into a tests/ dir in national_risk_index
* Moves tmp_dir for tests into data/tmp/tests/
* Promotes fixtures to conftest and relocates national_risk_index tests:
The relocation of the national_risk_index tests is necessary because
tests can only use fixtures defined in a conftest.py within the same
package (see the sketch below)
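pytest only discovers fixtures from a conftest.py in the test's own package (or a parent of it), which is what forced the relocation; a minimal sketch, with an illustrative fixture:

```python
# tests/conftest.py -- fixtures defined here are visible to every test
# module in this package (fixture name and contents are made up).
import pandas as pd
import pytest


@pytest.fixture
def sample_df():
    # Small dataframe shared by tests in this package
    return pd.DataFrame({"GEOID10": ["010010201001"], "value": [1.0]})
```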
* Fixes issue with df.equals() in test_transform()
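One likely culprit with df.equals(): it returns a bare bool and is strict about dtypes and index, so failures give no diff. pandas' testing helper is usually the better assertion (the actual fix here may differ in detail):

```python
import pandas as pd

expected = pd.DataFrame({"value": [1.0, 2.0]})
actual = pd.DataFrame({"value": [1, 2]})

# Raises with a readable diff on mismatch; check_dtype=False tolerates
# the int-vs-float difference that would make df.equals() return False.
pd.testing.assert_frame_equal(actual, expected, check_dtype=False)
```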
* Files reformatted by black
* Commit changes to other files after re-running black
* Fixes unused import that caused lint checks to fail
* Moves tests/ directory to app root for data_pipeline
* Adds new methods to ExtractTransformLoad base class (sketched below):
- __init__() Initializes class attributes
- _get_census_fips_codes() Loads a dataframe with the FIPS codes for
census block group and tract
- validate_init() Checks that the class was initialized correctly
- validate_output() Checks that the output was loaded correctly
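A rough sketch of those base-class additions, inferred from the summary above; the paths, signatures, and attribute names are assumptions, not the actual code:

```python
from pathlib import Path

import pandas as pd


class ExtractTransformLoad:
    # Class-level defaults (illustrative paths)
    DATA_PATH: Path = Path("data")
    CENSUS_CSV: Path = DATA_PATH / "census" / "csv" / "us.csv"

    def __init__(self, name: str, is_census: bool = False):
        # Initializes class attributes
        self.name = name
        self.is_census = is_census
        self.output_path = self.DATA_PATH / "dataset" / name / "usa.csv"
        self.validate_init()

    def _get_census_fips_codes(self) -> pd.DataFrame:
        # Loads a dataframe with the FIPS codes for census block group
        # and tract
        return pd.read_csv(self.CENSUS_CSV, dtype=str)

    def validate_init(self) -> None:
        # Checks that the class was initialized correctly
        assert self.name, "ETL subclass must set a dataset name"

    def validate_output(self) -> None:
        # Checks that the output was written correctly
        assert self.output_path.exists(), f"missing output: {self.output_path}"
```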
* Adds test for ExtractTransformLoad.__init__() and base.py
* Fixes failing flake8 test
* Changes geo_col to geoid_col and changes is_dataset to is_census in yaml
* Adds test for validate_output()
* Adds remaining tests
* Removes is_dataset from init method
* Makes CENSUS_CSV a class attribute instead of a module-level global:
This ensures that CENSUS_CSV is only set when the ETL class is for a
non-census dataset and removes the need to overwrite the value in the
mock_etl fixture
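A guess at the shape of this change, continuing the earlier base-class sketch (the conditional logic is assumed, not copied from the code):

```python
from pathlib import Path
from typing import Optional


class ExtractTransformLoad:
    DATA_PATH: Path = Path("data")
    CENSUS_CSV: Optional[Path] = None  # only set for non-census datasets

    def __init__(self, is_census: bool = False):
        if not is_census:
            # Per-instance override; census ETLs leave it unset, so the
            # mock_etl fixture no longer has to overwrite it
            self.CENSUS_CSV = self.DATA_PATH / "census" / "csv" / "us.csv"
```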
* Re-formats files with black and fixes broken tox tests
* Change downloadable file names
* Remove constants because we're dynamically creating these
* Update to "communities" for the descriptor word based on team convo
* Add timestamp in 2020-09-20-0930 format because I personally think
this is the best ^.^
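That stamp can be produced with strftime; a quick illustration (the file-name pattern is assumed):

```python
from datetime import datetime

# Produces e.g. "2021-09-20-0930", matching the 2020-09-20-0930 style
timestamp = datetime.now().strftime("%Y-%m-%d-%H%M")
print(f"communities-{timestamp}.zip")
```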
* Add a CLI command to run ETL Score Post so that we don't have to
run the score generation just to get new downloadable files.
* Also make sure the old downloadable files are cleaned up when this
command runs (sketch below).
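A minimal sketch of such a command, assuming a click-based CLI; the command name, path, and wiring are illustrative:

```python
from pathlib import Path
import shutil

import click

DOWNLOADABLE_PATH = Path("data/score/downloadable")  # illustrative


@click.command(help="Run ETL score post without regenerating the score")
def generate_score_post():
    # Clean up the old downloadable files before writing new ones
    if DOWNLOADABLE_PATH.exists():
        shutil.rmtree(DOWNLOADABLE_PATH)
    DOWNLOADABLE_PATH.mkdir(parents=True)
    # ... invoke the etl_score_post logic here ...


if __name__ == "__main__":
    generate_score_post()
```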
* Remove unused library, thanks pylint!
Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
* Add pytest to tox run in CI/CD
* Try fixing tox dependencies for pytest
* Update poetry to get CI/CD passing
* Run poetry export with --dev flag to include dev dependencies such as pytest
* WIP updating test fixtures to include PDF
* Remove dev dependencies from reqs and add pytest to envlist to make build faster
* passing score_post tests
* Add pytest tox (#729)
* Fix failing pytest
* Fixes failing tox tests and updates requirements.txt to include dev deps
* pickle protocol 4
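Pinning protocol 4 keeps the pickled test fixtures readable on older interpreters, since pandas otherwise writes the highest protocol available and protocol 5 requires Python 3.8+; for example:

```python
import pandas as pd

df = pd.DataFrame({"value": [1.0]})
# pandas defaults to pickle.HIGHEST_PROTOCOL; protocol 5 pickles can't
# be read by Python < 3.8, so the fixtures are pinned to protocol 4.
df.to_pickle("fixture.pkl", protocol=4)
```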
Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
Co-authored-by: Jorge Escobar <jorge.e.escobar@omb.eop.gov>
Co-authored-by: Billy Daly <williamdaly422@gmail.com>
Co-authored-by: Jorge Escobar <83969469+esfoobar-usds@users.noreply.github.com>
* Update downloadable zip file
* Don't use spaces in the name, as per #620
* Add the score D columns, as per #596
* fix paths and directories in etl_score_post
While the tests seemed to be passing, I encountered an error when
running poetry run score, caused by us creating a directory called
<name>.csv instead of creating the parent directory (fix sketched below).
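The classic shape of that bug and fix, with illustrative paths:

```python
from pathlib import Path

csv_path = Path("data/score/downloadable/communities.csv")

# Buggy: this creates a *directory* named communities.csv
# csv_path.mkdir(parents=True, exist_ok=True)

# Fixed: create the parent directory, then write the file into it
csv_path.parent.mkdir(parents=True, exist_ok=True)
csv_path.write_text("placeholder")
```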
Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
* Fixes #341 -
As a J40 developer, I want to write unit tests for the ETL files,
so that tests are run on each commit
* Location bug
* Adding Load tests
* Fixing XLSX filename
* Adding downloadable zip test
* updating pickle
* Fixing pylint warnings
* Update readme to correct some typos and reorganize the test content structure
* Removing unused schemas file, adding details to readme around pickles, per PR feedback
* Update test to pass with Score D added to score file; update path in readme
* fix requirements.txt after merge
* fix poetry.lock after merge
Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
* WIP refactor
* Extract score calculations into their own methods
* Do all initial df prep in a single method (sketch below)
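An illustrative shape for that refactor, with made-up method names:

```python
import pandas as pd


class ScoreETL:
    def _prepare_initial_df(self, df: pd.DataFrame) -> pd.DataFrame:
        # All joins, renames, and percentile prep happen in one place
        return df

    def _add_score_a(self, df: pd.DataFrame) -> pd.DataFrame:
        # One method per score keeps each calculation reviewable
        df["Score A"] = df.mean(axis=1, numeric_only=True)
        return df

    def transform(self, df: pd.DataFrame) -> pd.DataFrame:
        df = self._prepare_initial_df(df)
        return self._add_score_a(df)
```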
* Fix error in docs for running the ETL for a single dataset
* WIP understanding HUD and linguistic isolation data
* Add comments from initial group review on PR
Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
* Fixes #303: adding downloadable zip archive logic
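A minimal sketch of downloadable-zip creation; the paths and archive name are hypothetical:

```python
import zipfile
from pathlib import Path

downloadable_dir = Path("data/score/downloadable")
archive_path = downloadable_dir / "communities.zip"

# Bundle the generated files into a single downloadable archive
with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
    for path in downloadable_dir.glob("*.csv"):
        zf.write(path, arcname=path.name)
```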
* linter recommendations
* Pushes data directory to AWS. We'll want to move to use AWS for this ASAP, but this works for now
* updating pattern
Error this addresses:
File "/Users/lucas/Documents/usds/repos/justice40-tool/data/data-pipeline/data_pipeline/etl/runner.py", line 71, in etl_runner
f"data_pipeline.etl.sources.{dataset['module_dir']}.etl"
TypeError: 'NoneType' object is not subscriptable
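The traceback above comes from subscripting a None dataset after a failed name lookup; a sketch of the kind of guard that avoids it (variable names are invented):

```python
dataset_list = [{"name": "census", "module_dir": "census"}]  # illustrative
dataset_to_run = "census"

# If no dataset matches, `dataset` would be None and
# dataset["module_dir"] would raise the TypeError shown above.
dataset = next(
    (d for d in dataset_list if d["name"] == dataset_to_run), None
)
if dataset is None:
    raise ValueError(f"no dataset matching {dataset_to_run!r}")

module_path = f"data_pipeline.etl.sources.{dataset['module_dir']}.etl"
```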
* Fixes #456 - Our data directory should adopt standard Python package structure
* a few missed references
* updating readme
* updating requirements
* Running Black
* Fixes for flake8
* updating pylint