* Adds dev dependencies to requirements.txt and re-runs black on codebase
* Adds test and code for national risk index etl, still in progress
* Removes test_data from .gitignore
* Adds test data to nation_risk_index tests
* Creates tests and ETL class for NRI data
* Adds tests for load() and transform() methods of NationalRiskIndexETL
* Updates README.md with info about the NRI dataset
* Adds to dos
* Moves tests and test data into a tests/ dir in national_risk_index
* Moves tmp_dir for tests into data/tmp/tests/
* Promotes fixtures to conftest and relocates national_risk_index tests:
The relocation of national_risk_index tests is necessary because tests
can only use fixtures specified in conftests within the same package
* Fixes issue with df.equals() in test_transform()
* Files reformatted by black
* Commit changes to other files after re-running black
* Fixes unused import that caused lint checks to fail
* Moves tests/ directory to app root for data_pipeline
* Adds new methods to ExtractTransformLoad base class:
- __init__() Initializes class attributes
- _get_census_fips_codes() Loads a dataframe with the fips codes for
census block group and tract
- validate_init() Checks that the class was initialized correctly
- validate_output() Checks that the output was loaded correctly
* Adds test for ExtractTransformLoad.__init__() and base.py
* Fixes failing flake8 test
* Changes geo_col to geoid_col and changes is_dataset to is_census in yaml
* Adds test for validate_output()
* Adds remaining tests
* Removes is_dataset from init method
* Makes CENSUS_CSV a class attribute instead of a class global:
This ensures that CENSUS_CSV is only set when the ETL class is for a
non-census dataset and removes the need to overwrite the value in
mock_etl fixture
* Re-formats files with black and fixes broken tox tests
* Change downloadable file names
* Remove constants because we're dynamically creating these
* Update to "communities" for the descriptor word based on team convo
* Add timestamp in 2020-09-20-0930 format because I personally think
this is the best ^.^
* Add a CLI command to run ETL Score Post so that we don't have to
run the score generation just to get new downloadable files.
* Also make sure the old downloadable files are cleaned up on the
run of this command.
* Remove unused library, thanks pylint!
Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
* Fixes#303 : adding downloadable zip archive logic
* linter recommendations
* Pushes data directory to AWS. We'll want to move to use AWS for this ASAP, but this works for now
* updating pattern
* Fixes#456 - Our data directory should adopt standard python package structure
* a few missed references
* updating readme
* updating requirements
* Running Black
* Fixes for flake8
* updating pylint
2021-08-05 15:35:54 -04:00
Renamed from data/data-pipeline/utils.py (Browse further)