j40-cejst-2

279 commits 159 branches 0 tags 330 MiB

Author	SHA1	Message	Date
Billy Daly	d1273b63c5	Add ETL Contract Checks (#619) * Adds dev dependencies to requirements.txt and re-runs black on codebase * Adds test and code for national risk index etl, still in progress * Removes test_data from .gitignore * Adds test data to nation_risk_index tests * Creates tests and ETL class for NRI data * Adds tests for load() and transform() methods of NationalRiskIndexETL * Updates README.md with info about the NRI dataset * Adds to dos * Moves tests and test data into a tests/ dir in national_risk_index * Moves tmp_dir for tests into data/tmp/tests/ * Promotes fixtures to conftest and relocates national_risk_index tests: The relocation of national_risk_index tests is necessary because tests can only use fixtures specified in conftests within the same package * Fixes issue with df.equals() in test_transform() * Files reformatted by black * Commit changes to other files after re-running black * Fixes unused import that caused lint checks to fail * Moves tests/ directory to app root for data_pipeline * Adds new methods to ExtractTransformLoad base class: - __init__() Initializes class attributes - _get_census_fips_codes() Loads a dataframe with the fips codes for census block group and tract - validate_init() Checks that the class was initialized correctly - validate_output() Checks that the output was loaded correctly * Adds test for ExtractTransformLoad.__init__() and base.py * Fixes failing flake8 test * Changes geo_col to geoid_col and changes is_dataset to is_census in yaml * Adds test for validate_output() * Adds remaining tests * Removes is_dataset from init method * Makes CENSUS_CSV a class attribute instead of a class global: This ensures that CENSUS_CSV is only set when the ETL class is for a non-census dataset and removes the need to overwrite the value in mock_etl fixture * Re-formats files with black and fixes broken tox tests	2021-10-13 15:54:15 -04:00
Shelby Switzer	d8c73e6a02	Change downloadable file names (#708) * Change downloadable file names * Remove constants because we're dynamically creating these * Update to "communities" for the descriptor word based on team convo * Add timestamp in 2020-09-20-0930 format because I personally think this is the best ^.^ * Add a CLI command to run ETL Score Post so that we don't have to run the score generation just to get new downloadable files. * Also make sure the old downloadable files are cleaned up on the run of this command. * Remove unused library, thanks pylint! Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>	2021-10-01 15:04:37 -04:00
Jorge Escobar	5bd63c083b	Run all Census, ETL, Score, Combine and Tilefy in one command (#662) * Run all Census, ETL, Score, Combine and Tilefy in one command * docker cmd * some docker improvements * feedback updates * lint	2021-09-14 14:15:34 -04:00
Lucas Merrill Brown	65ceb7900f	Score F, testing methodology (#510) * fixing dependency issue * fixing more dependencies * including fraction of state AMI * wip * nitpick whitespace * etl working now * wip on scoring * fix rename error * reducing metrics * fixing score f * fixing readme * adding dependency * passing tests; * linting/black * removing unnecessary sample * fixing error * adding verify flag on etl/base Co-authored-by: Jorge Escobar <jorge.e.escobar@omb.eop.gov>	2021-08-24 16:40:54 -04:00
Nat Hillard	9a9d5fdf7f	Backend change for Zipfile pt. 2 (#469) * Fixes #303: adding downloadable zip archive logic * linter recommendations * Pushes data directory to AWS. We'll want to move to use AWS for this ASAP, but this works for now * updating pattern	2021-08-09 10:39:59 -04:00
Nat Hillard	9d962eb5d9	Moving from relative imports to absolute to enable poetry run python data-pipeline/application.py [command] (#476)	2021-08-06 11:41:28 -04:00
Nat Hillard	c1568e87c0	Data directory should adopt standard Poetry-suggested python package structure (#457) * Fixes #456 - Our data directory should adopt standard python package structure * a few missed references * updating readme * updating requirements * Running Black * Fixes for flake8 * updating pylint	2021-08-05 15:35:54 -04:00
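
The timestamp format named in the #708 commit message (2020-09-20-0930) can be produced with a standard `strftime` pattern. The sketch below is only an illustration of that format: the function name `timestamped_filename`, the `descriptor-timestamp.extension` layout, and the `csv` extension are assumptions made for the example, not code from the repository.

```python
from datetime import datetime


def timestamped_filename(descriptor: str, extension: str, when: datetime) -> str:
    """Build a file name with a YYYY-MM-DD-HHMM stamp (hypothetical layout)."""
    # %Y-%m-%d-%H%M gives the date, a hyphen, then 24-hour time with no separator.
    stamp = when.strftime("%Y-%m-%d-%H%M")
    return f"{descriptor}-{stamp}.{extension}"


# Assumed inputs: the "communities" descriptor from the commit message and a csv extension.
print(timestamped_filename("communities", "csv", datetime(2020, 9, 20, 9, 30)))
```

Because the stamp is zero-padded and ordered from year down to minute, file names built this way sort chronologically as plain strings.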

Renamed from data/data-pipeline/utils.py

7 commits