Add ETL Contract Checks (#619)

* Adds dev dependencies to requirements.txt and re-runs black on codebase

* Adds test and code for national risk index etl, still in progress

* Removes test_data from .gitignore

* Adds test data to national_risk_index tests

* Creates tests and ETL class for NRI data

* Adds tests for load() and transform() methods of NationalRiskIndexETL

* Updates README.md with info about the NRI dataset

* Adds to-dos

* Moves tests and test data into a tests/ dir in national_risk_index

* Moves tmp_dir for tests into data/tmp/tests/

* Promotes fixtures to conftest and relocates national_risk_index tests:
The relocation of national_risk_index tests is necessary because tests 
can only use fixtures specified in conftests within the same package

* Fixes issue with df.equals() in test_transform()

* Files reformatted by black

* Commit changes to other files after re-running black

* Fixes unused import that caused lint checks to fail

* Moves tests/ directory to app root for data_pipeline

* Adds new methods to ExtractTransformLoad base class:
- __init__() Initializes class attributes
- _get_census_fips_codes() Loads a dataframe with the FIPS codes for
census block groups and tracts
- validate_init() Checks that the class was initialized correctly
- validate_output() Checks that the output was loaded correctly
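
A minimal sketch of how these contract checks might fit together, not the actual base class. It uses only the standard library rather than pandas, and the class name ExtractTransformLoadSketch, the constructor arguments, the output attribute, and the specific checks are all assumptions made for illustration:

```python
import csv
from pathlib import Path
from typing import Optional


class ExtractTransformLoadSketch:
    """Hypothetical stand-in for the ExtractTransformLoad base class."""

    def __init__(self, census_csv: Path, geoid_col: str = "GEOID10") -> None:
        # Assumed attributes: where the reference GEOIDs live and which
        # column of the output holds the geographic identifier.
        self.census_csv = census_csv
        self.geoid_col = geoid_col
        self.output: Optional[list] = None  # rows produced by load()

    def _get_census_fips_codes(self) -> set:
        # csv yields strings, so leading zeros in the codes are preserved.
        with open(self.census_csv, newline="") as f:
            return {row[self.geoid_col] for row in csv.DictReader(f)}

    def validate_init(self) -> None:
        # Contract check 1: the class was configured with usable values.
        if not self.geoid_col:
            raise ValueError("geoid_col must be set")
        if not Path(self.census_csv).is_file():
            raise FileNotFoundError(str(self.census_csv))

    def validate_output(self) -> None:
        # Contract check 2: every reference GEOID appears in the output.
        if self.output is None:
            raise ValueError("load() has not produced any output")
        expected = self._get_census_fips_codes()
        found = {row[self.geoid_col] for row in self.output}
        missing = expected - found
        if missing:
            raise ValueError(f"output is missing {len(missing)} GEOIDs")
```

In this sketch a subclass would call validate_init() after construction and validate_output() after load(), so a misconfigured or incomplete ETL fails loudly instead of writing partial data.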

* Adds test for ExtractTransformLoad.__init__() and base.py

* Fixes failing flake8 test

* Changes geo_col to geoid_col and changes is_dataset to is_census in yaml

* Adds test for validate_output()

* Adds remaining tests

* Removes is_dataset from init method

* Makes CENSUS_CSV a class attribute instead of a class global:
This ensures that CENSUS_CSV is only set when the ETL class is for a 
non-census dataset and removes the need to overwrite the value in 
mock_etl fixture
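
One way this could look, as a sketch under stated assumptions rather than the code in this commit. The class name ETLSketch, the data_path argument, and the census.csv file name are hypothetical; only CENSUS_CSV and is_census come from the notes above:

```python
from pathlib import Path
from typing import Optional


class ETLSketch:
    # Class-level default: no census reference file.
    CENSUS_CSV: Optional[Path] = None

    def __init__(self, data_path: Path, is_census: bool = False) -> None:
        # Only non-census datasets need the census reference file, and the
        # path is derived from data_path (hypothetical layout), so a test
        # can pass a temporary directory instead of patching a global.
        if not is_census:
            self.CENSUS_CSV = data_path / "census.csv"
```

Because the path is derived from a constructor argument, a fixture such as mock_etl would only need to construct the class with its own test directory.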

* Re-formats files with black and fixes broken tox tests
Billy Daly 2021-10-13 15:54:15 -04:00 committed by GitHub
commit d1273b63c5
13 changed files with 358 additions and 32 deletions

@@ -0,0 +1,11 @@
GEOID10,POPULATION
050070403001,1000
050070403002,1500
050010201001,1000
050010201002,1500
150070405001,2000
150070405002,2250
150010210101,2000
150010210102,1500
150010211011,1750
150010211012,1500

@@ -0,0 +1,11 @@
GEOID10,GEOID10_TRACT,COL 1,COL 2,COL 3
050070403001,05007040300,10,10,10
050070403002,05007040300,20,20,20
050010201001,05001020100,30,30,30
050010201002,05001020100,40,40,40
150070405001,15007040500,50,50,50
150070405002,15007040500,60,60,60
150010210101,15001021010,70,70,70
150010210102,15001021010,80,80,80
150010211011,15001021101,90,90,90
150010211012,15001021101,100,100,100
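
In the fixture above, each GEOID10_TRACT value is the first 11 characters of the 12-character GEOID10, and half of the codes begin with a zero. A small sketch of reading such data safely follows. The helper name tract_of and the inlined two-row sample are illustrative, not part of the commit:

```python
import csv
import io

# Two rows copied from the fixture above, inlined to keep the sketch
# self-contained.
SAMPLE = """GEOID10,GEOID10_TRACT,COL 1,COL 2,COL 3
050070403001,05007040300,10,10,10
150010211012,15001021101,100,100,100
"""


def tract_of(block_group_geoid: str) -> str:
    """Return the 11-character tract prefix of a 12-character GEOID."""
    if len(block_group_geoid) != 12 or not block_group_geoid.isdigit():
        raise ValueError(f"not a 12-digit GEOID: {block_group_geoid!r}")
    return block_group_geoid[:11]


# csv.DictReader keeps every field as a string, so leading zeros survive.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    assert tract_of(row["GEOID10"]) == row["GEOID10_TRACT"]

# Parsing a GEOID as an integer silently drops the leading zero.
assert str(int("050070403001")) == "50070403001"
```

The same concern applies when loading the file with pandas, where the identifier columns would need to be read as strings rather than inferred as integers.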