Add tests for all non-census sources (#1899)

* Refactor CDC life-expectancy (#1554)

* Update to new tract list (#1554)

* Adjust for tests (#1848)

* Add tests for cdc_places (#1848)

* Add EJScreen tests (#1848)

* Add tests for HUD housing (#1848)

* Add tests for GeoCorr (#1848)

* Add persistent poverty tests (#1848)

* Update for sources without zips, for new validation (#1848)

* Update tests for new multi-CSV bug (#1848)

Lucas updated the CDC life expectancy data to handle a bug where two
states are missing from the US Overall download. Since virtually none of
our other ETL classes download multiple CSVs directly like this, it
required a pretty invasive new mocking strategy.

* Add basic tests for nature deprived (#1848)

* Add wildfire tests (#1848)

* Add flood risk tests (#1848)

* Add DOT travel tests (#1848)

* Add historic redlining tests (#1848)

* Add tests for ME and WI (#1848)

* Update now that validation exists (#1848)

* Adjust for validation (#1848)

* Add health insurance back to cdc places (#1848)

Ooops

* Update tests with new field (#1848)

* Test for blank tract removal (#1848)

* Add tracts for clipping behavior

* Test clipping and zfill behavior (#1848)

* Fix bad test assumption (#1848)

* Simplify class, add test for tract padding (#1848)

* Fix percentage inversion, update tests (#1848)

Looking through the transformations, I noticed that we were subtracting
a percentage that is usually between 0 and 100 from 1 instead of 100, and so
were ending up with some surprising results (negative values for any
percentage above 1). Confirmed with lucasmbrown-usds.

* Add note about first street data (#1848)
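The percentage-inversion fix described above comes down to the arithmetic below. This is a minimal sketch with hypothetical function names, not the ETL's actual code: subtracting from 1 only makes sense for fractions on a 0-1 scale, while these fields are percentages on a 0-100 scale.

```python
def invert_percentage(pct: float) -> float:
    """Return the complementary share of a percentage on a 0-100 scale."""
    # Subtract from 100, not 1: the input is a percentage, not a fraction.
    return 100.0 - pct


def buggy_invert_percentage(pct: float) -> float:
    # The mistake described above: treating a 0-100 value as a 0-1 fraction.
    return 1.0 - pct


# A hypothetical tract where 72.5% of residents have some attribute.
share = 72.5
print(invert_percentage(share))        # 27.5
print(buggy_invert_percentage(share))  # -71.5
```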
Matt Bowen 2022-09-19 15:17:00 -04:00 committed by GitHub
commit 876655d2b2
GPG key ID: 4AEE18F83AFDEB23
88 changed files with 2032 additions and 178 deletions

@@ -0,0 +1,16 @@
GEOID10_TRACT,DOT Travel Barriers Score
06061021322,52.3684736425
06069000802,67.6807523475
15001021101,65.6905624925
15001021800,64.6348560575
15007040603,47.3085751425
15007040604,48.7634318775
15007040700,56.8031262775
15009030201,64.1950173025
15009030402,50.2530948600
15009030800,56.1490333775
15001021010,69.4901838075
15001021402,53.4854747375
15003010201,54.7191133125
15009030100,37.8950511525
06027000800,38.5533081475
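The expected output above keeps tract IDs as 11-character strings, including the leading zero on the California (state FIPS 06) tracts, and the commit message mentions tests for blank tract removal and for zfill/tract padding. A minimal pandas sketch of those two steps follows; the function `clean_tract_ids` and the raw input are hypothetical, and the real TravelCompositeETL transform may differ.

```python
import io

import pandas as pd

# Hypothetical raw input: the leading zero of California's state FIPS code
# ("06") has been lost, and one row has no tract ID at all.
RAW_CSV = """GEOID10_TRACT,DOT Travel Barriers Score
6061021322,52.3684736425
,41.0
15001021101,65.6905624925
"""

TRACT_ID_LENGTH = 11  # 2-digit state + 3-digit county + 6-digit tract


def clean_tract_ids(df: pd.DataFrame, column: str = "GEOID10_TRACT") -> pd.DataFrame:
    """Drop rows without a tract ID and left-pad the rest to 11 digits."""
    out = df.dropna(subset=[column]).copy()
    out[column] = out[column].str.zfill(TRACT_ID_LENGTH)
    return out.reset_index(drop=True)


# Read IDs as strings so pandas does not parse them as integers.
raw = pd.read_csv(io.StringIO(RAW_CSV), dtype={"GEOID10_TRACT": str})
cleaned = clean_tract_ids(raw)
print(cleaned["GEOID10_TRACT"].tolist())
# ['06061021322', '15001021101']
```

Reading the column with `dtype=str` matters as much as the padding: if pandas parses the IDs as integers, the leading zeros are gone before any transform runs.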

File diff suppressed because one or more lines are too long

@@ -0,0 +1,45 @@
import pathlib

import geopandas as gpd

from data_pipeline.tests.sources.example.test_etl import TestETL
from data_pipeline.etl.sources.dot_travel_composite.etl import (
    TravelCompositeETL,
)


class TestTravelCompositeETL(TestETL):
    _ETL_CLASS = TravelCompositeETL

    _SAMPLE_DATA_PATH = pathlib.Path(__file__).parents[0] / "data"
    _SAMPLE_DATA_FILE_NAME = "DOT_Disadvantage_Layer_Final_April2022.shp"
    _SAMPLE_DATA_ZIP_FILE_NAME = "Shapefile_and_Metadata.zip"
    _EXTRACT_TMP_FOLDER_NAME = "TravelCompositeETL"

    def setup_method(self, _method, filename=__file__):
        """Invoke `setup_method` from Parent, but using the current file name.

        This code can be copied identically between all child classes.
        """
        super().setup_method(_method=_method, filename=filename)

    def test_extract_produces_valid_data(self, mock_etl, mock_paths):
        etl = self._setup_etl_instance_and_run_extract(
            mock_etl=mock_etl,
            mock_paths=mock_paths,
        )
        # Extraction should unpack the sample shapefile: 30 rows, 86 columns.
        df = gpd.read_file(
            etl.get_tmp_path() / self._SAMPLE_DATA_FILE_NAME,
            dtype={etl.GEOID_TRACT_FIELD_NAME: str},
        )
        assert df.shape[0] == 30
        assert df.shape[1] == 86

    def test_transform_removes_blank_tracts(self, mock_etl, mock_paths):
        etl: TravelCompositeETL = self._setup_etl_instance_and_run_extract(
            mock_etl=mock_etl,
            mock_paths=mock_paths,
        )
        etl.transform()
        etl.load()
        df = etl.get_data_frame()
        # Rows with a blank tract ID are dropped, leaving 15 of the 30 rows.
        assert df.shape[0] == 15
        assert df.shape[1] == len(etl.COLUMNS_TO_KEEP)
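The commit message notes that the CDC life-expectancy ETL now downloads several CSVs directly, which required a new mocking strategy. That strategy is not part of this diff, so the sketch below is only a generic illustration of one way to mock multiple downloads with `unittest.mock`: `LifeExpectancyLoader`, its `_download` method, the URLs, and the fixture rows are all hypothetical.

```python
import io
from unittest import mock

import pandas as pd

# Placeholder URLs: the real ETL's download locations are not shown in this diff.
US_OVERALL_URL = "https://example.com/us_overall.csv"
STATE_URLS = {
    "state_a": "https://example.com/state_a.csv",
    "state_b": "https://example.com/state_b.csv",
}


class LifeExpectancyLoader:
    """Hypothetical stand-in for an ETL that downloads and combines several CSVs."""

    def _download(self, url: str) -> str:
        # A real implementation would fetch `url`; tests must never reach this.
        raise RuntimeError(f"unexpected network access: {url}")

    def _read(self, url: str) -> pd.DataFrame:
        # Keep tract IDs as strings so leading zeros survive.
        return pd.read_csv(io.StringIO(self._download(url)), dtype={"tract": str})

    def extract(self) -> pd.DataFrame:
        # US-wide file first, then one file per state missing from it.
        frames = [self._read(US_OVERALL_URL)]
        frames += [self._read(url) for url in STATE_URLS.values()]
        return pd.concat(frames, ignore_index=True)


# Fixture CSV text keyed by URL, so each mocked call returns "its" file.
FAKE_RESPONSES = {
    US_OVERALL_URL: "tract,life_expectancy\n01001020100,73.1\n",
    STATE_URLS["state_a"]: "tract,life_expectancy\n02001000100,78.2\n",
    STATE_URLS["state_b"]: "tract,life_expectancy\n04001000100,77.4\n",
}


def test_extract_combines_all_downloads():
    # Patch the download method on the class; side_effect dispatches by URL.
    with mock.patch.object(
        LifeExpectancyLoader,
        "_download",
        side_effect=lambda url: FAKE_RESPONSES[url],
    ) as fake_download:
        df = LifeExpectancyLoader().extract()
    assert fake_download.call_count == 3
    assert df["tract"].tolist() == ["01001020100", "02001000100", "04001000100"]
```

Dispatching on the URL, rather than returning fixture files in call order, keeps the test independent of the order in which the ETL requests its downloads, and an unexpected URL fails loudly with a `KeyError`.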