Add tests for all non-census sources (#1899)

* Refactor CDC life-expectancy (#1554)

* Update to new tract list (#1554)

* Adjust for tests (#1848)

* Add tests for cdc_places (#1848)

* Add EJScreen tests (#1848)

* Add tests for HUD housing (#1848)

* Add tests for GeoCorr (#1848)

* Add persistent poverty tests (#1848)

* Update for sources without zips, for new validation (#1848)

* Update tests for new multi-CSV bug (#1848)

Lucas updated the CDC life expectancy data to handle a bug where two
states are missing from the US Overall download. Since virtually none of
our other ETL classes download multiple CSVs directly like this, it
required a pretty invasive new mocking strategy.

* Add basic tests for nature deprived (#1848)

* Add wildfire tests (#1848)

* Add flood risk tests (#1848)

* Add DOT travel tests (#1848)

* Add historic redlining tests (#1848)

* Add tests for ME and WI (#1848)

* Update now that validation exists (#1848)

* Adjust for validation (#1848)

* Add health insurance back to cdc places (#1848)

Ooops

* Update tests with new field (#1848)

* Test for blank tract removal (#1848)

* Add tracts for clipping behavior

* Test clipping and zfill behavior (#1848)

* Fix bad test assumption (#1848)

* Simplify class, add test for tract padding (#1848)

* Fix percentage inversion, update tests (#1848)

Looking through the transformations, I noticed that we were subtracting
a percentage that is usually between 0-100 from 1 instead of 100, and so
were ending up with some surprising results. Confirmed with lucasmbrown-usds.

* Add note about first street data (#1848)
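
The percentage inversion fix described above can be illustrated with a small sketch. The function names here are hypothetical and are not taken from the pipeline; the point is only that a value on a 0-100 scale has to be inverted against 100, not against 1.

```python
def invert_as_fraction(value):
    """Invert a value assumed to be on a 0-1 scale (the buggy assumption)."""
    return 1 - value


def invert_as_percentage(value):
    """Invert a value assumed to be on a 0-100 scale (the corrected assumption)."""
    return 100 - value


# Hypothetical input: a percentage on a 0-100 scale.
percent_with_attribute = 85.0

buggy = invert_as_fraction(percent_with_attribute)    # negative, clearly wrong
fixed = invert_as_percentage(percent_with_attribute)  # share without the attribute

print(buggy, fixed)
```

A negative "percentage" like the first result is the kind of surprising output that points to a scale mismatch.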
Matt Bowen 2022-09-19 15:17:00 -04:00 committed by GitHub
commit 876655d2b2
GPG key ID: 4AEE18F83AFDEB23
88 changed files with 2032 additions and 178 deletions

View file

@ -0,0 +1,17 @@
GEOID,count_properties,burnprob_year00_flag,burnprob_year30_flag
6027000800,942,31,634
6069000802,1131,0,264
6061021322,1483,13,478
15001021010,1888,62,550
15001021101,3463,18,192
15007040603,1557,0,509
15007040700,1535,0,43
15009030100,1660,177,968
15009030201,6144,173,2856
15001021402,4118,20,329
15001021800,2814,111,770
15009030402,3375,7,437
15009030800,4847,3268,3529
15003010201,2335,1949,2005
15007040604,5365,3984,4439
4003001402,1,1,1

View file

@ -0,0 +1,17 @@
GEOID10_TRACT,Count of properties eligible for wildfire risk calculation within tract (floor of 250),Count of properties at risk of wildfire today,Count of properties at risk of wildfire in 30 years,Share of properties at risk of fire today,Share of properties at risk of fire in 30 years
06027000800,942,31,634,0.0329087049,0.6730360934
06069000802,1131,0,264,0.0000000000,0.2334217507
06061021322,1483,13,478,0.0087660148,0.3223196224
15001021010,1888,62,550,0.0328389831,0.2913135593
15001021101,3463,18,192,0.0051978054,0.0554432573
15007040603,1557,0,509,0.0000000000,0.3269107258
15007040700,1535,0,43,0.0000000000,0.0280130293
15009030100,1660,177,968,0.1066265060,0.5831325301
15009030201,6144,173,2856,0.0281575521,0.4648437500
15001021402,4118,20,329,0.0048567266,0.0798931520
15001021800,2814,111,770,0.0394456290,0.2736318408
15009030402,3375,7,437,0.0020740741,0.1294814815
15009030800,4847,3268,3529,0.6742314834,0.7280792243
15003010201,2335,1949,2005,0.8346895075,0.8586723769
15007040604,5365,3984,4439,0.7425908667,0.8273998136
04003001402,250,1,1,0.0040000000,0.0040000000

View file

@ -0,0 +1,17 @@
GEOID,count_properties,Count of properties at risk of wildfire today,Count of properties at risk of wildfire in 30 years,GEOID10_TRACT,Count of properties eligible for wildfire risk calculation within tract (floor of 250),Share of properties at risk of fire today,Share of properties at risk of fire in 30 years
06027000800,942,31,634,06027000800,942,0.0329087049,0.6730360934
06069000802,1131,0,264,06069000802,1131,0.0000000000,0.2334217507
06061021322,1483,13,478,06061021322,1483,0.0087660148,0.3223196224
15001021010,1888,62,550,15001021010,1888,0.0328389831,0.2913135593
15001021101,3463,18,192,15001021101,3463,0.0051978054,0.0554432573
15007040603,1557,0,509,15007040603,1557,0.0000000000,0.3269107258
15007040700,1535,0,43,15007040700,1535,0.0000000000,0.0280130293
15009030100,1660,177,968,15009030100,1660,0.1066265060,0.5831325301
15009030201,6144,173,2856,15009030201,6144,0.0281575521,0.4648437500
15001021402,4118,20,329,15001021402,4118,0.0048567266,0.0798931520
15001021800,2814,111,770,15001021800,2814,0.0394456290,0.2736318408
15009030402,3375,7,437,15009030402,3375,0.0020740741,0.1294814815
15009030800,4847,3268,3529,15009030800,4847,0.6742314834,0.7280792243
15003010201,2335,1949,2005,15003010201,2335,0.8346895075,0.8586723769
15007040604,5365,3984,4439,15007040604,5365,0.7425908667,0.8273998136
4003001402,1,1,1,04003001402,250,0.0040000000,0.0040000000

View file

@ -0,0 +1,22 @@
import pathlib
from data_pipeline.tests.sources.example.test_etl import TestETL
from data_pipeline.etl.sources.fsf_wildfire_risk.etl import WildfireRiskETL


class TestWildfireRiskETL(TestETL):
    _ETL_CLASS = WildfireRiskETL

    _SAMPLE_DATA_PATH = pathlib.Path(__file__).parents[0] / "data"
    _SAMPLE_DATA_FILE_NAME = "fsf_fire/fire-tract2010.csv"
    _SAMPLE_DATA_ZIP_FILE_NAME = "fsf_fire.zip"
    _EXTRACT_TMP_FOLDER_NAME = "WildfireRiskETL"
    _FIXTURES_SHARED_TRACT_IDS = TestETL._FIXTURES_SHARED_TRACT_IDS + [
        "04003001402"  # A tract with 1 property, also missing a digit
    ]

    def setup_method(self, _method, filename=__file__):
        """Invoke `setup_method` from Parent, but using the current file name.

        This code can be copied identically between all child classes.
        """
        super().setup_method(_method=_method, filename=filename)
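
The fixtures above imply two behaviors that the extra tract `04003001402` is meant to exercise: left-padding a GEOID that lost its leading zero, and clipping the property count to a floor of 250 before computing shares. The following is a minimal, standalone sketch of that logic inferred only from the sample input and output CSVs. It is not the `WildfireRiskETL` implementation, and `transform_row`, `TRACT_ID_LENGTH`, and `PROPERTY_FLOOR` are hypothetical names.

```python
import csv
import io

TRACT_ID_LENGTH = 11  # assumption: 2010 tract GEOIDs are 11 digits
PROPERTY_FLOOR = 250  # from the output column name "(floor of 250)"

# Two rows copied from the input fixture shown above.
SAMPLE_INPUT = """GEOID,count_properties,burnprob_year00_flag,burnprob_year30_flag
6027000800,942,31,634
4003001402,1,1,1
"""


def transform_row(row):
    """Pad the tract ID, clip the property count, and compute risk shares."""
    eligible = max(int(row["count_properties"]), PROPERTY_FLOOR)
    return {
        "GEOID10_TRACT": row["GEOID"].zfill(TRACT_ID_LENGTH),
        "eligible_properties": eligible,
        "share_at_risk_today": int(row["burnprob_year00_flag"]) / eligible,
        "share_at_risk_30_years": int(row["burnprob_year30_flag"]) / eligible,
    }


rows = [transform_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_INPUT))]
for row in rows:
    print(row)
```

For the one-property tract, the floor turns a share of 1/1 into 1/250, which matches the 0.004 in the expected output fixture.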