j40-cejst-2/data/data-pipeline/data_pipeline
Emma Nechamkin 9c0e1993f6
Pipeline tile tests (#1864)
* temp update

* updating with fips check

* adding check on pfs

* updating with pfs test

* Update test_tiles_smoketests.py

* Fix lint errors (#1848)

* Add column names test (#1848)

* Mark tests as smoketests (#1848)

* Move to other score-related tests (#1848)

* Recast Total threshold criteria exceeded to int (#1848)

In writing tests to verify the output of the tiles csv matches the final
score CSV, I noticed TC/Total threshold criteria exceeded was getting
cast from an int64 to a float64 in the process of PostScoreETL. I
tracked it down to the line where we merge the score dataframe with
constants.DATA_CENSUS_CSV_FILE_PATH --- there were > 100 tracts in the
national census CSV that don't exist in the score, so those ended up
with a Total threshold count of np.nan, which is a float, and thereby
cast those columns to float. For the moment I just cast it back.
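
A minimal sketch of the upcast described above, not the actual
PostScoreETL code. The column names, tract IDs, and the direction of the
merge are assumptions made for illustration; only the int64 -> float64
behavior is taken from the note.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the score frame and the national census CSV.
score = pd.DataFrame(
    {"GEOID10_TRACT": ["01001020100", "01001020200"], "TC": [3, 0]}
)
census = pd.DataFrame(
    {"GEOID10_TRACT": ["01001020100", "01001020200", "01001020300"]}
)

# The third tract has no score row, so its TC becomes NaN after the merge
# and pandas upcasts the whole column from int64 to float64.
merged = census.merge(score, on="GEOID10_TRACT", how="left")

# One possible way to get an integer column back (an assumption, not
# necessarily the cast used in the commit): the nullable Int64 dtype
# keeps the missing value as <NA> instead of forcing a float.
restored = merged["TC"].astype("Int64")
```

Here score["TC"] starts as int64, merged["TC"] is float64, and
restored is Int64 with one missing value.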

* No need for low memory (#1848)

* Add additional tests of tiles.csv (#1848)

* Drop pre-2010 rows before computing score (#1848)

Note this is probably NOT the optimal place for this change; it might
make more sense for each source to filter its own tracts down to the
acceptable tract list. However, that would be a pretty invasive change,
where this is central and plenty of other things are happening in score
transform that could be moved to sources, so for today, here's where the
change will live.
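
A rough sketch of restricting a frame to an acceptable tract list, shown
both as a filter and as the inner join that a later commit in this list
switches to. The frame names, column name, and tract IDs are
hypothetical; the real score transform is not reproduced here.

```python
import pandas as pd

# Hypothetical combined source data and acceptable tract list.
combined = pd.DataFrame(
    {"GEOID10_TRACT": ["A", "B", "C"], "value": [1.0, 2.0, 3.0]}
)
acceptable_tracts = pd.DataFrame({"GEOID10_TRACT": ["A", "C"]})

# Filter form: keep rows whose tract appears in the acceptable list.
filtered = combined[
    combined["GEOID10_TRACT"].isin(acceptable_tracts["GEOID10_TRACT"])
]

# Inner-join form: rows without a match in the acceptable list are dropped.
joined = combined.merge(acceptable_tracts, on="GEOID10_TRACT", how="inner")
```

With unique tract IDs in the acceptable list, both forms keep the same
rows; the join additionally resets the index.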

* Fix typo (#1848)

* Switch from filter to inner join (#1848)

* Remove no-op lines from tiles (#1848)

* Apply feedback from review, linter (#1848)

* Check the values of everything in the frame (#1848)

* Refactor checker class (#1848)

* Add test for state names (#1848)

* cleanup from reviewing my own code (#1848)

* Fix lint error (#1858)

* Apply Emma's feedback from review (#1848)

* Remove refs to national_df (#1848)

* Account for new, fake nullable bools in tiles (#1848)

To handle a geojson limitation, Emma converted some nullable boolean
columns to float64 in the tiles export with the values {0.0, 1.0, nan},
giving us the same expressiveness. Sadly, this broke my assumption that
all columns between the score and tiles csvs would have the same dtypes,
so I need to account for these new, fake bools in my test.
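
A small sketch of how a test might account for such a column, assuming
the score side holds a pandas nullable boolean and the tiles side holds
the float64 encoding. The variable names are hypothetical and this is not
the code in test_tiles_smoketests.py.

```python
import numpy as np
import pandas as pd

# Hypothetical column as it might appear in the score CSV and in tiles.
score_col = pd.Series([True, False, pd.NA], dtype="boolean")
tiles_col = pd.Series([1.0, 0.0, np.nan], dtype="float64")

# Cast the nullable boolean to float64 so both sides share a dtype:
# True -> 1.0, False -> 0.0, <NA> -> nan.
score_as_float = score_col.astype("float64")

# Series.equals treats NaNs in the same position as equal, so no
# special-casing of missing values is needed.
columns_match = score_as_float.equals(tiles_col)
```

Comparing the raw columns directly would fail on dtype alone, which is
why the cast comes first.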

* Use equals instead of my worse version (#1848)

* Missed a spot where we called _create_score_data (#1848)

* Update per safety (#1848)

Co-authored-by: matt bowen <matthew.r.bowen@omb.eop.gov>
2022-09-01 13:07:14 -04:00
..
comparison_tool Imputing income using geographic neighbors (#1559) 2022-08-11 12:33:45 -04:00
content updated to show T/F/null vs T/F for AML and FUDS (#1866) 2022-08-24 20:22:59 -04:00
data Starting Tribal Boundaries Work (#1736) 2022-07-30 01:13:10 -04:00
etl Pipeline tile tests (#1864) 2022-09-01 13:07:14 -04:00
files Add files via upload (#1656) 2022-05-31 13:19:01 -04:00
ipython just testing that the boolean is preserved on gha (#1867) 2022-08-31 12:55:03 -04:00
score tribal tiles fix (#1874) 2022-09-01 10:19:13 -04:00
tests Pipeline tile tests (#1864) 2022-09-01 13:07:14 -04:00
tile Score tests (#1847) 2022-08-26 15:23:20 -04:00
__init__.py Data directory should adopt standard Poetry-suggested python package structure (#457) 2021-08-05 15:35:54 -04:00
application.py Add FUDS ETL (#1817) 2022-08-16 13:28:39 -04:00
config.py Score tests (#1847) 2022-08-26 15:23:20 -04:00
utils.py Score tests (#1847) 2022-08-26 15:23:20 -04:00