* Rezip CSV and Excel files with Codebook
* codebook version
* packages fix
* pydantic
* lint
* Remove markdown link from markdown checker (#1936)
Co-authored-by: Vim <86254807+vim-usds@users.noreply.github.com>
* Refactor CDC life-expectancy (#1554)
* Update to new tract list (#1554)
* Adjust for tests (#1848)
* Add tests for cdc_places (#1848)
* Add EJScreen tests (#1848)
* Add tests for HUD housing (#1848)
* Add tests for GeoCorr (#1848)
* Add persistent poverty tests (#1848)
* Update for sources without zips, for new validation (#1848)
* Update tests for new multi-CSV bug (#1848)
Lucas updated the CDC life expectancy data to handle a bug where two
states are missing from the US Overall download. Since virtually none of
our other ETL classes download multiple CSVs directly like this, it
required a pretty invasive new mocking strategy.
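A rough sketch of the kind of mocking this implies, assuming a hypothetical
class whose `download` method is called once per CSV. The class, URLs, and
fixture rows below are made up for illustration and are not the actual ETL
code:

```python
import csv
import io
from unittest import mock


class MultiCsvSource:
    """Hypothetical stand-in for an ETL that downloads several CSVs."""

    # Placeholder URLs; the real sources are not reproduced here.
    URLS = [
        "https://example.invalid/us_overall.csv",
        "https://example.invalid/missing_state.csv",
    ]

    def download(self, url: str) -> str:
        # Tests must never reach the network.
        raise RuntimeError("network access is not allowed in tests")

    def extract(self) -> list:
        rows = []
        for url in self.URLS:
            # Parse each downloaded CSV and combine the rows.
            rows.extend(csv.DictReader(io.StringIO(self.download(url))))
        return rows


# One fixture per URL, so every download gets its own canned response.
FIXTURES = {
    MultiCsvSource.URLS[0]: "tract,value\n00000000001,75.1\n",
    MultiCsvSource.URLS[1]: "tract,value\n00000000002,78.4\n",
}

with mock.patch.object(
    MultiCsvSource, "download", side_effect=lambda url: FIXTURES[url]
):
    rows = MultiCsvSource().extract()
```

Keying the fixtures by URL keeps the mock independent of download order.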
* Add basic tests for nature deprived (#1848)
* Add wildfire tests (#1848)
* Add flood risk tests (#1848)
* Add DOT travel tests (#1848)
* Add historic redlining tests (#1848)
* Add tests for ME and WI (#1848)
* Update now that validation exists (#1848)
* Adjust for validation (#1848)
* Add health insurance back to cdc places (#1848)
Oops
* Update tests with new field (#1848)
* Test for blank tract removal (#1848)
* Add tracts for clipping behavior
* Test clipping and zfill behavior (#1848)
* Fix bad test assumption (#1848)
* Simplify class, add test for tract padding (#1848)
* Fix percentage inversion, update tests (#1848)
Looking through the transformations, I noticed that we were subtracting
a percentage that is usually between 0 and 100 from 1 instead of 100, and
so were ending up with some surprising results. Confirmed with
lucasmbrown-usds.
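To make the inversion bug concrete, here is a minimal sketch assuming the
input is a percentage on a 0-100 scale. The function name and sample value
are hypothetical, not taken from the ETL:

```python
def invert_percentage(value: float) -> float:
    """Return the complement of a percentage on a 0-100 scale."""
    if not 0 <= value <= 100:
        raise ValueError(f"expected a value between 0 and 100, got {value}")
    # Subtract from 100, not 1: the input is in percentage points.
    return 100 - value


sample = 85.0  # hypothetical input value

# The old behavior subtracted from 1, which goes negative for most inputs.
buggy_result = 1 - sample

# The corrected behavior subtracts from 100.
fixed_result = invert_percentage(sample)
```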
* Add note about first street data (#1848)
* added tribalId for Supplemental dataset (#1804)
* Setting zoom levels for tribal map (#1810)
* NRI dataset and initial score YAML configuration (#1534)
* update be staging gha
* NRI dataset and initial score YAML configuration
* checkpoint
* adding data checks for release branch
* passing tests
* adding INPUT_EXTRACTED_FILE_NAME to base class
* lint
* columns to keep and tests
* update be staging gha
* checkpoint
* update be staging gha
* NRI dataset and initial score YAML configuration
* checkpoint
* adding data checks for release branch
* passing tests
* adding INPUT_EXTRACTED_FILE_NAME to base class
* lint
* columns to keep and tests
* checkpoint
* PR Review
* removing source url
* tests
* stop execution of ETL if there's a YAML schema issue
* update be staging gha
* adding source url as class var again
* clean up
* force cache bust
* gha cache bust
* dynamically set score vars from YAML
* docstrings
* removing last updated year - optional reverse percentile
* passing tests
* sort order
* column ordering
* PR review
* class level vars
* Updating DatasetsConfig
* fix pylint errors
* moving metadata hint back to code
Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
* Correct copy typo (#1809)
* Add basic test suite for COI (#1518)
* Update COI to use new yaml (#1518)
* Add tests for DOE energy burden (#1518)
* Add dataset config for energy burden (#1518)
* Refactor ETL to use datasets.yml (#1518)
* Add fake GEOIDs to COI tests (#1518)
* Refactor _setup_etl_instance_and_run_extract to base (#1518)
For the three classes we've done so far, a generic
_setup_etl_instance_and_run_extract works fine. For the moment we can
reuse the same setup method until we decide future classes need more
flexibility, and those classes can always override it in a subclass.
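Roughly, the shape of this is sketched below. Only the method name comes
from this change; the class names, the `_ETL_CLASS` attribute, and the
method signature are assumptions made for illustration:

```python
class ExampleETL:
    """Hypothetical ETL class; the real classes do far more."""

    def __init__(self):
        self.extracted = False

    def extract(self):
        self.extracted = True


class EtlTestBase:
    """Hypothetical base test class holding the shared setup."""

    _ETL_CLASS = None  # each concrete test class sets this

    def _setup_etl_instance_and_run_extract(self):
        # Generic setup: build the ETL and run its extract step.
        etl = self._ETL_CLASS()
        etl.extract()
        return etl


class ExampleEtlTests(EtlTestBase):
    # Reuses the generic setup unchanged.
    _ETL_CLASS = ExampleETL


class CustomEtlTests(EtlTestBase):
    _ETL_CLASS = ExampleETL

    def _setup_etl_instance_and_run_extract(self):
        # A class needing more flexibility overrides the setup.
        etl = super()._setup_etl_instance_and_run_extract()
        etl.extra_fixture_loaded = True
        return etl


default_etl = ExampleEtlTests()._setup_etl_instance_and_run_extract()
custom_etl = CustomEtlTests()._setup_etl_instance_and_run_extract()
```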
* Add output-path tests (#1518)
* Update YAML to match constant (#1518)
* Don't blindly set float format (#1518)
* Add defaults for extract (#1518)
* Run YAML load on all subclasses (#1518)
* Update description fields (#1518)
* Update YAML per final format (#1518)
* Update fixture tract IDs (#1518)
* Update base class refactor (#1518)
Now that NRI is final I needed to make a small number of updates to my
refactored code.
* Remove old comment (#1518)
* Fix type signature and return (#1518)
* Update per code review (#1518)
Co-authored-by: Jorge Escobar <83969469+esfoobar-usds@users.noreply.github.com>
Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
Co-authored-by: Vim <86254807+vim-usds@users.noreply.github.com>
* WIP on parallelizing
* switching to get_tmp_path for nri
* switching to get_tmp_path everywhere necessary
* fixing linter errors
* moving heavy ETLs to front of line
* add hold
* moving cdc places up
* removing unnecessary print
* moving h&t up
* adding parallel to geo post
* better census labels
* switching to concurrent futures
* fixing output
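As a rough illustration of the parallel approach, here is a sketch using
concurrent.futures with the heavier jobs submitted first. The dataset names,
the `run_etl` helper, and the choice of a thread pool are assumptions, not
the actual runner code:

```python
import concurrent.futures


def run_etl(name: str) -> str:
    """Hypothetical stand-in for running one dataset's ETL."""
    return f"{name}: done"


# Hypothetical dataset names, with the heaviest ETLs at the front of the
# list so they start as early as possible.
DATASETS = ["heavy_a", "heavy_b", "light_a", "light_b"]


def run_all(datasets, max_workers=2):
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Submit in list order; map each future back to its dataset name.
        futures = {executor.submit(run_etl, name): name for name in datasets}
        for future in concurrent.futures.as_completed(futures):
            # result() re-raises any exception from the worker.
            results[futures[future]] = future.result()
    return results


results = run_all(DATASETS)
```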