* Add spatial join method (#1871)
Since we'll need to figure out the tracts for a large number of points
in future tickets, add a utility to handle grabbing the tract geometries
and adding tract data to a point dataset.
* Add FUDS, also jupyter lab (#1871)
* Add YAML configs for FUDS (#1871)
* Allow input geoid to be optional (#1871)
* Add FUDS ETL, tests, test-datae noteobook (#1871)
This adds the ETL class for Formerly Used Defense Sites (FUDS). This is
different from most other ETLs since these FUDS are not provided by
tract, but instead by geographic point, so we need to assign FUDS to
tracts and then do calculations from there.
* Floats -> Ints, as I intended (#1871)
* Floats -> Ints, as I intended (#1871)
* Formatting fixes (#1871)
* Add test false positive GEOIDs (#1871)
* Add gdal binaries (#1871)
* Refactor pandas code to be more idiomatic (#1871)
Per Emma, the more pandas-y way of doing my counts is using np.where to
add the values i need, then groupby and size. It is definitely more
compact, and also I think more correct!
* Update configs per Emma suggestions (#1871)
* Type fixed! (#1871)
* Remove spurious import from vscode (#1871)
* Snapshot update after changing col name (#1871)
* Move up GDAL (#1871)
* Adjust geojson strategy (#1871)
* Try running census separately first (#1871)
* Fix import order (#1871)
* Cleanup cache strategy (#1871)
* Download census data from S3 instead of re-calculating (#1871)
* Clarify pandas code per Emma (#1871)
* added tribalId for Supplemental dataset (#1804)
* Setting zoom levels for tribal map (#1810)
* NRI dataset and initial score YAML configuration (#1534)
* update be staging gha
* NRI dataset and initial score YAML configuration
* checkpoint
* adding data checks for release branch
* passing tests
* adding INPUT_EXTRACTED_FILE_NAME to base class
* lint
* columns to keep and tests
* update be staging gha
* checkpoint
* update be staging gha
* NRI dataset and initial score YAML configuration
* checkpoint
* adding data checks for release branch
* passing tests
* adding INPUT_EXTRACTED_FILE_NAME to base class
* lint
* columns to keep and tests
* checkpoint
* PR Review
* renoving source url
* tests
* stop execution of ETL if there's a YAML schema issue
* update be staging gha
* adding source url as class var again
* clean up
* force cache bust
* gha cache bust
* dynamically set score vars from YAML
* docsctrings
* removing last updated year - optional reverse percentile
* passing tests
* sort order
* column ordening
* PR review
* class level vars
* Updating DatasetsConfig
* fix pylint errors
* moving metadata hint back to code
Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
* Correct copy typo (#1809)
* Add basic test suite for COI (#1518)
* Update COI to use new yaml (#1518)
* Add tests for DOE energy budren (1518
* Add dataset config for energy budren (1518)
* Refactor ETL to use datasets.yml (#1518)
* Add fake GEOIDs to COI tests (#1518)
* Refactor _setup_etl_instance_and_run_extract to base (#1518)
For the three classes we've done so far, a generic
_setup_etl_instance_and_run_extract will work fine, for the moment we
can reuse the same setup method until we decide future classes need more
flexibility --- but they can also always subclass so...
* Add output-path tests (#1518)
* Update YAML to match constant (#1518)
* Don't blindly set float format (#1518)
* Add defaults for extract (#1518)
* Run YAML load on all subclasses (#1518)
* Update description fields (#1518)
* Update YAML per final format (#1518)
* Update fixture tract IDs (#1518)
* Update base class refactor (#1518)
Now that NRI is final I needed to make a small number of updates to my
refactored code.
* Remove old comment (#1518)
* Fix type signature and return (#1518)
* Update per code review (#1518)
Co-authored-by: Jorge Escobar <83969469+esfoobar-usds@users.noreply.github.com>
Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
Co-authored-by: Vim <86254807+vim-usds@users.noreply.github.com>
* Remove code that drops Guam and USVI from ETL
* Add back code for dropping rows by FIPS code
We may want this functionality, so let's keep it and just make the constant currently be an empty array.
Co-authored-by: Shelby Switzer <shelbyswitzer@gmail.com>
* Update PR threshold count to 10
We now show 10 indicators for PR. See the discussion on the github issue for more info: https://github.com/usds/justice40-tool/issues/1621
* Do not use linguistic iso for Puerto Rico
Closes 1350.
Co-authored-by: Shelby Switzer <shelbyswitzer@gmail.com>
Imputes income field with a light refactor. Needs more refactor and more tests (I spotchecked). Next ticket will check and address but a lot of "narwhal" architecture is here.
* added tribalId for Supplemental dataset (#1804)
* Setting zoom levels for tribal map (#1810)
* NRI dataset and initial score YAML configuration (#1534)
* update be staging gha
* NRI dataset and initial score YAML configuration
* checkpoint
* adding data checks for release branch
* passing tests
* adding INPUT_EXTRACTED_FILE_NAME to base class
* lint
* columns to keep and tests
* update be staging gha
* checkpoint
* update be staging gha
* NRI dataset and initial score YAML configuration
* checkpoint
* adding data checks for release branch
* passing tests
* adding INPUT_EXTRACTED_FILE_NAME to base class
* lint
* columns to keep and tests
* checkpoint
* PR Review
* renoving source url
* tests
* stop execution of ETL if there's a YAML schema issue
* update be staging gha
* adding source url as class var again
* clean up
* force cache bust
* gha cache bust
* dynamically set score vars from YAML
* docsctrings
* removing last updated year - optional reverse percentile
* passing tests
* sort order
* column ordening
* PR review
* class level vars
* Updating DatasetsConfig
* fix pylint errors
* moving metadata hint back to code
Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
* Correct copy typo (#1809)
* Add basic test suite for COI (#1518)
* Update COI to use new yaml (#1518)
* Add tests for DOE energy budren (1518
* Add dataset config for energy budren (1518)
* Refactor ETL to use datasets.yml (#1518)
* Add fake GEOIDs to COI tests (#1518)
* Refactor _setup_etl_instance_and_run_extract to base (#1518)
For the three classes we've done so far, a generic
_setup_etl_instance_and_run_extract will work fine, for the moment we
can reuse the same setup method until we decide future classes need more
flexibility --- but they can also always subclass so...
* Add output-path tests (#1518)
* Update YAML to match constant (#1518)
* Don't blindly set float format (#1518)
* Add defaults for extract (#1518)
* Run YAML load on all subclasses (#1518)
* Update description fields (#1518)
* Update YAML per final format (#1518)
* Update fixture tract IDs (#1518)
* Update base class refactor (#1518)
Now that NRI is final I needed to make a small number of updates to my
refactored code.
* Remove old comment (#1518)
* Fix type signature and return (#1518)
* Update per code review (#1518)
Co-authored-by: Jorge Escobar <83969469+esfoobar-usds@users.noreply.github.com>
Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
Co-authored-by: Vim <86254807+vim-usds@users.noreply.github.com>
* Remove code that drops Guam and USVI from ETL
* Add back code for dropping rows by FIPS code
We may want this functionality, so let's keep it and just make the constant currently be an empty array.
Co-authored-by: Shelby Switzer <shelbyswitzer@gmail.com>
* Update PR threshold count to 10
We now show 10 indicators for PR. See the discussion on the github issue for more info: https://github.com/usds/justice40-tool/issues/1621
* Do not use linguistic iso for Puerto Rico
Closes 1350.
Co-authored-by: Shelby Switzer <shelbyswitzer@gmail.com>
Imputes income field with a light refactor. Needs more refactor and more tests (I spotchecked). Next ticket will check and address but a lot of "narwhal" architecture is here.
* update be staging gha
* NRI dataset and initial score YAML configuration
* checkpoint
* adding data checks for release branch
* passing tests
* adding INPUT_EXTRACTED_FILE_NAME to base class
* lint
* columns to keep and tests
* update be staging gha
* checkpoint
* update be staging gha
* NRI dataset and initial score YAML configuration
* checkpoint
* adding data checks for release branch
* passing tests
* adding INPUT_EXTRACTED_FILE_NAME to base class
* lint
* columns to keep and tests
* checkpoint
* PR Review
* renoving source url
* tests
* stop execution of ETL if there's a YAML schema issue
* update be staging gha
* adding source url as class var again
* clean up
* force cache bust
* gha cache bust
* dynamically set score vars from YAML
* docsctrings
* removing last updated year - optional reverse percentile
* passing tests
* sort order
* column ordening
* PR review
* class level vars
* Updating DatasetsConfig
* fix pylint errors
* moving metadata hint back to code
Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
* starting tribal pr
* further pipeline work
* bia merge working
* alaska villages and tribal geo generate
* tribal folders
* adding data full run
* tile generation
* tribal tile deploy