* just testing that the boolean is preserved on gha
* checking drop tracts works
* adding a check to the agvalue calculation for nri
* updated with error messages
* update Python version on README; tuple typing fix
* Alaska tribal points fix (#1821)
* Bump mistune from 0.8.4 to 2.0.3 in /data/data-pipeline (#1777)
Bumps [mistune](https://github.com/lepture/mistune) from 0.8.4 to 2.0.3.
- [Release notes](https://github.com/lepture/mistune/releases)
- [Changelog](https://github.com/lepture/mistune/blob/master/docs/changes.rst)
- [Commits](https://github.com/lepture/mistune/compare/v0.8.4...v2.0.3)
---
updated-dependencies:
- dependency-name: mistune
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* poetry update
* initial pass of score tests
* add threshold tests
* added ses threshold (not donut, not island)
* testing suite -- stopping for the day
* added test for lead proxy indicator
* Refactor score tests to make them less verbose and more direct (#1865)
* Cleanup tests slightly before refactor (#1846)
* Refactor score calculations tests
* Feedback from review
* Refactor output tests like calculation tests (#1846) (#1870)
* Reorganize files (#1846)
* Switch from lru_cache to fixture scopes (#1846)
* Add tests for all factors (#1846)
* Mark smoketests and run as part of be deploy (#1846)
* Update renamed var (#1846)
* Switch from named tuple to dataclass (#1846)
This is annoying, but pylint in python3.8 was crashing parsing the named
tuple. We weren't using any namedtuple-specific features, so I made the
type a dataclass just to get pylint to behave.
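
As a rough illustration of the swap described above, a record type that uses no tuple-specific behavior (indexing, unpacking) can be written as a dataclass instead. The class and field names here are hypothetical and not taken from the pipeline:

```python
from dataclasses import dataclass


# Hypothetical record type; the real class and its fields are not shown in this log.
@dataclass(frozen=True)
class ThresholdCase:
    tract_id: str
    expected: bool


# Attribute access works the same way it would on a named tuple.
case = ThresholdCase(tract_id="01001020100", expected=True)
print(case.tract_id, case.expected)
```

`frozen=True` keeps instances immutable, which is the closest match to named tuple behavior.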
* Add default timeout to requests (#1846)
* Fix type (#1846)
* Fix merge mistake on poetry.lock (#1846)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Jorge Escobar <jorge.e.escobar@omb.eop.gov>
Co-authored-by: Jorge Escobar <83969469+esfoobar-usds@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Matt Bowen <83967628+mattbowen-usds@users.noreply.github.com>
Co-authored-by: matt bowen <matthew.r.bowen@omb.eop.gov>
* Add notebook to generate test data (#1780)
* Add Abandoned Mine Land data (#1780)
Using a similar structure but simpler approach compared to FUDS, add an
indicator for whether a tract has an abandoned mine.
* Adding some detail to dataset readmes
Just a thought!
* Apply feedback from review (#1780)
* Fixup bad string that broke test (#1780)
* Update a string that I should have renamed (#1780)
* Reduce number of threads to reduce memory pressure (#1780)
* Try not running geo data (#1780)
* Run the high-memory sets separately (#1780)
* Actually deduplicate (#1780)
* Add flag for memory intensive ETLs (#1780)
* Document new flag for datasets (#1780)
* Add flag for new datasets for rebase (#1780)
Co-authored-by: Emma Nechamkin <97977170+emma-nechamkin@users.noreply.github.com>
* Add spatial join method (#1871)
Since we'll need to figure out the tracts for a large number of points
in future tickets, add a utility to handle grabbing the tract geometries
and adding tract data to a point dataset.
* Add FUDS, also jupyter lab (#1871)
* Add YAML configs for FUDS (#1871)
* Allow input geoid to be optional (#1871)
* Add FUDS ETL, tests, test-data notebook (#1871)
This adds the ETL class for Formerly Used Defense Sites (FUDS). This is
different from most other ETLs since these FUDS are not provided by
tract, but instead by geographic point, so we need to assign FUDS to
tracts and then do calculations from there.
* Floats -> Ints, as I intended (#1871)
* Formatting fixes (#1871)
* Add test false positive GEOIDs (#1871)
* Add gdal binaries (#1871)
* Refactor pandas code to be more idiomatic (#1871)
Per Emma, the more pandas-y way of doing my counts is to use np.where to
add the values I need, then groupby and size. It is definitely more
compact, and I think also more correct!
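
A minimal sketch of the np.where, groupby, and size pattern mentioned above. The column names and sample rows are assumptions for illustration, not the actual FUDS schema:

```python
import numpy as np
import pandas as pd

# Hypothetical point data: one row per site, already assigned to a tract.
sites = pd.DataFrame(
    {
        "GEOID10_TRACT": ["01001", "01001", "01002", "01003"],
        "ELIGIBILITY": ["Eligible", "Ineligible", "Eligible", "Eligible"],
    }
)

# np.where adds a 0/1 flag column in one vectorized step.
sites["ELIGIBLE_FLAG"] = np.where(sites["ELIGIBILITY"] == "Eligible", 1, 0)

# groupby + size counts all sites per tract.
site_counts = sites.groupby("GEOID10_TRACT").size()

# Summing the flag counts only the eligible sites per tract.
eligible_counts = sites.groupby("GEOID10_TRACT")["ELIGIBLE_FLAG"].sum()
```

Both results are Series indexed by tract ID, so they can be joined back onto a tract-level frame.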
* Update configs per Emma suggestions (#1871)
* Type fixed! (#1871)
* Remove spurious import from vscode (#1871)
* Snapshot update after changing col name (#1871)
* Move up GDAL (#1871)
* Adjust geojson strategy (#1871)
* Try running census separately first (#1871)
* Fix import order (#1871)
* Cleanup cache strategy (#1871)
* Download census data from S3 instead of re-calculating (#1871)
* Clarify pandas code per Emma (#1871)
* added tribalId for Supplemental dataset (#1804)
* Setting zoom levels for tribal map (#1810)
* NRI dataset and initial score YAML configuration (#1534)
* update be staging gha
* NRI dataset and initial score YAML configuration
* checkpoint
* adding data checks for release branch
* passing tests
* adding INPUT_EXTRACTED_FILE_NAME to base class
* lint
* columns to keep and tests
* update be staging gha
* checkpoint
* update be staging gha
* NRI dataset and initial score YAML configuration
* checkpoint
* adding data checks for release branch
* passing tests
* adding INPUT_EXTRACTED_FILE_NAME to base class
* lint
* columns to keep and tests
* checkpoint
* PR Review
* removing source url
* tests
* stop execution of ETL if there's a YAML schema issue
* update be staging gha
* adding source url as class var again
* clean up
* force cache bust
* gha cache bust
* dynamically set score vars from YAML
* docstrings
* removing last updated year - optional reverse percentile
* passing tests
* sort order
* column ordering
* PR review
* class level vars
* Updating DatasetsConfig
* fix pylint errors
* moving metadata hint back to code
Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
* Correct copy typo (#1809)
* Add basic test suite for COI (#1518)
* Update COI to use new yaml (#1518)
* Add tests for DOE energy burden (#1518)
* Add dataset config for energy burden (#1518)
* Refactor ETL to use datasets.yml (#1518)
* Add fake GEOIDs to COI tests (#1518)
* Refactor _setup_etl_instance_and_run_extract to base (#1518)
For the three classes we've done so far, a generic
_setup_etl_instance_and_run_extract works fine. For the moment we can
reuse the same setup method until we decide future classes need more
flexibility, and they can always override it in a subclass.
* Add output-path tests (#1518)
* Update YAML to match constant (#1518)
* Don't blindly set float format (#1518)
* Add defaults for extract (#1518)
* Run YAML load on all subclasses (#1518)
* Update description fields (#1518)
* Update YAML per final format (#1518)
* Update fixture tract IDs (#1518)
* Update base class refactor (#1518)
Now that NRI is final I needed to make a small number of updates to my
refactored code.
* Remove old comment (#1518)
* Fix type signature and return (#1518)
* Update per code review (#1518)
Co-authored-by: Jorge Escobar <83969469+esfoobar-usds@users.noreply.github.com>
Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
Co-authored-by: Vim <86254807+vim-usds@users.noreply.github.com>
* Remove code that drops Guam and USVI from ETL
* Add back code for dropping rows by FIPS code
We may want this functionality, so let's keep it and just set the constant to an empty array for now.
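
One way this could look, as a sketch only: the constant name, helper name, and column name below are hypothetical, and the tract IDs are made-up examples. With an empty list the filter is a no-op, so Guam (state FIPS 66) and USVI (78) rows stay in:

```python
from typing import List

import pandas as pd

# Hypothetical constant name; left empty so that nothing is dropped for now.
DROP_FIPS_CODES: List[str] = []


def drop_rows_by_fips(
    df: pd.DataFrame, fips_codes: List[str], tract_col: str = "GEOID10_TRACT"
) -> pd.DataFrame:
    """Drop rows whose tract ID starts with one of the given state FIPS codes."""
    if not fips_codes:
        return df
    mask = df[tract_col].str[:2].isin(fips_codes)
    return df[~mask]


tracts = pd.DataFrame(
    {"GEOID10_TRACT": ["66010950100", "78010970100", "01001020100"]}
)
kept_all = drop_rows_by_fips(tracts, DROP_FIPS_CODES)
kept_some = drop_rows_by_fips(tracts, ["66", "78"])
```

Keeping the helper in place means re-enabling the drop later is a one-line change to the constant.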
Co-authored-by: Shelby Switzer <shelbyswitzer@gmail.com>
* Update PR threshold count to 10
We now show 10 indicators for PR. See the discussion on the GitHub issue for more info: https://github.com/usds/justice40-tool/issues/1621
* Do not use linguistic iso for Puerto Rico
Closes #1350.
Co-authored-by: Shelby Switzer <shelbyswitzer@gmail.com>
Imputes the income field with a light refactor. Needs more refactoring and more tests (I spot-checked). The next ticket will check and address this, but a lot of the "narwhal" architecture is here.
* update be staging gha
* NRI dataset and initial score YAML configuration
* checkpoint
* adding data checks for release branch
* passing tests
* adding INPUT_EXTRACTED_FILE_NAME to base class
* lint
* columns to keep and tests
* update be staging gha
* checkpoint
* update be staging gha
* NRI dataset and initial score YAML configuration
* checkpoint
* adding data checks for release branch
* passing tests
* adding INPUT_EXTRACTED_FILE_NAME to base class
* lint
* columns to keep and tests
* checkpoint
* PR Review
* removing source url
* tests
* stop execution of ETL if there's a YAML schema issue
* update be staging gha
* adding source url as class var again
* clean up
* force cache bust
* gha cache bust
* dynamically set score vars from YAML
* docstrings
* removing last updated year - optional reverse percentile
* passing tests
* sort order
* column ordering
* PR review
* class level vars
* Updating DatasetsConfig
* fix pylint errors
* moving metadata hint back to code
Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
* starting tribal pr
* further pipeline work
* bia merge working
* alaska villages and tribal geo generate
* tribal folders
* adding data full run
* tile generation
* tribal tile deploy
* Change low to high transition and global zoom
- change the low to high transition from 7 to 5. This cannot go any lower, as high tiles on AWS only go to zoom level 5
- reduce the zoom level globally on all census tracts
* Remove geolocation from feature flag
- geolocation is now available to all
* Add python notebook that sorts all tracts by area
- add a column of the required zoom level for the tract to be fully contained in the viewport
* Place geolocation back to behind a feature flag
* Differentiate zoom levels between shortcuts and tracts