* Add spatial join method (#1871)
Since we'll need to figure out the tracts for a large number of points
in future tickets, add a utility that fetches the tract geometries and
joins tract data onto a point dataset.
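A rough sketch of such a utility, assuming geopandas; the function name, arguments, and join settings are illustrative, not the actual implementation:

```python
import geopandas as gpd

def add_tracts_for_points(
    points: gpd.GeoDataFrame, tracts: gpd.GeoDataFrame
) -> gpd.GeoDataFrame:
    """Spatially join tract attributes onto a point dataset."""
    # Reproject the points to the tracts' CRS so the geometries are comparable.
    points = points.to_crs(tracts.crs)
    # "within" assigns each point the tract polygon that contains it
    # (older geopandas versions spell the keyword `op` instead of `predicate`).
    return gpd.sjoin(points, tracts, how="left", predicate="within")
```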
* Add FUDS, also jupyter lab (#1871)
* Add YAML configs for FUDS (#1871)
* Allow input geoid to be optional (#1871)
* Add FUDS ETL, tests, test-data notebook (#1871)
This adds the ETL class for Formerly Used Defense Sites (FUDS). It
differs from most other ETLs in that FUDS are provided as geographic
points rather than by tract, so we need to assign each site to a tract
and then do calculations from there.
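A condensed, hypothetical outline of that flow; the class shape and the GEOID10_TRACT column name are assumptions, not the real schema:

```python
import geopandas as gpd
import pandas as pd

class FUDSETL:
    """Illustrative outline only, not the actual ETL class."""

    GEOID_FIELD = "GEOID10_TRACT"  # assumed tract-ID column name

    def transform(
        self, fuds_points: gpd.GeoDataFrame, tracts: gpd.GeoDataFrame
    ) -> pd.DataFrame:
        # FUDS come as points rather than tract rows, so assign each site
        # to its containing tract before computing anything per tract.
        joined = gpd.sjoin(
            fuds_points.to_crs(tracts.crs),
            tracts,
            how="inner",
            predicate="within",
        )
        # Per-tract calculations start from the joined frame.
        return (
            joined.groupby(self.GEOID_FIELD)
            .size()
            .rename("fuds_site_count")
            .to_frame()
        )
```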
* Floats -> Ints, as I intended (#1871)
* Floats -> Ints, as I intended (#1871)
* Formatting fixes (#1871)
* Add test false positive GEOIDs (#1871)
* Add gdal binaries (#1871)
* Refactor pandas code to be more idiomatic (#1871)
Per Emma, the more pandas-y way of doing my counts is using np.where to
add the values I need, then groupby and size. It is definitely more
compact, and also I think more correct!
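Roughly the idiom being described, with made-up tract IDs and column names:

```python
import numpy as np
import pandas as pd

# Stand-in for the joined FUDS frame: one row per site with its tract ID.
joined = pd.DataFrame(
    {
        "GEOID10_TRACT": ["01001020100", "01001020100", "01001020200"],
        "ELIGIBILITY": ["Eligible", "Ineligible", "Eligible"],
    }
)
# np.where adds the label; groupby(...).size() then counts rows per
# tract/label pair in one pass.
joined["eligibility"] = np.where(
    joined["ELIGIBILITY"] == "Eligible", "eligible", "ineligible"
)
counts = (
    joined.groupby(["GEOID10_TRACT", "eligibility"]).size().unstack(fill_value=0)
)
```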
* Update configs per Emma suggestions (#1871)
* Type fixed! (#1871)
* Remove spurious import from vscode (#1871)
* Snapshot update after changing col name (#1871)
* Move up GDAL (#1871)
* Adjust geojson strategy (#1871)
* Try running census separately first (#1871)
* Fix import order (#1871)
* Cleanup cache strategy (#1871)
* Download census data from S3 instead of re-calculating (#1871)
* Clarify pandas code per Emma (#1871)
* starting tribal pr
* further pipeline work
* bia merge working
* alaska villages and tribal geo generate
* tribal folders
* adding data full run
* tile generation
* tribal tile deploy
* installation step
* trigger action
* installing to home dir
* dry-run
* pyenv
* py 2.8
* trying s4cmd
* removing pyenv
* poetry s4cmd
* num-threads
* public read
* poetry cache
* s4cmd all around
* poetry cache
* poetry cache
* install poetry packages
* poetry echo
* let's do this
* s4cmd install on run
* s4cmd
* add aws back
* add aws back
* testing census api key and poetry caching
* census api key
* census api
* census api key #3
* 250
* poetry update
* poetry change
* check census api key
* force flag
* update score gen and tilefy; remove cached fips
* small gdal update
* invalidation
* missing cache ids
* Update Side Panel Tile Data
* Update Side Panel Tile Data
* Correct indicator names to match csv
* Replace Score with Rate
* Comment out FEMA Loss Rate to troubleshoot
* Removes all "FEMA Loss Rate" array elements
* Revert FEMA to Score
* Remove expected loss rate
* Remove RMP and NPL from BASIC array
* Attempt to make shape mismatch align
* update README typo
* Add Score L indicators to TILE_SCORE_FLOAT_COLUMNS
* removing cbg references
* completes the ticket
* Update side panel fields
* Update index file writing to create parent dir
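The standard pathlib idiom for that fix; the path below is made up:

```python
from pathlib import Path

index_path = Path("data/score/downloadable/index.json")  # hypothetical path
# Create any missing parent directories before writing the index file.
index_path.parent.mkdir(parents=True, exist_ok=True)
index_path.write_text("{}")
```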
* Updates from linting
* fixing missing field_names for island territories' 90th percentile fields
* Update downloadable fields and fix field name
* Update file fields and tests
* Update ordering of fields and leave TODO
* Update pickle after re-ordering of file
* fixing bugs in etl_score_geo
* Repeating index for diesel fix
* passing tests
* adding pytest.ini
Co-authored-by: Vim USDS <vimal.k.shah@omb.eop.gov>
Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
Error this addresses:
File "/Users/lucas/Documents/usds/repos/justice40-tool/data/data-pipeline/data_pipeline/etl/runner.py", line 71, in etl_runner
f"data_pipeline.etl.sources.{dataset['module_dir']}.etl"
TypeError: 'NoneType' object is not subscriptable
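The crash comes from subscripting a lookup result that can be None; a minimal sketch of a guard, with an illustrative function name:

```python
from typing import Optional

def build_etl_module_path(dataset: Optional[dict]) -> str:
    """Hypothetical guard around the lookup that crashed above."""
    # `dataset` is None when no matching definition is found; subscripting
    # None is exactly the TypeError shown in the traceback.
    if dataset is None:
        raise ValueError("no matching dataset definition found")
    return f"data_pipeline.etl.sources.{dataset['module_dir']}.etl"
```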
* Fixes #456 - Our data directory should adopt standard Python package structure
* a few missed references
* updating readme
* updating requirements
* Running Black
* Fixes for flake8
* updating pylint