164 commits

Author SHA1 Message Date
Emma Nechamkin
5c41c95764 Revert "Fast flag update (#1844)"
This reverts commit d892bce6cf.
2022-08-19 14:05:45 -04:00
Emma Nechamkin
d892bce6cf
Fast flag update (#1844)
Added additional flags for the front end based on our conversation in standup this morning.
2022-08-19 13:14:44 -04:00
Emma Nechamkin
3ba1c620f5
Update to use new FSF files (#1838)
backend is partially done!
2022-08-18 15:54:44 -04:00
Emma Nechamkin
cb4866b93f
Adding EAMLIS and FUDS data to legacy pollution in score (#1832)
Update to add EAMLIS and FUDS data to score
2022-08-18 13:32:29 -04:00
Matt Bowen
6e41e0d9f0
Add donut hole calculation to score (#1828)
Adds adjacency index to the pipeline. Requires thorough QA
2022-08-18 12:04:46 -04:00
Emma Nechamkin
88dc2e5a8e updating to avoid conflicts 2022-08-17 14:28:02 -04:00
Emma Nechamkin
7d89d41e49
Adding NLCD data (#1826)
Adding NLCD's natural space indicator end to end to the score.
2022-08-17 14:21:28 -04:00
Emma Nechamkin
2e05b1d60c Merge branch 'emma-nechamkin/release/score-narwhal' of github.com:usds/justice40-tool into emma-nechamkin/release/score-narwhal 2022-08-17 11:34:37 -04:00
Matt Bowen
49623e4da0
Add abandoned mine lands data (#1824)
* Add notebook to generate test data (#1780)

* Add Abandoned Mine Land data (#1780)

Using a similar structure but simpler approach compared to FUDS, add an
indicator for whether a tract has an abandoned mine.

* Adding some detail to dataset readmes

Just a thought!

* Apply feedback from review (#1780)

* Fixup bad string that broke test (#1780)

* Update a string that I should have renamed (#1780)

* Reduce number of threads to reduce memory pressure (#1780)

* Try not running geo data (#1780)

* Run the high-memory sets separately (#1780)

* Actually deduplicate (#1780)

* Add flag for memory intensive ETLs (#1780)

* Document new flag for datasets (#1780)

* Add flag for new datasets for rebase (#1780)

Co-authored-by: Emma Nechamkin <97977170+emma-nechamkin@users.noreply.github.com>
2022-08-17 11:33:59 -04:00
Emma Nechamkin
981a36cfa3 first run -- adding NLCD data to the ETL, but not yet to the score 2022-08-17 11:11:11 -04:00
Emma Nechamkin
5e378aea81
Adding first street foundation data (#1823)
Adding FSF flood and wildfire risk datasets to the score.
2022-08-17 10:14:23 -04:00
Emma Nechamkin
ebac552d75
Adding DOT composite to travel score (#1820)
This adds the DOT dataset to the ETL and to the score. Note that currently we take a percentile of an average of percentiles.
2022-08-16 14:44:39 -04:00
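The "percentile of an average of percentiles" noted in ebac552d75 can be illustrated with a minimal pandas sketch. The column names and values below are hypothetical and are not taken from the DOT dataset or the pipeline; the sketch only shows the order of operations.

```python
import pandas as pd

# Hypothetical tract-level inputs; column names and values are illustrative only.
df = pd.DataFrame(
    {
        "GEOID": ["01", "02", "03", "04"],
        "transit_cost": [10.0, 20.0, 30.0, 40.0],
        "road_safety": [1.0, 4.0, 2.0, 3.0],
    }
)

component_cols = ["transit_cost", "road_safety"]

# Step 1: rank each component into a percentile (0 to 1) across tracts.
component_pctiles = df[component_cols].rank(pct=True)

# Step 2: average the component percentiles for each tract.
df["composite_avg"] = component_pctiles.mean(axis=1)

# Step 3: take the percentile of that average across tracts.
df["composite_pctile"] = df["composite_avg"].rank(pct=True)
```

Because the final step re-ranks the averages, the composite is again spread across tracts as a percentile, whatever the distribution of the averaged values.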
Vim USDS
932179841f Merge branch 'emma-nechamkin/release/score-narwhal' of https://github.com/usds/justice40-tool into emma-nechamkin/release/score-narwhal 2022-08-16 10:36:04 -07:00
Vim USDS
d6c04b1308 Disable markdown check for link 2022-08-16 10:35:57 -07:00
Matt Bowen
d5fbb802e8
Add FUDS ETL (#1817)
* Add spatial join method (#1871)

Since we'll need to figure out the tracts for a large number of points
in future tickets, add a utility to handle grabbing the tract geometries
and adding tract data to a point dataset.

* Add FUDS, also jupyter lab (#1871)

* Add YAML configs for FUDS (#1871)

* Allow input geoid to be optional (#1871)

* Add FUDS ETL, tests, test-data notebook (#1871)

This adds the ETL class for Formerly Used Defense Sites (FUDS). This is
different from most other ETLs since these FUDS are not provided by
tract, but instead by geographic point, so we need to assign FUDS to
tracts and then do calculations from there.

* Floats -> Ints, as I intended (#1871)

* Floats -> Ints, as I intended (#1871)

* Formatting fixes (#1871)

* Add test false positive GEOIDs (#1871)

* Add gdal binaries (#1871)

* Refactor pandas code to be more idiomatic (#1871)

Per Emma, the more pandas-y way of doing my counts is using np.where to
add the values I need, then groupby and size. It is definitely more
compact, and also I think more correct!

* Update configs per Emma suggestions (#1871)

* Type fixed! (#1871)

* Remove spurious import from vscode (#1871)

* Snapshot update after changing col name (#1871)

* Move up GDAL (#1871)

* Adjust geojson strategy (#1871)

* Try running census separately first (#1871)

* Fix import order (#1871)

* Cleanup cache strategy (#1871)

* Download census data from S3 instead of re-calculating (#1871)

* Clarify pandas code per Emma (#1871)
2022-08-16 13:28:39 -04:00
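The counting pattern described in d5fbb802e8 (np.where to add the values, then groupby and size) might look roughly like the sketch below. The column names and category labels are assumptions made for illustration, not the names used in the FUDS ETL.

```python
import numpy as np
import pandas as pd

# Hypothetical point-level records that have already been assigned to tracts.
sites = pd.DataFrame(
    {
        "GEOID10_TRACT": ["A", "A", "B", "C", "C", "C"],
        "ELIGIBILITY": [
            "Eligible", "Ineligible", "Eligible",
            "Eligible", "Eligible", "Ineligible",
        ],
    }
)

# np.where labels each row with the category to be counted.
sites["status"] = np.where(
    sites["ELIGIBILITY"] == "Eligible", "eligible", "ineligible"
)

# groupby + size counts rows per (tract, status);
# unstack pivots the result to one row per tract, filling absent pairs with 0.
counts = (
    sites.groupby(["GEOID10_TRACT", "status"])
    .size()
    .unstack(fill_value=0)
)
```

Here `counts` has one row per tract and one column per status, so a tract with no ineligible sites gets an explicit 0 rather than a missing value.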
Emma Nechamkin
481a2a05f7
updated to fix linting errors (#1818)
Cleans and updates base branch
2022-08-11 16:34:56 -04:00
Emma Nechamkin
94cdc47cce Update etl_score_geo.py
Yikes! Fixing merge messup!
2022-08-11 12:38:32 -04:00
Matt Bowen
97e17546cc Refactor DOE Energy Burden and COI to use YAML (#1796)
* added tribalId for Supplemental dataset (#1804)

* Setting zoom levels for tribal map (#1810)

* NRI dataset and initial score YAML configuration (#1534)

* update be staging gha

* NRI dataset and initial score YAML configuration

* checkpoint

* adding data checks for release branch

* passing tests

* adding INPUT_EXTRACTED_FILE_NAME to base class

* lint

* columns to keep and tests

* update be staging gha

* checkpoint

* update be staging gha

* NRI dataset and initial score YAML configuration

* checkpoint

* adding data checks for release branch

* passing tests

* adding INPUT_EXTRACTED_FILE_NAME to base class

* lint

* columns to keep and tests

* checkpoint

* PR Review

* removing source url

* tests

* stop execution of ETL if there's a YAML schema issue

* update be staging gha

* adding source url as class var again

* clean up

* force cache bust

* gha cache bust

* dynamically set score vars from YAML

* docstrings

* removing last updated year - optional reverse percentile

* passing tests

* sort order

* column ordering

* PR review

* class level vars

* Updating DatasetsConfig

* fix pylint errors

* moving metadata hint back to code

Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>

* Correct copy typo (#1809)

* Add basic test suite for COI (#1518)

* Update COI to use new yaml (#1518)

* Add tests for DOE energy burden (#1518)

* Add dataset config for energy burden (#1518)

* Refactor ETL to use datasets.yml (#1518)

* Add fake GEOIDs to COI tests (#1518)

* Refactor _setup_etl_instance_and_run_extract to base (#1518)

For the three classes we've done so far, a generic
_setup_etl_instance_and_run_extract works fine. For the moment we can
reuse the same setup method until we decide future classes need more
flexibility, and they can always override it in a subclass.

* Add output-path tests (#1518)

* Update YAML to match constant (#1518)

* Don't blindly set float format (#1518)

* Add defaults for extract (#1518)

* Run YAML load on all subclasses (#1518)

* Update description fields (#1518)

* Update YAML per final format (#1518)

* Update fixture tract IDs (#1518)

* Update base class refactor (#1518)

Now that NRI is final I needed to make a small number of updates to my
refactored code.

* Remove old comment (#1518)

* Fix type signature and return (#1518)

* Update per code review (#1518)

Co-authored-by: Jorge Escobar <83969469+esfoobar-usds@users.noreply.github.com>
Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
Co-authored-by: Vim <86254807+vim-usds@users.noreply.github.com>
2022-08-11 12:38:28 -04:00
Emma Nechamkin
baa591a6c6 first run through 2022-08-11 12:33:46 -04:00
Emma Nechamkin
4f6a1b5286 added indoor plumbing to score housing burden 2022-08-11 12:33:46 -04:00
Emma Nechamkin
15450cf91f added indoor plumbing to score housing burden 2022-08-11 12:33:46 -04:00
Emma Nechamkin
8c7519063a added indoor plumbing to chas 2022-08-11 12:33:46 -04:00
Emma Nechamkin
0d90ae563a Changing LHE in tiles to a boolean (#1767)
also includes merging / clean up of the release
2022-08-11 12:33:46 -04:00
Emma Nechamkin
b0a728437c adds UST indicator (#1786)
adds leaky underground storage tanks
2022-08-11 12:33:46 -04:00
Emma Nechamkin
f6efdd4e14 Rescaling linguistic isolation (#1750)
Rescales linguistic isolation to drop Puerto Rico
2022-08-11 12:33:46 -04:00
Emma Nechamkin
2ab24c60fa updating ejscreen data, try two (#1747) 2022-08-11 12:33:46 -04:00
Shelby Switzer
3071815158 Do not drop Guam and USVI from ETL (#1681)
* Remove code that drops Guam and USVI from ETL

* Add back code for dropping rows by FIPS code

We may want this functionality, so let's keep it and make the constant an empty array for now.

Co-authored-by: Shelby Switzer <shelbyswitzer@gmail.com>
2022-08-11 12:33:46 -04:00
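A minimal sketch of the approach described in 3071815158: the row-dropping code stays, but the constant it reads is empty, so no rows are removed. The function, constant, and column names here are hypothetical, and the sketch assumes tract IDs are strings whose first two characters are the state or territory FIPS code.

```python
import pandas as pd

# Hypothetical constant: empty for now, so nothing is dropped.
# Adding FIPS codes (for example "66" for Guam, "78" for USVI) would exclude them.
DROP_FIPS_CODES = []


def drop_rows_by_fips(df, drop_codes, tract_col="GEOID10_TRACT"):
    """Return df without rows whose tract ID starts with a listed FIPS code."""
    state_fips = df[tract_col].str[:2]
    return df[~state_fips.isin(drop_codes)]


tracts = pd.DataFrame(
    {"GEOID10_TRACT": ["66010950100", "78010970100", "01001020100"]}
)

kept_all = drop_rows_by_fips(tracts, DROP_FIPS_CODES)
kept_states_only = drop_rows_by_fips(tracts, ["66", "78"])
```

With the empty constant, `kept_all` retains every row; passing the two territory codes leaves only the tract whose ID starts with "01".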
Shelby Switzer
05748c9fa2 Update backend for Puerto Rico (#1686)
* Update PR threshold count to 10

We now show 10 indicators for PR. See the discussion on the github issue for more info: https://github.com/usds/justice40-tool/issues/1621

* Do not use linguistic iso for Puerto Rico

Closes 1350.

Co-authored-by: Shelby Switzer <shelbyswitzer@gmail.com>
2022-08-11 12:33:46 -04:00
Emma Nechamkin
1782d022a9 Adding HOLC indicator (#1579)
Added HOLC indicator (Historic Redlining Score) from NCRC work; included 3.25 cutoff and low income as part of the housing burden category.
2022-08-11 12:33:46 -04:00
Emma Nechamkin
f047ca9d83 Imputing income using geographic neighbors (#1559)
Imputes the income field, with a light refactor. Needs further refactoring and more tests (so far only spot-checked). The next ticket will check and address this, but much of the "narwhal" architecture is here.
2022-08-11 12:33:45 -04:00
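The log does not say how f047ca9d83 combines neighbor values, so the following is only a sketch of one common approach: fill a missing tract value with the mean of its neighbors' known values. The function name, data layout, and the use of a simple mean are all assumptions.

```python
from statistics import mean


def impute_from_neighbors(values, neighbors):
    """Fill missing (None) values with the mean of known neighbor values.

    values: dict mapping tract ID -> value, or None when missing.
    neighbors: dict mapping tract ID -> list of adjacent tract IDs.
    Only originally known values are used, so imputed values are not chained.
    """
    imputed = {}
    for tract, value in values.items():
        if value is not None:
            imputed[tract] = value
            continue
        known = [
            values[n]
            for n in neighbors.get(tract, [])
            if values.get(n) is not None
        ]
        # Leave the value missing when no neighbor has a known value.
        imputed[tract] = mean(known) if known else None
    return imputed


# Hypothetical tracts: B borders A and C; D borders only B.
values = {"A": 0.25, "B": None, "C": 0.75, "D": None}
neighbors = {"B": ["A", "C"], "D": ["B"]}
result = impute_from_neighbors(values, neighbors)
```

In this example B is filled from A and C, while D stays missing because its only neighbor had no original value.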
Jorge Escobar
1c448a77f9
NRI dataset and initial score YAML configuration (#1534)
* update be staging gha

* NRI dataset and initial score YAML configuration

* checkpoint

* adding data checks for release branch

* passing tests

* adding INPUT_EXTRACTED_FILE_NAME to base class

* lint

* columns to keep and tests

* update be staging gha

* checkpoint

* update be staging gha

* NRI dataset and initial score YAML configuration

* checkpoint

* adding data checks for release branch

* passing tests

* adding INPUT_EXTRACTED_FILE_NAME to base class

* lint

* columns to keep and tests

* checkpoint

* PR Review

* removing source url

* tests

* stop execution of ETL if there's a YAML schema issue

* update be staging gha

* adding source url as class var again

* clean up

* force cache bust

* gha cache bust

* dynamically set score vars from YAML

* docstrings

* removing last updated year - optional reverse percentile

* passing tests

* sort order

* column ordering

* PR review

* class level vars

* Updating DatasetsConfig

* fix pylint errors

* moving metadata hint back to code

Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
2022-08-09 16:37:10 -04:00
Jorge Escobar
781e08f559
added tribalId for Supplemental dataset (#1804) 2022-08-08 17:42:14 -04:00
Jorge Escobar
8149ac31c5
Starting Tribal Boundaries Work (#1736)
* starting tribal pr

* further pipeline work

* bia merge working

* alaska villages and tribal geo generate

* tribal folders

* adding data full run

* tile generation

* tribal tile deploy
2022-07-30 01:13:10 -04:00
Vim
e1a61faf5d
Add a react component generator (#1745)
* Add a react component generator

* Update markdown links

* Change commented code to block comment
2022-07-15 09:54:58 -07:00
Jorge Escobar
2af6fca98d
Column headers update (#1618)
* Column headers update

* passing tests

* updated date stamp

* js tests
2022-05-06 14:10:15 -04:00
Emma Nechamkin
ae725f0a3e
arcgis column name fix (#1581)
eliminates duplicate column and ensures all column names are unique.
2022-04-22 14:09:12 -04:00
Jorge Escobar
fbd56e3bd5
Put the pdf back in the package and add TSD to pipeline (#1580)
* Put the pdf back in the package and add TSD to pipeline

* updated pdf with logo

* wrong path
2022-04-21 13:42:04 -04:00
Emma Nechamkin
2ce4cfe80e
updated with codebook (#1573) 2022-04-18 18:12:18 -04:00
Jorge Escobar
859177a877
Marshmallow Schemas for YAML files (#1497)
* Marshmallow Schemas for YAML files

* completed ticket

* passing tests

* lint

* click dep

* staging BE map

* PR review
2022-03-31 13:56:10 -04:00
Emma Nechamkin
2628afacf9
Creating a data dictionary for the download packet (#1469)
Adding automated codebook creation. Future ticket to refactor.
2022-03-30 11:01:43 -04:00
Emma Nechamkin
dc981919f1
Adding booleans for FE to display (#1393)
PR adds booleans for each individual threshold category for the front end to display.
2022-03-29 20:17:10 -04:00
Emma Nechamkin
0c07cdac55
Adding category count to BE signals (#1486)
Added category count to downloadable data and backend signals.
2022-03-29 17:11:57 -04:00
Jorge Escobar
dd723b6c19
PyPi Packaging of Data Pipeline (#1464)
* PyPi Packaging of Data Pipeline

* package rename

* adding python version

* trigger data checks

* print env vars

* python version 2

* trigger data check

* python version 3

* update caching for other GHAs
2022-03-21 18:55:15 -04:00
Katherine D. Mlika
68c882b3de
updating column E label to "Identified as disadvantaged" (#1406)
* updating column E label to "Identified as disadvantaged"

* passing tests

* adding cached poetry flow

* working dir

Co-authored-by: Jorge Escobar <jorge.e.escobar@omb.eop.gov>
2022-03-18 14:50:03 -04:00
Jorge Escobar
7b05ee9c76
S3 Parallel Upload and Deletions (#1410)
* installation step

* trigger action

* installing to home dir

* dry-run

* pyenv

* py 2.8

* trying s4cmd

* removing pyenv

* poetry s4cmd

* num-threads

* public read

* poetry cache

* s4cmd all around

* poetry cache

* poetry cache

* install poetry packages

* poetry echo

* let's do this

* s4cmd install on run

* s4cmd

* ad aws back

* add aws back

* testing census api key and poetry caching

* census api key

* census api

* census api key #3

* 250

* poetry update

* poetry change

* check census api key

* force flag

* update score gen and tilefy; remove cached fips

* small gdal update

* invalidation

* missing cache ids
2022-03-17 23:19:23 -04:00
Emma Nechamkin
e7c7c0abeb
Updating higher education to be reversed (#1387)
In this PR, we create a new variable so that the % college students is expressed as % not college students. This means that the front end can display % not college students.

Includes old variables so that this will not break the front end.
2022-03-15 16:43:32 -04:00
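The reversal described in e7c7c0abeb amounts to adding a complementary column while leaving the original in place. A minimal sketch, assuming the share is stored as a fraction between 0 and 1 and using hypothetical column names:

```python
import pandas as pd

# Hypothetical column holding the share of college students per tract (0 to 1).
df = pd.DataFrame({"pct_college": [0.25, 0.5, None]})

# New reversed variable; the original column is kept so existing consumers
# that still read it continue to work.
df["pct_not_college"] = 1 - df["pct_college"]
```

Missing values stay missing in the new column, since the subtraction propagates NaN.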
Jorge Escobar
7f91e2b06b
ArcGIS zipping (#1391)
* ArcGIS zipping

* lint

* shapefile zip

* removing space in GMT

* adding shapefile to be staging gha
2022-03-09 18:00:20 -05:00
Emma Nechamkin
917b84dc2e
WY tracts are not showing up until zoom >7 (#1342)
In order to solve an issue where states with few census tracts appear to have no DACs, we change the low-zoom for states with under some threshold of tracts to be the high-zoom for those states. Thus, WY now has DACs even in low zoom. Yay!
2022-03-08 17:33:11 -05:00
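One way to pick out the states affected by the change in 917b84dc2e is to count tracts per state and compare against a threshold. The sketch below is an assumption about how that selection could work: the threshold value and function name are hypothetical (the commit only says "some threshold"), and tract IDs are assumed to start with the two-digit state FIPS code.

```python
from collections import Counter

# Hypothetical threshold; the commit message does not give the real value.
MIN_TRACTS_FOR_LOW_ZOOM = 3


def states_needing_high_zoom(tract_geoids, threshold=MIN_TRACTS_FOR_LOW_ZOOM):
    """Return state FIPS codes with fewer than `threshold` tracts.

    For these states, the high-zoom features would be used at low zoom too.
    """
    counts = Counter(geoid[:2] for geoid in tract_geoids)
    return sorted(state for state, n in counts.items() if n < threshold)


# Hypothetical tract IDs: two in Wyoming (56), four in California (06).
tract_geoids = [
    "56001962700", "56001962800",
    "06001400100", "06001400200", "06001400300", "06001400400",
]
small_states = states_needing_high_zoom(tract_geoids)
```

With these inputs only "56" falls under the threshold, so only Wyoming's tracts would be carried into the low-zoom layer at full detail.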
Jorge Escobar
6425beb9f4
YAML Config for Downloadable Assets (#1252)
* starting yaml config load work

* working version for downloadable file

* yaml file update

* checkpoint

* sort if needed

* refactoring

* moving config

* checkpoint

* old files

* skipping downloadable tests for now

* more modularization

* more refactor, new excel yml

* pylint

* completed tabs

* Update excel.yml

* removing obsolete tests

* addressing PR feedback

* addressing changes

* confirmed change in yaml breaks tests

* safety bump

* PR review

* adding tests back

* pylint

* Incorporating latest score fields from Emma

* incorporating newest fields from Emma

* passing tests

* adding shapefile aws sync

* missing test

* passing tests
2022-03-04 15:02:09 -05:00
Emma Nechamkin
1f5633ef74
Adding constants for front end to display booleans (#1348)
Added constants for the threshold categories and socioeconomic indicators for front end.
2022-03-02 17:12:28 -05:00