Commit graph

103 commits

Author SHA1 Message Date
Matt Bowen
97e17546cc Refactor DOE Energy Burden and COI to use YAML (#1796)
* added tribalId for Supplemental dataset (#1804)

* Setting zoom levels for tribal map (#1810)

* NRI dataset and initial score YAML configuration (#1534)

* update be staging gha

* NRI dataset and initial score YAML configuration

* checkpoint

* adding data checks for release branch

* passing tests

* adding INPUT_EXTRACTED_FILE_NAME to base class

* lint

* columns to keep and tests

* update be staging gha

* checkpoint

* update be staging gha

* NRI dataset and initial score YAML configuration

* checkpoint

* adding data checks for release branch

* passing tests

* adding INPUT_EXTRACTED_FILE_NAME to base class

* lint

* columns to keep and tests

* checkpoint

* PR Review

* removing source url

* tests

* stop execution of ETL if there's a YAML schema issue

* update be staging gha

* adding source url as class var again

* clean up

* force cache bust

* gha cache bust

* dynamically set score vars from YAML

* docstrings

* removing last updated year - optional reverse percentile

* passing tests

* sort order

* column ordering

* PR review

* class level vars

* Updating DatasetsConfig

* fix pylint errors

* moving metadata hint back to code

Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>

* Correct copy typo (#1809)

* Add basic test suite for COI (#1518)

* Update COI to use new yaml (#1518)

* Add tests for DOE energy burden (#1518)

* Add dataset config for energy burden (#1518)

* Refactor ETL to use datasets.yml (#1518)

* Add fake GEOIDs to COI tests (#1518)

* Refactor _setup_etl_instance_and_run_extract to base (#1518)

For the three classes we've done so far, a generic
_setup_etl_instance_and_run_extract will work fine. For the moment we
can reuse the same setup method until we decide future classes need more
flexibility, and they can always subclass it.

* Add output-path tests (#1518)

* Update YAML to match constant (#1518)

* Don't blindly set float format (#1518)

* Add defaults for extract (#1518)

* Run YAML load on all subclasses (#1518)

* Update description fields (#1518)

* Update YAML per final format (#1518)

* Update fixture tract IDs (#1518)

* Update base class refactor (#1518)

Now that NRI is final I needed to make a small number of updates to my
refactored code.

* Remove old comment (#1518)

* Fix type signature and return (#1518)

* Update per code review (#1518)

Co-authored-by: Jorge Escobar <83969469+esfoobar-usds@users.noreply.github.com>
Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
Co-authored-by: Vim <86254807+vim-usds@users.noreply.github.com>
2022-08-11 12:38:28 -04:00
Emma Nechamkin
baa591a6c6 first run through 2022-08-11 12:33:46 -04:00
Emma Nechamkin
15450cf91f added indoor plumbing to score housing burden 2022-08-11 12:33:46 -04:00
Emma Nechamkin
0d90ae563a Changing LHE in tiles to a boolean (#1767)
also includes merging / clean up of the release
2022-08-11 12:33:46 -04:00
Emma Nechamkin
b0a728437c adds UST indicator (#1786)
adds leaky underground storage tanks
2022-08-11 12:33:46 -04:00
Emma Nechamkin
f6efdd4e14 Rescaling linguistic isolation (#1750)
Rescales linguistic isolation to drop Puerto Rico
2022-08-11 12:33:46 -04:00
Emma Nechamkin
2ab24c60fa updating ejscreen data, try two (#1747) 2022-08-11 12:33:46 -04:00
Shelby Switzer
3071815158 Do not drop Guam and USVI from ETL (#1681)
* Remove code that drops Guam and USVI from ETL

* Add back code for dropping rows by FIPS code

We may want this functionality, so let's keep it and just make the constant currently be an empty array.

Co-authored-by: Shelby Switzer <shelbyswitzer@gmail.com>
2022-08-11 12:33:46 -04:00
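The approach described in commit 3071815158 (keep the drop-by-FIPS step, but leave its constant empty) could be sketched roughly as below. This is a minimal illustration, not the project's code: `DROP_FIPS_CODES` and `drop_rows_by_fips` are hypothetical names, the GEOIDs are made up, and the sketch assumes each row carries a tract GEOID whose first two characters are the state or territory FIPS code.

```python
# Hypothetical constant: state/territory FIPS codes to exclude from the ETL.
# Left empty, so nothing is dropped (Guam is "66", USVI is "78").
DROP_FIPS_CODES = []


def drop_rows_by_fips(rows, fips_codes=None):
    """Return the rows whose GEOID does not start with an excluded FIPS code."""
    codes = set(DROP_FIPS_CODES if fips_codes is None else fips_codes)
    return [row for row in rows if row["GEOID"][:2] not in codes]


# Made-up GEOIDs, for illustration only.
rows = [
    {"GEOID": "66010950100"},  # Guam prefix
    {"GEOID": "78010970100"},  # USVI prefix
    {"GEOID": "56001962700"},  # Wyoming prefix
]
kept_all = drop_rows_by_fips(rows)
kept_without_guam = drop_rows_by_fips(rows, fips_codes=["66"])
```

Keeping the filter in place with an empty list means the behavior can be switched back on by editing one constant rather than restoring deleted code.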
Shelby Switzer
05748c9fa2 Update backend for Puerto Rico (#1686)
* Update PR threshold count to 10

We now show 10 indicators for PR. See the discussion on the github issue for more info: https://github.com/usds/justice40-tool/issues/1621

* Do not use linguistic iso for Puerto Rico

Closes 1350.

Co-authored-by: Shelby Switzer <shelbyswitzer@gmail.com>
2022-08-11 12:33:46 -04:00
Emma Nechamkin
1782d022a9 Adding HOLC indicator (#1579)
Added HOLC indicator (Historic Redlining Score) from NCRC work; included 3.25 cutoff and low income as part of the housing burden category.
2022-08-11 12:33:46 -04:00
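As a rough sketch of the rule in commit 1782d022a9, the check combines the 3.25 cutoff with the low-income flag. The function name, the direction of the comparison (at or above the cutoff), and the handling of tracts with no score are assumptions made for illustration; the commit message does not spell them out.

```python
HOLC_SCORE_CUTOFF = 3.25  # cutoff named in the commit message


def meets_holc_criterion(holc_score, is_low_income):
    """Return True when a tract is at/above the cutoff and is low income.

    Assumptions: a score equal to the cutoff qualifies, and a tract
    with no score (None) never qualifies.
    """
    if holc_score is None:
        return False
    return bool(holc_score >= HOLC_SCORE_CUTOFF and is_low_income)
```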
Emma Nechamkin
f047ca9d83 Imputing income using geographic neighbors (#1559)
Imputes the income field with a light refactor. Needs more refactoring and more tests (I spot-checked). The next ticket will check and address this, but a lot of the "narwhal" architecture is here.
2022-08-11 12:33:45 -04:00
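One plausible reading of "imputing income using geographic neighbors" in commit f047ca9d83 is to fill a missing tract value with the mean of its neighbors' known values. The sketch below assumes exactly that; the function name, the adjacency mapping, and the choice of a simple mean are all hypothetical and may differ from the actual implementation.

```python
def impute_income_from_neighbors(income_by_tract, neighbors_by_tract):
    """Fill missing (None) incomes with the mean of neighbors' known incomes.

    Only originally reported incomes are averaged, so an imputed value is
    never used to impute another tract. Tracts with no neighbor that has
    a known income stay None.
    """
    imputed = dict(income_by_tract)
    for tract, income in income_by_tract.items():
        if income is not None:
            continue
        known = [
            income_by_tract[n]
            for n in neighbors_by_tract.get(tract, [])
            if income_by_tract.get(n) is not None
        ]
        if known:
            imputed[tract] = sum(known) / len(known)
    return imputed


# Illustrative data only.
incomes = {"A": 40000, "B": None, "C": 60000, "D": None}
neighbors = {"B": ["A", "C"], "D": []}
result = impute_income_from_neighbors(incomes, neighbors)
```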
Jorge Escobar
1c448a77f9
NRI dataset and initial score YAML configuration (#1534)
* update be staging gha

* NRI dataset and initial score YAML configuration

* checkpoint

* adding data checks for release branch

* passing tests

* adding INPUT_EXTRACTED_FILE_NAME to base class

* lint

* columns to keep and tests

* update be staging gha

* checkpoint

* update be staging gha

* NRI dataset and initial score YAML configuration

* checkpoint

* adding data checks for release branch

* passing tests

* adding INPUT_EXTRACTED_FILE_NAME to base class

* lint

* columns to keep and tests

* checkpoint

* PR Review

* removing source url

* tests

* stop execution of ETL if there's a YAML schema issue

* update be staging gha

* adding source url as class var again

* clean up

* force cache bust

* gha cache bust

* dynamically set score vars from YAML

* docstrings

* removing last updated year - optional reverse percentile

* passing tests

* sort order

* column ordering

* PR review

* class level vars

* Updating DatasetsConfig

* fix pylint errors

* moving metadata hint back to code

Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
2022-08-09 16:37:10 -04:00
Jorge Escobar
2af6fca98d
Column headers update (#1618)
* Column headers update

* passing tests

* updated date stamp

* js tests
2022-05-06 14:10:15 -04:00
Emma Nechamkin
ae725f0a3e
arcgis column name fix (#1581)
eliminates duplicate column and ensures all column names are unique.
2022-04-22 14:09:12 -04:00
Jorge Escobar
fbd56e3bd5
Put the pdf back in the package and add TSD to pipeline (#1580)
* Put the pdf back in the package and add TSD to pipeline

* updated pdf with logo

* wrong path
2022-04-21 13:42:04 -04:00
Emma Nechamkin
2ce4cfe80e
updated with codebook (#1573) 2022-04-18 18:12:18 -04:00
Jorge Escobar
859177a877
Marshmallow Schemas for YAML files (#1497)
* Marshmallow Schemas for YAML files

* completed ticket

* passing tests

* lint

* click dep

* staging BE map

* Pr review
2022-03-31 13:56:10 -04:00
Emma Nechamkin
2628afacf9
Creating a data dictionary for the download packet (#1469)
Adding automated codebook creation. Future ticket to refactor.
2022-03-30 11:01:43 -04:00
Emma Nechamkin
dc981919f1
Adding booleans for FE to display (#1393)
PR adds booleans for each individual threshold category for the front end to display.
2022-03-29 20:17:10 -04:00
Emma Nechamkin
0c07cdac55
Adding category count to BE signals (#1486)
Added category count to downloadable data and backend signals.
2022-03-29 17:11:57 -04:00
Katherine D. Mlika
68c882b3de
updating column E label to "Identified as disadvantaged" (#1406)
* updating column E label to "Identified as disadvantaged"

* passing tests

* adding cached poetry flow

* working dir

Co-authored-by: Jorge Escobar <jorge.e.escobar@omb.eop.gov>
2022-03-18 14:50:03 -04:00
Emma Nechamkin
e7c7c0abeb
Updating higher education to be reversed (#1387)
Summary: In this PR, we create a new variable so that the % college students is expressed as % not college students. This means that the front end can display % not college students.

Includes the old variables so that this will not break the front end.
2022-03-15 16:43:32 -04:00
Jorge Escobar
7f91e2b06b
ArcGIS zipping (#1391)
* ArcGIS zipping

* lint

* shapefile zip

* removing space in GMT

* adding shapefile to be staging gha
2022-03-09 18:00:20 -05:00
Emma Nechamkin
917b84dc2e
WY tracts are not showing up until zoom >7 (#1342)
To solve an issue where states with few census tracts appear to have no DACs, we change the low-zoom tiles for states with fewer than some threshold number of tracts to be the high-zoom tiles for those states. Thus, WY now has DACs even at low zoom. Yay!
2022-03-08 17:33:11 -05:00
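The low-zoom change in commit 917b84dc2e can be sketched as follows. This assumes the usual low-zoom data is a subset of the high-zoom tracts and that a tract's state is the first two characters of its GEOID; the function names and the `min_tracts` parameter are hypothetical, since the commit message only says "some threshold".

```python
from collections import Counter


def small_states(tract_geoids, min_tracts):
    """Return state FIPS codes that have fewer than `min_tracts` tracts."""
    counts = Counter(geoid[:2] for geoid in tract_geoids)
    return {state for state, count in counts.items() if count < min_tracts}


def select_low_zoom_tracts(tract_geoids, usual_low_zoom_geoids, min_tracts):
    """Keep the usual low-zoom tracts, plus every tract in a small state."""
    promoted = small_states(tract_geoids, min_tracts)
    usual = set(usual_low_zoom_geoids)
    return [g for g in tract_geoids if g in usual or g[:2] in promoted]


# Made-up GEOIDs: state "56" has two tracts, state "06" has four.
all_tracts = ["56001000100", "56001000200",
              "06001000100", "06001000200", "06001000300", "06001000400"]
low_zoom = select_low_zoom_tracts(all_tracts, ["06001000100"], min_tracts=3)
```

With `min_tracts=3`, both tracts in state "56" appear at low zoom even though neither was in the usual low-zoom subset.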
Jorge Escobar
6425beb9f4
YAML Config for Downloadable Assets (#1252)
* starting yaml config load work

* working version for downloadable file

* yaml file update

* checkpoint

* sort if needed

* refactoring

* moving config

* checkpoint

* old files

* skipping downloadable tests for now

* more modularization

* more refactor, new excel yml

* pylint

* completed tabs

* Update excel.yml

* removing obsolete tests

* addressing PR feedback

* addressing changes

* confirmed change in yaml breaks tests

* safety bump

* PR review

* adding tests back

* pylint

* Incorporating latest score fields from Emma

* incorporating newest fields from Emma

* passing tests

* adding shapefile aws sync

* missing test

* passing tests
2022-03-04 15:02:09 -05:00
Emma Nechamkin
1f5633ef74
Adding constants for front end to display booleans (#1348)
Added constants for the threshold categories and socioeconomic indicators for front end.
2022-03-02 17:12:28 -05:00
Emma Nechamkin
aea49cbb5a
Cleaning up quick code (#1349)
Did some quick, mostly cosmetic changes and updates to the quick launch changes. This mostly entailed changing strings to constants and cleaning up some code to make it neater.

Changes: PR AMI, updating ag loss, and dropping PR from some threshold counts.
2022-03-02 16:50:04 -05:00
Jorge Escobar
dac8ed29d5
Removing PDF from packet (#1306) 2022-03-01 13:41:44 -05:00
Emma Nechamkin
fab828dc66
Updating tiles csv to include state code (#1272)
Adding state codes for island areas and Puerto Rico to the tiles CSV.
2022-02-25 11:10:09 -05:00
Emma Nechamkin
f0a4e40a79
Creating shapefiles for ArcGIS users (#1275)
Added shapefiles to the files generated when the pipeline is run. Produces both shapefile and a key for column names.
2022-02-24 10:32:49 -05:00
Lucas Merrill Brown
6e64134dc6
1295-college-attendance-field (#1297)
Lucas' work. Adding college attendance to tiles.
2022-02-17 19:50:52 -05:00
Emma Nechamkin
1b76a68838
FEMA data check (#1270)
We wanted to implement a slightly different FEMA AG LOSS indicator. Here, we take the 90th percentile only of tracts that have agvalue, and we also floor the denominator of the rate calculation (loss / total value) at $408k.
2022-02-17 16:53:04 -05:00
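The two rules in commit 1b76a68838 (floor the denominator at $408k, and rank only tracts that have agricultural value) might look like this in outline. The function names, the input shape, and the percentile-rank definition (share of ranked tracts with a rate at or below the tract's own) are assumptions for illustration, not the project's actual calculation.

```python
AG_VALUE_FLOOR = 408_000  # dollars; denominator floor from the commit message


def ag_loss_rate(expected_loss, ag_value):
    """Loss rate with the denominator floored at AG_VALUE_FLOOR."""
    return expected_loss / max(ag_value, AG_VALUE_FLOOR)


def flag_high_ag_loss(tracts, percentile=0.90):
    """Flag tracts at or above `percentile`, ranking only tracts with ag value.

    `tracts` maps tract id -> (expected_loss, ag_value). Tracts with no
    agricultural value are excluded from the ranking and never flagged.
    """
    rates = {
        tract: ag_loss_rate(loss, value)
        for tract, (loss, value) in tracts.items()
        if value > 0
    }
    flags = {}
    for tract in tracts:
        if tract not in rates:
            flags[tract] = False
            continue
        at_or_below = sum(1 for r in rates.values() if r <= rates[tract])
        flags[tract] = at_or_below / len(rates) >= percentile
    return flags
```

Flooring the denominator keeps a tract with very little agricultural value from producing an outsized loss rate from a small dollar loss.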
Vim
f90125d1b4
Update side panel to 3-state design (#1276)
* Update field name to follow constant standard

* Add table to ETL commands to README

* Update Generate Map Tiles run time

* Add a comma to copy

* Add 3 state UI experience

- PR will only show workforce dev
- IA will only show workforce dev w/o linguistic iso
- update tests to test 3 states
- change state to territory for Island Areas

* Modify PR and IA threshold counts

* Update tile_data_expected.pkl file
2022-02-16 14:24:35 -08:00
Lucas Merrill Brown
a0d6e55f0a
Run ETL processes in parallel (#1253)
* WIP on parallelizing

* switching to get_tmp_path for nri

* switching to get_tmp_path everywhere necessary

* fixing linter errors

* moving heavy ETLs to front of line

* add hold

* moving cdc places up

* removing unnecessary print

* moving h&t up

* adding parallel to geo post

* better census labels

* switching to concurrent futures

* fixing output
2022-02-11 14:04:53 -05:00
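The bullets in commit a0d6e55f0a mention switching to concurrent futures and moving heavy ETLs to the front of the line. A minimal sketch of that pattern with Python's standard `concurrent.futures` module is below. The function name, the job list, and the use of a thread pool (rather than a process pool) are assumptions for illustration, not the project's actual runner.

```python
import concurrent.futures


def run_etls_in_parallel(etl_jobs, max_workers=4):
    """Run ETL callables concurrently and collect results by name.

    `etl_jobs` is a list of (name, callable) pairs. Jobs are submitted in
    list order, so putting the heaviest jobs first lets them start early.
    """
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(job): name for name, job in etl_jobs}
        for future in concurrent.futures.as_completed(futures):
            # result() re-raises any exception raised inside the job.
            results[futures[future]] = future.result()
    return results


# Stand-in jobs; real ETL steps would do extract/transform/load work.
jobs = [
    ("heavy_dataset", lambda: "heavy done"),
    ("light_dataset", lambda: "light done"),
]
outcome = run_etls_in_parallel(jobs, max_workers=2)
```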
Emma Nechamkin
389eb59ac4
Adding island area indicators to the tiles (#1213)
This updates the backend to produce tile data with island indicators / island fields. 

Contains:
- new tile codes for island data
- threshold column that specifies number of thresholds to show
- ui experience column that specifies which ui experience to show

TODO: Drop the logger info message from main :)
2022-02-09 20:33:42 -05:00
Emma Nechamkin
b86450c72b
Remove USVI and Guam territories from data and include/show on map American Samoa and Mariana Islands (#1248)
This updates the tile data so that Guam and USVI do not appear in the tiles CSV (issue 1003).
2022-02-09 15:23:37 -05:00
Lucas Merrill Brown
43e005cc10
Issue 1075: Add refactored ETL tests to NRI (#1088)
* Adds a substantially refactored ETL test to the National Risk Index, to be used as a model for other tests
2022-02-08 19:05:32 -05:00
Jorge Escobar
f5fe8d90e2
Excel formatting and tract id ordering (#1172)
* excel formatting and tract id ordering

* lint

* lint try $2

* lint 3

* addressed comments

* typo
2022-02-04 18:35:45 -05:00
Emma Nechamkin
49868401be
Updating field names to match score M definitions (#1190)
When implementing definition M for the score, the variable names were not yet updated. For example:

This legacy field naming: 
```
UNEMPLOYMENT_LOW_HS_EDUCATION_FIELD = (
    f"Greater than or equal to the {PERCENTILE}th percentile for unemployment"
    " and has low HS education"
)
``` 

Should actually be renamed something like this:
```
UNEMPLOYMENT_LOW_HS_LOW_HIGHER_ED_FIELD = (
    f"Greater than or equal to the {PERCENTILE}th percentile for unemployment"
    " and has low HS education and low higher ed attendance"
)
``` 

This PR is for the backend updates for this -- keeping the old fields, and adding new, Score M specific fields as listed below: 
- [x] `field_names`: add new fields to capture low_higher_ed
- [x] `score_m`: replace old fields with new fields 
- [x] `DOWNLOADABLE_SCORE_COLUMNS`: replace old fields with new fields
- [x] `TILES_SCORE_COLUMNS`: replace old fields with new fields
2022-02-01 18:54:43 -05:00
Jorge Escobar
2b35a8937a
Hot fix for Score M (#1182)
* fixes

* pr feedback

* tuple
2022-01-27 17:22:39 -05:00
Emma Nechamkin
4c7d729cf7
Issue 1140 loss rate rounding (#1170)
* updated loss rate rounding

* fixing a typo in variable name

* fixing typo in variable name

* oops, now ready to push

* updated pickle with float for loss rate columns

* updated a typo, now multiplies all loss rates by 100 consistent with other pcts

* updated with final pickles, all tests passing

* updated incorporating lucas pr comments

* changed literal to field name
2022-01-26 13:57:45 -05:00
Lucas Merrill Brown
18f299c5f8
Issue 1141: Definition M (#1151) 2022-01-18 14:56:55 -05:00
Jorge Escobar
d686bb856e
Download column order completed (#1077)
* Download column order completed

* Kameron changes

* Lucas and Beth column order changes

* cdc_places update

* passing score

* pandas error

* checkpoint

* score passing

* rounding complete - percentages still showing one decimal

* fixing tests

* fixing percentages

* updating comment

* int percentages! 🎉🎉

* forgot to pass back to df

* passing tests

Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
2022-01-13 15:04:16 -05:00
Saran Ahluwalia
56644698ff
Address rounding issue in Pandas series to floor numerically unstable values (#1085)
* wip - added tests - 1 failing

* added check for empty series + added test

* passing tests

* parallelism in variable assignment choice

* resolve merge conflicts

* variable name changes

* cleanup logic and move comments out of main code execution + add one more test for an extreme example with -np.inf

* cleanup logic and move comments out of main code execution + add one more test for an extreme example with -np.inf

* revisions to handle type ambiguity

* fixing tests

* fix pytest

* fix linting

* fix pytest

* reword comments

* cleanup comments

* cleanup comments - fix typo

* added type check and corresponding test

* added type check and corresponding test

* language cleanup

* revert

* update pickle fixture

Co-authored-by: Jorge Escobar <jorge.e.escobar@omb.eop.gov>
2022-01-05 17:03:37 -05:00
Lucas Merrill Brown
0d10534725
Issue 1044: Add low HS education fields to tiles and download (#1046) 2021-12-14 15:41:06 -05:00
Jorge Escobar
9709d08ca3
Update Side Panel Tile Data (#866)
* Update Side Panel Tile Data

* Update Side Panel Tile Data

* Correct indicator names to match csv

* Replace Score with Rate

* Comment out FEMA Loss Rate to troubleshoot

* Removes all "FEMA Loss Rate" array elements

* Revert FEMA to Score

* Remove expected loss rate

* Remove RMP and NPL from BASIC array

* Attempt to make shape mismatch align

- update README typo

* Add Score L indicators to TILE_SCORE_FLOAT_COLUMNS

* removing cbg references

* completes the ticket

* Update side panel fields

* Update index file writing to create parent dir

* Updates from linting

* fixing missing field_names for island territories 90th percentile fields

* Update downloadable fields and fix field name

* Update file fields and tests

* Update ordering of fields and leave TODO

* Update pickle after re-ordering of file

* fixing bugs in etl_score_geo

* Repeating index for diesel fix

* passing tests

* adding pytest.ini

Co-authored-by: Vim USDS <vimal.k.shah@omb.eop.gov>
Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
2021-12-13 14:53:50 -05:00
Lucas Merrill Brown
7fcecaee42
Issue 970: reverse percentiles for AMI and life expectancy (#1018)
* switching to low

* fixing score-etl-post

* updating comments

* fixing comparison

* create separate field for clarity

* comment fix

* removing healthy food

* fixing bug in score post

* running black and adding comment

* Update pickles and add helpful notes to README

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
2021-12-10 10:16:22 -05:00
Lucas Merrill Brown
1a61026ecf
Issue 967: Calculate urban/rural percentiles (#1006) 2021-12-07 17:28:36 -05:00
Lucas Merrill Brown
5706837956
Add NATA cancer risk and respiratory hazard to definition L (#1001) 2021-12-07 12:45:45 -05:00
Lucas Merrill Brown
5a6d6d8557
Issue 954: Add various data sources from Child Opportunity Index (#986)
* Adds four fields:
    * Summer days above 90F
    * Percent low access to healthy food
    * Percent impenetrable surface areas
    * Low third grade reading proficiency

* Each of these four gets added into Definition L in various factors.

* Additionally, I add college attendance fields to the ETL for Census ACS.

* This PR also introduces the notion of "reverse percentiles", relevant to ticket #970.
2021-12-07 11:33:49 -05:00
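The "reverse percentiles" mentioned in commit 5a6d6d8557 are not defined in the log. A common reading, assumed here, is that lower raw values receive higher percentile ranks, which suits fields such as life expectancy where a low value signals more burden. The function name and the rank definition below are illustrative only.

```python
def percentile_ranks(values, reverse=False):
    """Return each value's percentile rank as a fraction in (0, 1].

    Normal rank: share of values less than or equal to the value.
    Reverse rank: share of values greater than or equal to the value,
    so the lowest raw value gets the highest rank.
    """
    n = len(values)
    if reverse:
        return [sum(1 for other in values if other >= v) / n for v in values]
    return [sum(1 for other in values if other <= v) / n for v in values]


# Hypothetical life expectancy values, in years.
life_expectancy = [70, 75, 80, 85]
normal = percentile_ranks(life_expectancy)
reversed_ranks = percentile_ranks(life_expectancy, reverse=True)
```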