Commit graph

218 commits

Author SHA1 Message Date
Emma Nechamkin
2279a04c94
Quick fix: updating snapshots to have more sigfigs (#1409)
Updated snapshots to include 10 digits after the decimal
2022-03-14 21:44:35 -04:00
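The effect of keeping 10 digits after the decimal in a snapshot can be illustrated with a small sketch. The helper names below are hypothetical, and how the repository actually serializes its snapshots is not stated in this log; this only shows fixed-precision formatting.

```python
def format_for_snapshot(value, digits=10):
    """Render a float with a fixed number of digits after the decimal point."""
    return f"{value:.{digits}f}"


def rows_to_snapshot_text(rows, digits=10):
    """Serialize rows of floats as comma-separated lines (hypothetical layout)."""
    return "\n".join(
        ",".join(format_for_snapshot(v, digits) for v in row) for row in rows
    )
```

With more digits retained, small numeric drift in an ETL output is more likely to show up as a snapshot mismatch instead of being rounded away.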
Emma Nechamkin
9d920d4db4
Updating testing to include pytest-snapshot (#1355)
In this commit, we slightly change the testing to use `pytest-snapshot`. This is for `ETL`s only.
2022-03-11 21:34:07 -05:00
Jorge Escobar
7f91e2b06b
ArcGIS zipping (#1391)
* ArcGIS zipping

* lint

* shapefile zip

* removing space in GMT

* adding shapefile to be staging gha
2022-03-09 18:00:20 -05:00
Jorge Escobar
1730572aa6
Reducing Docker start up and adding ArcGIS URL (#1386)
* Reducing Docker start up and adding ArcGIS URL

* Updating ArcGIS URLs
2022-03-09 08:55:17 -05:00
Emma Nechamkin
917b84dc2e
WY tracts are not showing up until zoom >7 (#1342)
In order to solve an issue where states with few census tracts appear to have no DACs, we change the low-zoom for states with under some threshold of tracts to be the high-zoom for those states. Thus, WY now has DACs even in low zoom. Yay!
2022-03-08 17:33:11 -05:00
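The zoom fallback described in this commit might look roughly like the sketch below. The threshold value, function name, and data layout are assumptions for illustration; the commit only says "some threshold of tracts".

```python
from collections import Counter

# Hypothetical cutoff; the real value is not given in the commit message.
SMALL_STATE_TRACT_THRESHOLD = 1000


def select_low_zoom_tracts(all_tracts, low_zoom_tracts,
                           threshold=SMALL_STATE_TRACT_THRESHOLD):
    """Return the tract GEOIDs to render at low zoom.

    all_tracts: every tract GEOID (the high-zoom set); the first two
        characters are the state FIPS code.
    low_zoom_tracts: the subset normally shown at low zoom.
    States with fewer than `threshold` tracts contribute all their tracts.
    """
    tracts_per_state = Counter(geoid[:2] for geoid in all_tracts)
    small_states = {s for s, n in tracts_per_state.items() if n < threshold}
    selected = set(low_zoom_tracts)
    selected.update(g for g in all_tracts if g[:2] in small_states)
    return sorted(selected)
```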
Jorge Escobar
6425beb9f4
YAML Config for Downloadable Assets (#1252)
* starting yaml config load work

* working version for downloadable file

* yaml file update

* checkpoint

* sort if needed

* refactoring

* moving config

* checkpoint

* old files

* skipping downloadable tests for now

* more modularization

* more refactor, new excel yml

* pylint

* completed tabs

* Update excel.yml

* removing obsolete tests

* addressing PR feedback

* addressing changes

* confirmed change in yaml breaks tests

* safety bump

* PR review

* adding tests back

* pylint

* Incorporating latest score fields from Emma

* incorporating newest fields from Emma

* passing tests

* adding shapefile aws sync

* missing test

* passing tests
2022-03-04 15:02:09 -05:00
Emma Nechamkin
1f5633ef74
Adding constants for front end to display booleans (#1348)
Added constants for the threshold categories and socioeconomic indicators for front end.
2022-03-02 17:12:28 -05:00
Emma Nechamkin
aea49cbb5a
Cleaning up quick code (#1349)
Did some quick, mostly cosmetic changes and updates to the quick launch changes. This mostly entailed changing strings to constants and cleaning up some code to make it neater.

Changes -- PR AMI, updating ag loss, and dropping pr from some threshold counts.
2022-03-02 16:50:04 -05:00
Emma Nechamkin
f9be97d8c8
This is a quick addition to include PR AMI. To be revised in the "clean up code" pr
2022-03-01 16:31:38 -05:00
Jorge Escobar
dac8ed29d5
Removing PDF from packet (#1306)
2022-03-01 13:41:44 -05:00
Emma Nechamkin
fab828dc66
Updating tiles csv to include state code (#1272)
Adding state codes for island areas and puerto rico to the tiles csv.
2022-02-25 11:10:09 -05:00
Emma Nechamkin
f0a4e40a79
Creating shapefiles for ArcGIS users (#1275)
Added shapefiles to the files generated when the pipeline is run. Produces both shapefile and a key for column names.
2022-02-24 10:32:49 -05:00
Lucas Merrill Brown
6e64134dc6
1295-college-attendance-field (#1297)
Lucas' work. Adding college attendance to tiles.
2022-02-17 19:50:52 -05:00
Emma Nechamkin
cee13b50cc
Stripping thresholds from PR so the UI matches the count
Add a tuple to skip FIPS 72 when incrementing counter. TODO: clean up so it's a constant.
2022-02-17 16:54:33 -05:00
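A minimal sketch of skipping FIPS 72 (Puerto Rico) when counting thresholds, with the tuple lifted into a constant as the TODO suggests. The function name and the idea that skipped tracts report a count of zero are assumptions, not taken from the repository.

```python
# State FIPS prefixes excluded from threshold counts; "72" is Puerto Rico.
FIPS_TO_SKIP = ("72",)


def count_thresholds_exceeded(tract_geoid, threshold_flags, skip=FIPS_TO_SKIP):
    """Count the thresholds a tract exceeds, skipping excluded state codes.

    tract_geoid: tract GEOID string whose first two characters are the
        state FIPS code.
    threshold_flags: iterable of booleans, one per threshold.
    """
    # str.startswith accepts a tuple of prefixes.
    if tract_geoid.startswith(skip):
        return 0
    return sum(bool(flag) for flag in threshold_flags)
```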
Emma Nechamkin
1b76a68838
FEMA data check (#1270)
We wanted to implement a slightly different FEMA AG LOSS indicator. Here, we take the 90th percentile only among tracts that have agvalue, and we also floor the denominator of the rate calculation (loss/total value) at $408k.
2022-02-17 16:53:04 -05:00
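The two adjustments in this commit can be sketched as follows. The $408k floor and the 90th percentile come from the commit message; the function names, the input layout, and the percentile-rank definition (share of agricultural tracts with a rate at or below the tract's own) are assumptions.

```python
AG_VALUE_FLOOR = 408_000  # dollars, from the commit message


def ag_loss_rate(expected_loss, ag_value, floor=AG_VALUE_FLOOR):
    """Loss rate (loss / total value) with the denominator floored."""
    return expected_loss / max(ag_value, floor)


def flag_high_ag_loss(tracts, percentile=0.90):
    """Flag tracts at or above the given percentile of loss rate.

    tracts: dict mapping GEOID -> (expected_loss, ag_value).
    Only tracts with ag_value > 0 enter the percentile ranking;
    all other tracts are flagged False.
    """
    rates = {
        geoid: ag_loss_rate(loss, value)
        for geoid, (loss, value) in tracts.items()
        if value > 0
    }
    n = len(rates)
    flags = {}
    for geoid in tracts:
        if geoid not in rates:
            flags[geoid] = False
            continue
        # Share of agricultural tracts with a rate <= this tract's rate.
        rank = sum(1 for r in rates.values() if r <= rates[geoid]) / n
        flags[geoid] = rank >= percentile
    return flags
```

Flooring the denominator keeps tracts with very small agricultural value from producing extreme loss rates, and ranking only among tracts with agvalue keeps the many zero-value tracts from distorting the percentile.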
Vim
f90125d1b4
Update side panel to 3-state design (#1276)
* Update field name to follow constant standard

* Add table to ETL commands to README

* Update Generate Map Tiles run time

* Add a comma to copy

* Add 3 state UI experience

- PR will only show workforce dev
- IA will only show workforce dev w/o linguistic iso
- update tests to tests 3 states
- change state to territory for Island Areas

* Modify PR and IA threshold counts

* Update tile_data_expected.pkl file
2022-02-16 14:24:35 -08:00
Jorge Escobar
59862a098e
Test Staging Data Backend (#1282)
* Test Staging Data Backend

* action updates
2022-02-16 16:45:59 -05:00
Jorge Escobar
82809a5123
Github Actions for Staging Backend (#1281)
* Github Actions for Staging Backend

* trigger run
2022-02-16 16:40:25 -05:00
Lucas Merrill Brown
3e37d9d1a3
Issue 1075: update snapshots using command-line flag (#1249)
* Adding skippable tests using command-line flag
2022-02-14 12:16:52 -05:00
Lucas Merrill Brown
a0d6e55f0a
Run ETL processes in parallel (#1253)
* WIP on parallelizing

* switching to get_tmp_path for nri

* switching to get_tmp_path everywhere necessary

* fixing linter errors

* moving heavy ETLs to front of line

* add hold

* moving cdc places up

* removing unnecessary print

* moving h&t up

* adding parallel to geo post

* better census labels

* switching to concurrent futures

* fixing output
2022-02-11 14:04:53 -05:00
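Running ETL jobs through `concurrent.futures`, with the heaviest jobs submitted first, might look like the sketch below. The function name, the job representation, and the choice of a thread pool over a process pool are assumptions; the commit does not say which executor is used.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_etls_in_parallel(etl_jobs, max_workers=4):
    """Run ETL callables concurrently and collect their results by name.

    etl_jobs: list of (name, zero-argument callable) pairs, ordered with
        the heaviest jobs first so they start as early as possible.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(job): name for name, job in etl_jobs}
        for future in as_completed(futures):
            # result() re-raises any exception raised inside the job.
            results[futures[future]] = future.result()
    return results
```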
Emma Nechamkin
389eb59ac4
Adding island area indicators to the tiles (#1213)
This updates the backend to produce tile data with island indicators / island fields. 

Contains:
- new tile codes for island data
- threshold column that specifies number of thresholds to show
- ui experience column that specifies which ui experience to show

TODO: Drop the logger info message from main :)
2022-02-09 20:33:42 -05:00
Emma Nechamkin
b86450c72b
Remove USVI and Guam territories from data and include/show on map American Samoa and Mariana Islands (#1248)
This updates the tile data so that guam and usvi do not appear in the tiles csv, from issue 1003
2022-02-09 15:23:37 -05:00
Lucas Merrill Brown
43e005cc10
Issue 1075: Add refactored ETL tests to NRI (#1088)
* Adds a substantially refactored ETL test to the National Risk Index, to be used as a model for other tests
2022-02-08 19:05:32 -05:00
Jorge Escobar
f5fe8d90e2
Excel formatting and tract id ordering (#1172)
* excel formatting and tract id ordering

* lint

* lint try $2

* lint 3

* addressed comments

* typo
2022-02-04 18:35:45 -05:00
Emma Nechamkin
6a00b29f5d
Adding VA and CO ETL from mapping for environmental justice (#1177)
Adding the mapping for environmental justice data, which contains information about VA and CO, to the ETL pipeline.
2022-02-04 10:00:41 -05:00
Jorge Escobar
1d399d3ca9
Tox Security Fix (#1242)
* checkpoint

* safety ignore

* update python matrix for data checks

* downloading census once
2022-02-03 17:05:51 -05:00
Emma Nechamkin
49868401be
Updating field names to match score M definitions (#1190)
When implementing definition M for the score, the variable names were not yet updated. For example:

This legacy field naming: 
```
UNEMPLOYMENT_LOW_HS_EDUCATION_FIELD = (
    f"Greater than or equal to the {PERCENTILE}th percentile for unemployment"
    " and has low HS education"
)
``` 

Should actually be renamed something like this:
```
UNEMPLOYMENT_LOW_HS_LOW_HIGHER_ED_FIELD = (
    f"Greater than or equal to the {PERCENTILE}th percentile for unemployment"
    " and has low HS education and low higher ed attendance"
)
``` 

This PR is for the backend updates for this -- keeping the old fields, and adding new, Score M specific fields as listed below: 
- [x] `field_names`: add new fields to capture low_higher_ed
- [x] `score_m`: replace old fields with new fields 
- [x] `DOWNLOADABLE_SCORE_COLUMNS`: replace old fields with new fields
- [x] `TILES_SCORE_COLUMNS`: replace old fields with new fields
2022-02-01 18:54:43 -05:00
Jorge Escobar
403a490985
Esfoobar usds/488 generate score per commit pr (#1211)
* Score run on every commit to data PR

* testing score run

* source aws
2022-01-31 16:07:21 -05:00
dependabot[bot]
8b72f743e3
Bump pillow from 8.4.0 to 9.0.0 in /data/data-pipeline (#1136)
* Bump pillow from 8.4.0 to 9.0.0 in /data/data-pipeline

Bumps [pillow](https://github.com/python-pillow/Pillow) from 8.4.0 to 9.0.0.
- [Release notes](https://github.com/python-pillow/Pillow/releases)
- [Changelog](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst)
- [Commits](https://github.com/python-pillow/Pillow/compare/8.4.0...9.0.0)

---
updated-dependencies:
- dependency-name: pillow
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* pillow bump

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jorge Escobar <83969469+esfoobar-usds@users.noreply.github.com>
2022-01-27 18:19:49 -05:00
dependabot[bot]
4a83ae458e
Bump ipython from 7.28.0 to 7.31.1 in /data/data-pipeline (#1169)
Bumps [ipython](https://github.com/ipython/ipython) from 7.28.0 to 7.31.1.
- [Release notes](https://github.com/ipython/ipython/releases)
- [Commits](https://github.com/ipython/ipython/compare/7.28.0...7.31.1)

---
updated-dependencies:
- dependency-name: ipython
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-01-27 17:36:14 -05:00
Jorge Escobar
2b35a8937a
Hot fix for Score M (#1182)
* fixes

* pr feedback

* tuple
2022-01-27 17:22:39 -05:00
Emma Nechamkin
4c7d729cf7
Issue 1140 loss rate rounding (#1170)
* updated loss rate rounding

* fixing a typo in variable name

* fixing typo in variable name

* oops, now ready to push

* updated pickle with float for loss rate columns

* updated a typo, now multiplies all loss rates by 100 consistent with other pcts

* updated with final pickles, all tests passing

* updated incorporating lucas pr comments

* changed literal to field name
2022-01-26 13:57:45 -05:00
Lucas Merrill Brown
18f299c5f8
Issue 1141: Definition M (#1151)
2022-01-18 14:56:55 -05:00
Saran Ahluwalia
a07bf752b0
Notebook investigating NHPD as a source for providing contemporary foreclosure data (#1012)
Co-authored-by: Saran Ahluwalia <sarahluw@cisco.com>
2022-01-18 13:08:27 -05:00
Saran Ahluwalia
87e08f5fe1
CDC SVI Index: Additions to data-pipeline and comparison tool (#1096)
* wip

* working

* working

* rename

* documentation

* add link

* add readme

* update fieldnames

* typo

* add comparison tool

* revise wording

* variable change for FIPS

* wording

* wording in readme

* cleanup wording

* update comparison tool

* final tune up

* grammar and punctuation in the documentation

* period

* cleanup comments

* added revisions

* parallelism

* PR feedback from Lucas

* remove extraneous fields from comparison tool

* style

* updates

* remove themes

* formatting

* remove references to percentile rank

* remove references to percentile rank

* typo in fieldnames

* updates based on feedback from Lucas

* fieldnames formatting

* fix broken markdown link

Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
2022-01-14 14:52:37 -05:00
Saran Ahluwalia
95a14adb35
Added Census Tract Aggregated Micro-data from EPA Risk-Screening Environmental Indicators (RSEI) model (#1101)
* added initial source code - todo is comparison tool

* added values

* rename fields

* check geoid

* added black

* added revisions

* added clean up to comments

* more comments

* formatting

* cleanup and address PR feedback

* fix changes

* final path changes

* style

* PR feedback

* added final PR comment

* fix flake 8

* add revisions
2022-01-14 13:50:49 -05:00
Saran Ahluwalia
a98ea35f74
Maryland EJSCREEN Addition to comparison tool (#1143)
* finalized

* cleanup notebook

* cleanup

* run black
2022-01-14 13:26:48 -05:00
Saran Ahluwalia
2604b66cf7
Fix errors and improve code quality and readability in Health Scores (#1147)
* run black on health_score.py

* to_numpy() versus values - see https://pandas.pydata.org/pandas-docs/version/0.24.0rc1/api/generated/pandas.Series.to_numpy.html
2022-01-14 13:11:47 -05:00
Jorge Escobar
d686bb856e
Download column order completed (#1077)
* Download column order completed

* Kameron changes

* Lucas and Beth column order changes

* cdc_places update

* passing score

* pandas error

* checkpoint

* score passing

* rounding complete - percentages still showing one decimal

* fixing tests

* fixing percentages

* updating comment

* int percentages! 🎉🎉

* forgot to pass back to df

* passing tests

Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
2022-01-13 15:04:16 -05:00
Saran Ahluwalia
98ff4bd9d8
Add experimental Jupyter notebook with Health Scoring Methodology Example for Health Scores (#989)
Co-authored-by: Saran Ahluwalia <sarahluw@cisco.com>
2022-01-13 14:43:27 -05:00
Shaun Verch
4cec1bb37e
Install and run pandas-vet (#1119)
* Install and run pandas-vet

This doesn't fix the errors, but it can give us a starting point for the
discussion of which of these errors we care about.

* Ignore the errors for now

* Ignore eeoc.gov in link checker

Sometimes it seems down from the perspective of github actions.
2022-01-13 13:17:30 -05:00
Shaun Verch
73d6aa937d
Add pyproject.toml to fix docker compose build (#1131)
* Add pyproject.toml to fix docker compose build

Even though we want to use locked dependencies, pyproject.toml is still
required.

* update Dockerfile

Co-authored-by: Jorge Escobar <83969469+esfoobar-usds@users.noreply.github.com>
2022-01-13 13:05:32 -05:00
Lucas Merrill Brown
114e6b765a
Issue 1129: remove deprecated field other_census_tract_fields_to_keep (#1130)
2022-01-12 10:16:09 -05:00
Shaun Verch
0abf04d6c2
Remove requirements.txt as a dependency (#1111)
* Remove requirements.txt as a dependency

This converts both docker and tox to use poetry, eliminating usage of
requirements.txt in both flows.

- In tox, uses the tox-poetry package which installs dependencies from
  the lockfile.
- In docker, uses
  https://stackoverflow.com/questions/53835198/integrating-python-poetry-with-docker
  as a reference.

* Don't copy pyproject.toml

* Remove obsoleted docs about requirements.txt

* Add --full-trace option to pytest

* Fix liccheck

liccheck works with requirements.txt, not with poetry, so there needs to
be an extra translation step.

* TEMP: Add WIP fix for pandas issue

This is just to see if the github actions would pass once this fix gets
merged, but it's being reviewed separately.

* Revert "TEMP: Add WIP fix for pandas issue"

This reverts commit 06e38e8cc77f5f3105c6e7a9449901db67aa1c82.
2022-01-10 16:43:56 -05:00
Saran Ahluwalia
56644698ff
Address rounding issue in Pandas series to floor numerically unstable values (#1085)
* wip - added tests - 1 failing

* added check for empty series + added test

* passing tests

* parallelism in variable assignment choice

* resolve merge conflicts

* variable name changes

* cleanup logic and move comments out of main code execution + add one more test for an extreme example with -np.inf

* cleanup logic and move comments out of main code execution + add one more test for an extreme example with -np.inf

* revisions to handle type ambiguity

* fixing tests

* fix pytest

* fix linting

* fix pytest

* reword comments

* cleanup comments

* cleanup comments - fix typo

* added type check and corresponding test

* added type check and corresponding test

* language cleanup

* revert

* update pickle fixture

Co-authored-by: Jorge Escobar <jorge.e.escobar@omb.eop.gov>
2022-01-05 17:03:37 -05:00
Shaun Verch
93595b7bb4
Re-export requirements.txt to fix version errors (#1099)
* Re-export requirements.txt to fix version errors

The version of lxml in this file had a known vulnerability that got
caught by the "safety" checker, but it is updated in the poetry files.

Regenerated using:
https://github.com/usds/justice40-tool/tree/main/data/data-pipeline#miscellaneous

* Fix lint error

* Run lint on all envs and add comments

* Ignore tests that fail lint because of dev deps

* Ignore medium.com in link checker

It's returning 403s to github actions...
2022-01-05 15:58:24 -05:00
Saran Ahluwalia
a4137fdc98
Add Michigan EJ Screen into data-pipeline's ETL and provide automated scoring and statistics outputs (#1091)
* draft wip

* initial commit

* clear output from notebook

* revert to 65ceb7900f

* draft wip

* initial commit

* clear output from notebook

* revert to 65ceb7900f

* make michigan prefix more readable

* standardize Michigan names and move all constants from class into field names module

* standardize Michigan names and move all constants from class into field names module

* include only pertinent columns for scoring comparison tool

* michigan EJSCREEN standardization

* final PR feedback

* added exposition and summary of Michigan EJSCREEN

* added exposition and summary of Michigan EJSCREEN

* fix typo

Co-authored-by: Saran Ahluwalia <ahlusar.ahluwalia@gmail.com>
2021-12-31 15:38:52 -05:00
Saran Ahluwalia
24f8eb93c4
Tree Equity Output: Change output from Geojson to CSV format for easier analysis (#1089)
Added Tree Equity

* draft wip

* revised documentation

* revised documentation

* revised documentation and defer to super

* change word in logger

* fix flake 8

* address nit

Co-authored-by: Saran Ahluwalia <ahlusar.ahluwalia@gmail.com>
2021-12-30 17:17:28 -05:00
Lucas Merrill Brown
beb0eea5cc
Alternative definition of DACs for comparison (#1068)
* Alternative energy-related definition of DACs
2021-12-27 12:05:59 -05:00
Kameron Kerger
e15bb52bad
548-update-pdf (#1081)
latest pdf copy with links now added for each data source
2021-12-21 14:12:20 -05:00