Commit graph

123 commits

Author SHA1 Message Date
Shelby Switzer
a0bf186ee6
Add percentile column for L (#851)
* Add percentile column for L

* Use Definition instead of Score

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
2021-11-04 13:03:56 -04:00
Lucas Merrill Brown
8372b47d42
Various updates to Definition L (#850)
* removing percentiles as separate field names

* adding RMP
2021-11-04 12:17:45 -04:00
Lucas Merrill Brown
1d541be447
Add EJSCREEN Areas of Concern (#843)
* Adding ej screen areas of concern

* Uses it where user has local files, but not otherwise

Co-authored-by: VincentLaUSDS <vincent.la@omb.eop.gov>
2021-11-02 15:38:42 -04:00
Shelby Switzer
7bd1a9e59e
Big ole score refactor (#815)
* WIP

* Create ScoreCalculator

This calculates all the factors for score L for now (with placeholder
formulae because this is a WIP). I think ideally we'll want to
refactor all the score code to be extracted into this or similar
classes.

* Add factor logic for score L

Updated factor logic to match score L factors methodology.
Still need to get the Score L field itself working.

Cleanup needed: Pull field names into constants file, extract all score
calculation into score calculator

* Update thresholds and get score L calc working

* Update header name for consistency and update comparison tool

* Initial move of score to score calculator

* WIP big refactor

* Continued WIP on score refactor

* WIP score refactor

* Get to a working score-run

* Refactor to pass df to score init

This makes it easier to pass df around within a class with multiple
methods that require df.

* Updates from Black

* Updates from linting

* Use named imports instead of wildcard; log more

* Additional refactors

* move more field names to field_names constants file
* import constants without a relative path (would break docker)
* run linting
* raise error if add_columns is not implemented in a child class

* Refactor dict to namedtuple in score c

* Update L to use all percentile field

* change high school ed field in L back

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
2021-11-02 14:12:53 -04:00
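
A minimal sketch of two ideas from the refactor above: passing df to the score class at init so every method can reach it, and raising an error when a child class does not implement add_columns. The class names, the column name, and the threshold are hypothetical, and a plain dict of lists stands in for the DataFrame so the sketch runs without pandas; this is not the project's actual code.

```python
class Score:
    """Hypothetical base class: stores df once so every method can use it."""

    def __init__(self, df):
        # df is assumed to be a pandas DataFrame in practice; any mapping of
        # column name -> sequence of values works for this sketch.
        self.df = df

    def add_columns(self):
        # The base class only signals that a child class forgot to override.
        raise NotImplementedError(
            "add_columns must be implemented by a child class"
        )


class ScoreL(Score):
    """Hypothetical child class with a placeholder rule, not the real Score L."""

    def _is_high(self, column, threshold):
        # Helper methods read self.df instead of receiving df as an argument.
        return [value >= threshold for value in self.df[column]]

    def add_columns(self):
        self.df["Score L (placeholder)"] = self._is_high("example_percentile", 0.9)
        return self.df
```

Calling `Score(df).add_columns()` directly raises `NotImplementedError`, which surfaces a missing override at run time rather than silently returning nothing.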
Jorge Escobar
1b17af84c8
Combine + Tilefy (#806)
* init

* score-post

* added score csv s3 download; remove poetry cmds from readme

* working census tile fetch

* PR review

* Github Actions Work
2021-11-01 18:05:05 -04:00
Shelby Switzer
7b87e0ec99
Add Score L (#812)
* Create ScoreCalculator

This calculates all the factors for score L for now (with placeholder
formulae because this is a WIP). I think ideally we'll want to
refactor all the score code to be extracted into this or similar
classes.

* Add factor logic for score L

Updated factor logic to match score L factors methodology.
Still need to get the Score L field itself working.

Cleanup needed: Pull field names into constants file, extract all score
calculation into score calculator

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
2021-10-28 16:07:41 -04:00
Jorge Escobar
a94b8e2761 final census GHA 2021-10-14 13:50:56 -04:00
Jorge Escobar
8ddfc6b305
Update application.py 2021-10-14 13:31:37 -04:00
Jorge Escobar
3b04356fb3
Data sources from S3 (#769)
* Started 535

* Data sources from S3

* lint

* remove breakpoints

* PR comments

* lint

* census data completed

* lint

* renaming data source
2021-10-13 16:00:33 -04:00
Billy Daly
d1273b63c5
Add ETL Contract Checks (#619)
* Adds dev dependencies to requirements.txt and re-runs black on codebase

* Adds test and code for national risk index etl, still in progress

* Removes test_data from .gitignore

* Adds test data to nation_risk_index tests

* Creates tests and ETL class for NRI data

* Adds tests for load() and transform() methods of NationalRiskIndexETL

* Updates README.md with info about the NRI dataset

* Adds to dos

* Moves tests and test data into a tests/ dir in national_risk_index

* Moves tmp_dir for tests into data/tmp/tests/

* Promotes fixtures to conftest and relocates national_risk_index tests:
The relocation of national_risk_index tests is necessary because tests 
can only use fixtures specified in conftests within the same package

* Fixes issue with df.equals() in test_transform()

* Files reformatted by black

* Commit changes to other files after re-running black

* Fixes unused import that caused lint checks to fail

* Moves tests/ directory to app root for data_pipeline

* Adds new methods to ExtractTransformLoad base class:
- __init__() Initializes class attributes
- _get_census_fips_codes() Loads a dataframe with the fips codes for 
census block group and tract
- validate_init() Checks that the class was initialized correctly
- validate_output() Checks that the output was loaded correctly

* Adds test for ExtractTransformLoad.__init__() and base.py

* Fixes failing flake8 test

* Changes geo_col to geoid_col and changes is_dataset to is_census in yaml

* Adds test for validate_output()

* Adds remaining tests

* Removes is_dataset from init method

* Makes CENSUS_CSV a class attribute instead of a class global:
This ensures that CENSUS_CSV is only set when the ETL class is for a 
non-census dataset and removes the need to overwrite the value in 
mock_etl fixture

* Re-formats files with black and fixes broken tox tests
2021-10-13 15:54:15 -04:00
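
A rough sketch of what contract checks like validate_init() and validate_output() from the commit above could look like. Only the method names and geoid_col come from the commit message; the class name, the constructor arguments, and what each check verifies are assumptions made for illustration, not the project's ExtractTransformLoad implementation.

```python
from pathlib import Path
from typing import Optional


class EtlSketch:
    """Illustrative stand-in; not the project's ExtractTransformLoad class."""

    def __init__(self, output_file: Path, geoid_col: Optional[str] = None) -> None:
        self.output_file = output_file
        self.geoid_col = geoid_col

    def validate_init(self) -> None:
        # Assumed contract: a GEOID column name is configured before running.
        if not self.geoid_col:
            raise ValueError("geoid_col must be set before the ETL runs")

    def validate_output(self) -> None:
        # Assumed contract: the output file exists and its header row
        # contains the configured GEOID column.
        if not self.output_file.is_file():
            raise FileNotFoundError(str(self.output_file))
        with self.output_file.open(encoding="utf-8") as handle:
            header = handle.readline().strip().split(",")
        if self.geoid_col not in header:
            raise ValueError(
                f"missing column {self.geoid_col!r} in {self.output_file}"
            )
```

Running validate_init() before extract and validate_output() after load lets a misconfigured or incomplete ETL fail with a specific error instead of producing a partial file.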
Shelby Switzer
d8c73e6a02
Change downloadable file names (#708)
* Change downloadable file names

* Remove constants because we're dynamically creating these
* Update to "communities" for the descriptor word based on team convo
* Add timestamp in 2020-09-20-0930 format because I personally think
this is the best ^.^

* Add a CLI command to run ETL Score Post so that we don't have to
  run the score generation just to get new downloadable files.
* Also make sure the old downloadable files are cleaned up on the
  run of this command.

* Remove unused library, thanks pylint!

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
2021-10-01 15:04:37 -04:00
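
The timestamp format named in the commit above (2020-09-20-0930) corresponds to the strftime pattern `%Y-%m-%d-%H%M`. The sketch below builds a file name from it; the function name and the exact name layout are assumptions, and only the "communities" descriptor and the timestamp format come from the commit message.

```python
from datetime import datetime


def downloadable_name(descriptor: str, extension: str, now: datetime) -> str:
    # "%Y-%m-%d-%H%M" renders as year-month-day-hourminute, e.g. 2020-09-20-0930.
    timestamp = now.strftime("%Y-%m-%d-%H%M")
    return f"{descriptor}-{timestamp}.{extension}"


print(downloadable_name("communities", "csv", datetime(2020, 9, 20, 9, 30)))
# prints communities-2020-09-20-0930.csv
```

Because the fields run from most to least significant and are zero-padded, names in this format also sort chronologically as plain strings.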
Jorge Escobar
2f8f2240b4
added new PDF file (#745) 2021-09-23 13:34:50 -04:00
Lucas Merrill Brown
b1a4d26be8
Adding persistent poverty tracts (#738)
* persistent poverty working

* fixing left-padding

* running black and adding persistent poverty to comp tool

* fixing bug

* running black and fixing linter

* fixing linter

* fixing linter error
2021-09-22 17:57:08 -04:00
Shelby Switzer
d3a18352fc
Add pytest to tox run in CI/CD (#713)
* Add pytest to tox run in CI/CD

* Try fixing tox dependencies for pytest

* update poetry to get ci/cd passing

* Run poetry export with --dev flag to include dev dependencies such as pytest

* WIP updating test fixtures to include PDF

* Remove dev dependencies from reqs and add pytest to envlist to make build faster

* passing score_post tests

* Add pytest tox (#729)

* Fix failing pytest

* Fixes failing tox tests and updates requirements.txt to include dev deps

* pickle protocol 4

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
Co-authored-by: Jorge Escobar <jorge.e.escobar@omb.eop.gov>
Co-authored-by: Billy Daly <williamdaly422@gmail.com>
Co-authored-by: Jorge Escobar <83969469+esfoobar-usds@users.noreply.github.com>
2021-09-22 13:47:37 -04:00
Vincent La
7709836a12
Ticket 355: Adding map to Urban vs Rural Census Tracts (#696)
* Adding urban vs rural notebook

* Adding new code

* Adding settings

* Adding usa.csv

* Adding etl

* Adding etl

* Adding to etl_score

* quick changes to notebook

* Ensuring notebook can run

* adding urban to comparison tool

* renaming file

* adding urban rural to more comp tool outputs

* updating requirements and poetry

* Adding ej screen notebook

* removing ej screen notebook since it's in justice40-tool-iss-719

Co-authored-by: La <ryy0@cdc.gov>
Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
2021-09-22 12:31:03 -04:00
Jorge Escobar
cd33f323c8
Revised Columns on Download File + PDF (#701)
* Revised Columns on Download File + PDF

* finishing ticket
2021-09-17 13:11:23 -04:00
Lucas Merrill Brown
a1a988da46
Minor updates to scoring comparison tool (#686)
* Formatting updates for output XLSX
2021-09-16 14:06:33 -05:00
Jorge Escobar
487f6a8e04
Score Indicators (#690)
* Score Indicators

* rounding issue with housing burden column

* switching out score g

* final list of columns

* removing duplicate housing burden percentile fields

* removing duplicate

Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
2021-09-16 10:53:05 -04:00
Lucas Merrill Brown
1c0d87d84b
Add FEMA risk index to score file (#687)
* Add to score file
2021-09-15 13:31:32 -05:00
Lucas Merrill Brown
e94d05882c
Issue 675 & 676: Adding life expectancy and DOE energy burden data (#683)
* Adding two new data sources.
2021-09-15 09:59:28 -05:00
Jorge Escobar
fc5ed37fca
dependabot bump pillow (#681)
* dependabot bump pillow

* updated poetry

* adding encoding to file open
2021-09-14 17:28:59 -04:00
Lucas Merrill Brown
52e70653f0
Prototype H (#682) 2021-09-14 16:16:41 -05:00
Jorge Escobar
5bd63c083b
Run all Census, ETL, Score, Combine and Tilefy in one command (#662)
* Run all Census, ETL, Score, Combine and Tilefy in one command

* docker cmd

* some docker improvements

* feedback updates

* lint
2021-09-14 14:15:34 -04:00
Lucas Merrill Brown
1083e953da
Prototype G (#672)
* wip

* cleanup

* cleanup 2

* fixing import ordering linter error

* updating backend to use score G

* adding percentile to score output

* update tippecanoe compression

Co-authored-by: Jorge Escobar <jorge.e.escobar@omb.eop.gov>
2021-09-14 10:48:11 -04:00
Jorge Escobar
879cb7d022
hotfix wrong score tile csv path (#671)
* hotfix wrong score tile csv path

* updating test

* forcing update

* triggering action
2021-09-14 07:27:48 -04:00
Lucas Merrill Brown
7d13be7651
Ticket 492: Integrate Area Median Income and Poverty measures into ETL (#660)
* Loading AMI and poverty data
2021-09-13 15:36:35 -05:00
Shelby Switzer
d7274888b6
Update downloadable zip file (#659)
* Update downloadable zip file

* Don't use spaces in the name, as per #620
* Add the score D columns, as per #596

* fix paths and directories in etl_score_post

while the tests seemed to be passing, I encountered an error when
running poetry run score, which was caused by us creating a directory
called <name>.csv, instead of creating the parent directory.

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
2021-09-10 16:06:47 -04:00
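
The path bug described in the commit above (a directory created with the name <name>.csv instead of the file's parent directory) is a common pathlib slip. A minimal sketch of the intended pattern follows; the function name is hypothetical and this is not the code from etl_score_post.

```python
from pathlib import Path


def write_csv(path: Path, text: str) -> None:
    # Create the parent directory. Calling path.mkdir() here instead would
    # create a directory named after the file, e.g. "<name>.csv/".
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
```

With `exist_ok=True` the call is safe to repeat on later runs, and `parents=True` covers output directories nested more than one level deep.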
Nat Hillard
536a35d6a0
Data Unit Tests (#509)
* Fixes #341 -
As a J40 developer, I want to write Unit Tests for the ETL files,
so that tests are run on each commit

* Location bug

* Adding Load tests

* Fixing XLSX filename

* Adding downloadable zip test

* updating pickle

* Fixing pylint warnings

* Update readme to correct some typos and reorganize test content structure

* Removing unused schemas file, adding details to readme around pickles, per PR feedback

* Update test to pass with Score D added to score file; update path in readme

* fix requirements.txt after merge

* fix poetry.lock after merge

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
2021-09-10 14:17:34 -04:00
Shelby Switzer
ac62933d16
Initial refactor for Score ETL (#618)
* WIP refactor

* Extract score calculations into their own methods

* do all initial df prep in single method

* Fix error in docs for running etl for single dataset

* WIP understanding HUD and linguistic iso data

* Add comments from initial group review on PR

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
2021-09-10 10:34:34 -04:00
Jorge Escobar
470c474367
Updated README (#652)
* Updated README

- Added a link to the full score data set on S3
- Some Docker updates

* typo
2021-09-10 10:15:46 -04:00
Jorge Escobar
327e27e713
Add Score D to USA Low (#629)
* added score D

* Adding Score D to usa-low

* rounding score d

* small vscode update

* last couple of vscode changes

* uncommitted vscode changes
2021-09-08 16:44:26 -04:00
Jorge Escobar
1953d2fcd8
Additional VSCode and Poetry tasks added (#624)
* additional tasks added

* Update launch.json
2021-09-08 14:54:38 -04:00
dependabot[bot]
f4ffcc6a53
Bump pillow from 8.3.1 to 8.3.2 in /data/data-pipeline (#625)
Bumps [pillow](https://github.com/python-pillow/Pillow) from 8.3.1 to 8.3.2.
- [Release notes](https://github.com/python-pillow/Pillow/releases)
- [Changelog](https://github.com/python-pillow/Pillow/blob/master/CHANGES.rst)
- [Commits](https://github.com/python-pillow/Pillow/compare/8.3.1...8.3.2)

---
updated-dependencies:
- dependency-name: pillow
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-09-08 13:08:58 -04:00
Billy Daly
f0900f7b69
Adds National Risk Index data to ETL pipeline (#549)
* Adds dev dependencies to requirements.txt and re-runs black on codebase

* Adds test and code for national risk index etl, still in progress

* Removes test_data from .gitignore

* Adds test data to nation_risk_index tests

* Creates tests and ETL class for NRI data

* Adds tests for load() and transform() methods of NationalRiskIndexETL

* Updates README.md with info about the NRI dataset

* Adds to dos

* Moves tests and test data into a tests/ dir in national_risk_index

* Moves tmp_dir for tests into data/tmp/tests/

* Promotes fixtures to conftest and relocates national_risk_index tests:
The relocation of national_risk_index tests is necessary because tests 
can only use fixtures specified in conftests within the same package

* Fixes issue with df.equals() in test_transform()

* Files reformatted by black

* Commit changes to other files after re-running black

* Fixes unused import that caused lint checks to fail

* Moves tests/ directory to app root for data_pipeline
2021-09-07 20:51:34 -04:00
Jorge Escobar
94298635c2
Add to decimal rounding (#623)
* added score D

* forgot to add decimal rounding
2021-09-07 14:30:45 -04:00
Jorge Escobar
99503a2541
added score D (#621) 2021-09-07 13:37:16 -04:00
Vim
704831159f
dockerize front-end and pass env vars to npm build (#614)
* Revert "dockerize front end (#558)"

This reverts commit 89c23faf7a.

* dockerize frontend
- adds score server and website docker compose
- creates docker ignore
- adds .env.* for dev, prod and local
- adds dockerfile for website
- adds env to gatsby-config
- adds hostaddress to develop / start script
- adds instructions in README for running docker
- replaces fixed URLS with ones based on env vars
- creates a score server dockerfile

* updates README to change map tiles source

* adds env DATA_SOURCE:development to deploy GHA

* capitalize readme
2021-09-07 10:35:11 -07:00
Vim
f5c4ba6d88
Revert "dockerize front end (#558)" (#588)
This reverts commit 89c23faf7a (dockerize-front-end)
2021-09-01 13:00:26 -07:00
Vim
89c23faf7a
dockerize front end (#558)
* initial docker

* adds concurrency to be able to run yarn install

* adds 0.0.0.0 to allow docker access

* adds web service

* adds env variables

* updates root path

* adds volumes

* adds docker to readme

* adds score server client docker

* docker updates after convo

* speeds up build and removes env vars

* adds client as volume

* updates to docker setup

* checkpoint

* updates the docker file

* adds .env.* files

* replaces serve with http-server for cors

Co-authored-by: Jorge Escobar <jorge.e.escobar@omb.eop.gov>
2021-09-01 11:16:29 -07:00
Jorge Escobar
f5ba63977a
Hotfix for Readme and ACS File name (#563) 2021-08-24 17:01:12 -04:00
Lucas Merrill Brown
65ceb7900f
Score F, testing methodology (#510)
* fixing dependency issue

* fixing more dependencies

* including fraction of state AMI

* wip

* nitpick whitespace

* etl working now

* wip on scoring

* fix rename error

* reducing metrics

* fixing score f

* fixing readme

* adding dependency

* passing tests;

* linting/black

* removing unnecessary sample

* fixing error

* adding verify flag on etl/base

Co-authored-by: Jorge Escobar <jorge.e.escobar@omb.eop.gov>
2021-08-24 16:40:54 -04:00
Jorge Escobar
c24e13c930
Update GHA to push only client changes to S3 (#543) 2021-08-16 17:00:43 -04:00
Shelby Switzer
2c79396550
Initial draft for data provenance addition to README (#528)
* Initial draft for data provenance

We want to make the data usable/available at every step of our data
pipeline. This starts the addition to the README that spells out the data
provenance and where each version of the data as it goes through our
pipeline lives.

* Update README with placeholders for next steps in data provenance

* Add coming soon placeholders for remaining data locations

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
2021-08-16 16:45:54 -04:00
Jorge Escobar
c19cd3ee55
hotfix on float cols (#526) 2021-08-13 15:48:31 -04:00
Vim
1dbb1018d6
sets column as percentiles (#525)
* sets column as percentiles

* adds trailing comma
2021-08-13 12:01:34 -07:00
Jorge Escobar
773c035493
AWS Sync Public Read (#508)
* adding layer to mvts

* small fix for GHA

* AWS Sync Public Read

* removed temp file

* updated state median income ftp
2021-08-12 14:17:25 -04:00
dependabot[bot]
1c5d5de82b
Bump yamale from 3.0.6 to 3.0.8 in /data/data-roadmap (#506)
Bumps [yamale](https://github.com/23andMe/Yamale) from 3.0.6 to 3.0.8.
- [Release notes](https://github.com/23andMe/Yamale/releases)
- [Commits](https://github.com/23andMe/Yamale/compare/3.0.6...3.0.8)

---
updated-dependencies:
- dependency-name: yamale
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-08-12 12:50:26 -04:00
Jorge Escobar
d259d97ba9
adding layer to mvts (#503)
* adding layer to mvts

* small fix for GHA
2021-08-12 10:56:54 -04:00
Jorge Escobar
6dc1283ee2 added comment 2021-08-10 15:37:36 -04:00
Jorge Escobar
3d8dbb293c
Tile-baking columns with floating rounds completed (#491)
* Tile-baking columns with floating rounds completed

* completed

* correction on github workflow

* tiles folder no longer needed

* addressed comments

* updating requirements.txt

* poetry lock update

* adding xlsxwriter

* final poetrylock

* updated requirements.txt

* checkpoint

* removed matplotlib

* ignoring pylint too many statements

* reinstated too many statements

* converting data sync to generate score GHA UI-driven
2021-08-10 15:28:50 -04:00