Commit graph

76 commits

Author SHA1 Message Date
Jorge Escobar
0a21fc6b12
Add territory boundary data (#885)
* Add territory boundary data

* housing and transp

* lint

* lint

* lint
2021-11-16 10:05:09 -05:00
Lucas Merrill Brown
e8d64df510
Fixing missing FEMA fields (#892) 2021-11-15 11:06:44 -05:00
Lucas Merrill Brown
21834b4a91
Issue 883: Update FEMA risk index measure (#884)
* ETL updated

* Adding three fields to score
2021-11-13 11:32:15 -05:00
Lucas Merrill Brown
05ebf9b48c
Add median house value to Definition L (#882)
* Added house value to ETL

* Adding house value to score formula and comp tool
2021-11-13 10:29:23 -05:00
Vincent La
b0dbc90064
[ISS-723] Load Census Data for 4 Territories (#816)
* Adding census decennial data for island territories
2021-11-09 16:32:46 -05:00
Jorge Escobar
053dde0d40
Display score L on map (#849)
* updates to first docker run

* tile constants

* frontend changes

* updating pickles instructions

* pickles
2021-11-05 16:26:14 -04:00
Lucas Merrill Brown
03e59f2abd
Definition L updates (#862)
* Changing FEMA risk measure 

* Adding "basic stats" feature to comparison tool 

* Tweaking Definition L
2021-11-05 15:43:52 -04:00
Shelby Switzer
a0bf186ee6
Add percentile column for L (#851)
* Add percentile column for L

* Use Definition instead of Score

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
2021-11-04 13:03:56 -04:00
Lucas Merrill Brown
8372b47d42
Various updates to Definition L (#850)
* removing percentiles as separate field names

* adding RMP
2021-11-04 12:17:45 -04:00
Lucas Merrill Brown
1d541be447
Add EJSCREEN Areas of Concern (#843)
* Adding ej screen areas of concern

* Uses it where user has local files, but not otherwise

Co-authored-by: VincentLaUSDS <vincent.la@omb.eop.gov>
2021-11-02 15:38:42 -04:00
Shelby Switzer
7bd1a9e59e
Big ole score refactor (#815)
* WIP

* Create ScoreCalculator

This calculates all the factors for score L for now (with placeholder
formulae because this is a WIP). I think ideally we'll want to
refactor all the score code to be extracted into this or similar
classes.

* Add factor logic for score L

Updated factor logic to match score L factors methodology.
Still need to get the Score L field itself working.

Cleanup needed: Pull field names into constants file, extract all score
calculation into score calculator

* Update thresholds and get score L calc working

* Update header name for consistency and update comparison tool

* Initial move of score to score calculator

* WIP big refactor

* Continued WIP on score refactor

* WIP score refactor

* Get to a working score-run

* Refactor to pass df to score init

This makes it easier to pass df around within a class with multiple
methods that require df.

* Updates from Black

* Updates from linting

* Use named imports instead of wildcard; log more

* Additional refactors

* move more field names to field_names constants file
* import constants without a relative path (would break docker)
* run linting
* raise error if add_columns is not implemented in a child class

* Refactor dict to namedtuple in score c

* Update L to use all percentile field

* change high school ed field in L back

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
2021-11-02 14:12:53 -04:00
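The refactor above mentions passing `df` to the score's initializer and raising an error when `add_columns` is not implemented in a child class. A minimal sketch of that pattern follows. Only the names `df` and `add_columns` come from the commit message; the class names, column names, and threshold are hypothetical, and a list of row dicts stands in for the pandas DataFrame to keep the sketch dependency-free.

```python
class Score:
    """Hypothetical base class; only `df` and `add_columns` appear in the commit message."""

    def __init__(self, df):
        # Store the data once so every method in the class can reach it.
        self.df = df

    def add_columns(self):
        # Child classes are expected to override this method.
        raise NotImplementedError(
            f"{type(self).__name__} must implement add_columns()"
        )


class ScoreExample(Score):
    """Hypothetical child class with a placeholder rule (not the real Score L formula)."""

    def add_columns(self):
        for row in self.df:
            # Placeholder threshold, chosen only for illustration.
            row["Example score"] = row["Example percentile"] > 0.5
        return self.df
```

Calling `add_columns()` on the base class fails loudly, so a child class that forgets to override it is caught the first time the score runs rather than silently producing no columns.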
Jorge Escobar
1b17af84c8
Combine + Tilefy (#806)
* init

* score-post

* added score csv s3 download; remove poetry cmds from readme

* working census tile fetch

* PR review

* Github Actions Work
2021-11-01 18:05:05 -04:00
Shelby Switzer
7b87e0ec99
Add Score L (#812)
* Create ScoreCalculator

This calculates all the factors for score L for now (with placeholder
formulae because this is a WIP). I think ideally we'll want to
refactor all the score code to be extracted into this or similar
classes.

* Add factor logic for score L

Updated factor logic to match score L factors methodology.
Still need to get the Score L field itself working.

Cleanup needed: Pull field names into constants file, extract all score
calculation into score calculator

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
2021-10-28 16:07:41 -04:00
Jorge Escobar
a94b8e2761
final census GHA 2021-10-14 13:50:56 -04:00
Jorge Escobar
8ddfc6b305
Update application.py 2021-10-14 13:31:37 -04:00
Jorge Escobar
3b04356fb3
Data sources from S3 (#769)
* Started 535

* Data sources from S3

* lint

* remove breakpoints

* PR comments

* lint

* census data completed

* lint

* renaming data source
2021-10-13 16:00:33 -04:00
Billy Daly
d1273b63c5
Add ETL Contract Checks (#619)
* Adds dev dependencies to requirements.txt and re-runs black on codebase

* Adds test and code for national risk index etl, still in progress

* Removes test_data from .gitignore

* Adds test data to nation_risk_index tests

* Creates tests and ETL class for NRI data

* Adds tests for load() and transform() methods of NationalRiskIndexETL

* Updates README.md with info about the NRI dataset

* Adds to dos

* Moves tests and test data into a tests/ dir in national_risk_index

* Moves tmp_dir for tests into data/tmp/tests/

* Promotes fixtures to conftest and relocates national_risk_index tests:
The relocation of national_risk_index tests is necessary because tests 
can only use fixtures specified in conftests within the same package

* Fixes issue with df.equals() in test_transform()

* Files reformatted by black

* Commit changes to other files after re-running black

* Fixes unused import that caused lint checks to fail

* Moves tests/ directory to app root for data_pipeline

* Adds new methods to ExtractTransformLoad base class:
- __init__() Initializes class attributes
- _get_census_fips_codes() Loads a dataframe with the fips codes for 
census block group and tract
- validate_init() Checks that the class was initialized correctly
- validate_output() Checks that the output was loaded correctly

* Adds test for ExtractTransformLoad.__init__() and base.py

* Fixes failing flake8 test

* Changes geo_col to geoid_col and changes is_dataset to is_census in yaml

* Adds test for validate_output()

* Adds remaining tests

* Removes is_dataset from init method

* Makes CENSUS_CSV a class attribute instead of a class global:
This ensures that CENSUS_CSV is only set when the ETL class is for a 
non-census dataset and removes the need to overwrite the value in 
mock_etl fixture

* Re-formats files with black and fixes broken tox tests
2021-10-13 15:54:15 -04:00
Shelby Switzer
d8c73e6a02
Change downloadable file names (#708)
* Change downloadable file names

* Remove constants because we're dynamically creating these
* Update to "communities" for the descriptor word based on team convo
* Add timestamp in 2020-09-20-0930 format because I personally think
this is the best ^.^

* Add a CLI command to run ETL Score Post so that we don't have to
  run the score generation just to get new downloadable files.
* Also make sure the old downloadable files are cleaned up on the
  run of this command.

* Remove unused library, thanks pylint!

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
2021-10-01 15:04:37 -04:00
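The commit above describes building downloadable file names dynamically with a timestamp in `2020-09-20-0930` format. A minimal sketch of how such a name could be produced follows. The function name and the overall `prefix-timestamp.extension` layout are assumptions for illustration; only the timestamp format and the word "communities" come from the commit message.

```python
from datetime import datetime


def timestamped_name(prefix: str, extension: str, now: datetime) -> str:
    """Hypothetical helper: build a file name that embeds a sortable timestamp."""
    # %Y-%m-%d-%H%M renders as year-month-day-hourminute, e.g. 2020-09-20-0930.
    stamp = now.strftime("%Y-%m-%d-%H%M")
    return f"{prefix}-{stamp}.{extension}"


print(timestamped_name("communities", "zip", datetime(2020, 9, 20, 9, 30)))
# communities-2020-09-20-0930.zip
```

Because the fields run from most to least significant and are zero-padded, names in this format sort chronologically when sorted as plain strings.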
Jorge Escobar
2f8f2240b4
added new PDF file (#745) 2021-09-23 13:34:50 -04:00
Lucas Merrill Brown
b1a4d26be8
Adding persistent poverty tracts (#738)
* persistent poverty working

* fixing left-padding

* running black and adding persistent poverty to comp tool

* fixing bug

* running black and fixing linter

* fixing linter

* fixing linter error
2021-09-22 17:57:08 -04:00
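The commit above includes a "fixing left-padding" step without further detail. One common cause, assumed here rather than confirmed by the log, is that census tract identifiers lose their leading zeros when a file is read with numeric types. A census tract GEOID is 11 digits (2 for state, 3 for county, 6 for tract), so a hypothetical helper to restore the padding could look like this:

```python
def pad_tract_id(raw, width: int = 11) -> str:
    """Hypothetical helper: restore leading zeros dropped from a tract GEOID."""
    # Convert to text first, then left-pad with zeros to the expected width.
    return str(raw).strip().zfill(width)


print(pad_tract_id(1001020100))  # 01001020100
```

Reading identifier columns as strings in the first place avoids the problem entirely; padding after the fact is a fallback when the source file has already dropped the zeros.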
Shelby Switzer
d3a18352fc
Add pytest to tox run in CI/CD (#713)
* Add pytest to tox run in CI/CD

* Try fixing tox dependencies for pytest

* update poetry to get ci/cd passing

* Run poetry export with --dev flag to include dev dependencies such as pytest

* WIP updating test fixtures to include PDF

* Remove dev dependencies from reqs and add pytest to envlist to make build faster

* passing score_post tests

* Add pytest tox (#729)

* Fix failing pytest

* Fixes failing tox tests and updates requirements.txt to include dev deps

* pickle protocol 4

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
Co-authored-by: Jorge Escobar <jorge.e.escobar@omb.eop.gov>
Co-authored-by: Billy Daly <williamdaly422@gmail.com>
Co-authored-by: Jorge Escobar <83969469+esfoobar-usds@users.noreply.github.com>
2021-09-22 13:47:37 -04:00
Vincent La
7709836a12
Ticket 355: Adding map to Urban vs Rural Census Tracts (#696)
* Adding urban vs rural notebook

* Adding new code

* Adding settings

* Adding usa.csv

* Adding etl

* Adding etl

* Adding to etl_score

* quick changes to notebook

* Ensuring notebook can run

* Adding urban vs rural notebook

* Adding new code

* Adding settings

* Adding usa.csv

* Adding etl

* Adding etl

* Adding to etl_score

* quick changes to notebook

* Ensuring notebook can run

* adding urban to comparison tool

* renaming file

* adding urban rural to more comp tool outputs

* updating requirements and poetry

* Adding ej screen notebook

* removing ej screen notebook since it's in justice40-tool-iss-719

Co-authored-by: La <ryy0@cdc.gov>
Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
2021-09-22 12:31:03 -04:00
Jorge Escobar
cd33f323c8
Revised Columns on Download File + PDF (#701)
* Revised Columns on Download File + PDF

* finishing ticket
2021-09-17 13:11:23 -04:00
Lucas Merrill Brown
a1a988da46
Minor updates to scoring comparison tool (#686)
* Formatting updates for output XLSX
2021-09-16 14:06:33 -05:00
Jorge Escobar
487f6a8e04
Score Indicators (#690)
* Score Indicators

* rounding issue with housing burden column

* switching out score g

* final list of columns

* removing duplicate housing burden percentile fields

* removing duplicate

Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
2021-09-16 10:53:05 -04:00
Lucas Merrill Brown
1c0d87d84b
Add FEMA risk index to score file (#687)
* Add to score file
2021-09-15 13:31:32 -05:00
Lucas Merrill Brown
e94d05882c
Issue 675 & 676: Adding life expectancy and DOE energy burden data (#683)
* Adding two new data sources.
2021-09-15 09:59:28 -05:00
Jorge Escobar
fc5ed37fca
dependabot bump pillow (#681)
* dependabot bump pillow

* updated poetry

* adding encoding to file open
2021-09-14 17:28:59 -04:00
Lucas Merrill Brown
52e70653f0
Prototype H (#682) 2021-09-14 16:16:41 -05:00
Jorge Escobar
5bd63c083b
Run all Census, ETL, Score, Combine and Tilefy in one command (#662)
* Run all Census, ETL, Score, Combine and Tilefy in one command

* docker cmd

* some docker improvements

* feedback updates

* lint
2021-09-14 14:15:34 -04:00
Lucas Merrill Brown
1083e953da
Prototype G (#672)
* wip

* cleanup

* cleanup 2

* fixing import ordering linter error

* updating backend to use score G

* adding percentile to score output

* update tippecanoe compression

Co-authored-by: Jorge Escobar <jorge.e.escobar@omb.eop.gov>
2021-09-14 10:48:11 -04:00
Jorge Escobar
879cb7d022
hotfix wrong score tile csv path (#671)
* hotfix wrong score tile csv path

* updating test

* forcing update

* triggering action
2021-09-14 07:27:48 -04:00
Lucas Merrill Brown
7d13be7651
Ticket 492: Integrate Area Median Income and Poverty measures into ETL (#660)
* Loading AMI and poverty data
2021-09-13 15:36:35 -05:00
Shelby Switzer
d7274888b6
Update downloadable zip file (#659)
* Update downloadable zip file

* Don't use spaces in the name, as per #620
* Add the score D columns, as per #596

* fix paths and directories in etl_score_post

while the tests seemed to be passing, I encountered an error when
running poetry run score, which was caused by us creating a directory
called <name>.csv, instead of creating the parent directory.

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
2021-09-10 16:06:47 -04:00
Nat Hillard
536a35d6a0
Data Unit Tests (#509)
* Fixes #341 -
As a J40 developer, I want to write Unit Tests for the ETL files,
so that tests are run on each commit

* Location bug

* Adding Load tests

* Fixing XLSX filename

* Adding downloadable zip test

* updating pickle

* Fixing pylint warnings

* Update readme to correct some typos and reorganize test content structure

* Removing unused schemas file, adding details to readme around pickles, per PR feedback

* Update test to pass with Score D added to score file; update path in readme

* fix requirements.txt after merge

* fix poetry.lock after merge

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
2021-09-10 14:17:34 -04:00
Shelby Switzer
ac62933d16
Initial refactor for Score ETL (#618)
* WIP refactor

* Extract score calculations into their own methods

* do all initial df prep in single method

* Fix error in docs for running etl for single dataset

* WIP understanding HUD and linguistic iso data

* Add comments from initial group review on PR

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
2021-09-10 10:34:34 -04:00
Jorge Escobar
470c474367
Updated README (#652)
* Updated README

- Added a link to the full score data set on S3
- Some Docker updates

* typo
2021-09-10 10:15:46 -04:00
Jorge Escobar
327e27e713
Add Score D to USA Low (#629)
* added score D

* Adding Score D to usa-low

* rounding score d

* small vscode update

* last couple of vscode changes

* uncommitted vscode changes
2021-09-08 16:44:26 -04:00
Jorge Escobar
1953d2fcd8
Additional VSCode and Poetry tasks added (#624)
* additional tasks added

* Update launch.json
2021-09-08 14:54:38 -04:00
dependabot[bot]
f4ffcc6a53
Bump pillow from 8.3.1 to 8.3.2 in /data/data-pipeline (#625)
Bumps [pillow](https://github.com/python-pillow/Pillow) from 8.3.1 to 8.3.2.
- [Release notes](https://github.com/python-pillow/Pillow/releases)
- [Changelog](https://github.com/python-pillow/Pillow/blob/master/CHANGES.rst)
- [Commits](https://github.com/python-pillow/Pillow/compare/8.3.1...8.3.2)

---
updated-dependencies:
- dependency-name: pillow
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-09-08 13:08:58 -04:00
Billy Daly
f0900f7b69
Adds National Risk Index data to ETL pipeline (#549)
* Adds dev dependencies to requirements.txt and re-runs black on codebase

* Adds test and code for national risk index etl, still in progress

* Removes test_data from .gitignore

* Adds test data to nation_risk_index tests

* Creates tests and ETL class for NRI data

* Adds tests for load() and transform() methods of NationalRiskIndexETL

* Updates README.md with info about the NRI dataset

* Adds to dos

* Moves tests and test data into a tests/ dir in national_risk_index

* Moves tmp_dir for tests into data/tmp/tests/

* Promotes fixtures to conftest and relocates national_risk_index tests:
The relocation of national_risk_index tests is necessary because tests 
can only use fixtures specified in conftests within the same package

* Fixes issue with df.equals() in test_transform()

* Files reformatted by black

* Commit changes to other files after re-running black

* Fixes unused import that caused lint checks to fail

* Moves tests/ directory to app root for data_pipeline
2021-09-07 20:51:34 -04:00
Jorge Escobar
94298635c2
Add to decimal rounding (#623)
* added score D

* forgot to add decimal rounding
2021-09-07 14:30:45 -04:00
Jorge Escobar
99503a2541
added score D (#621) 2021-09-07 13:37:16 -04:00
Jorge Escobar
f5ba63977a
Hotfix for Readme and ACS File name (#563) 2021-08-24 17:01:12 -04:00
Lucas Merrill Brown
65ceb7900f
Score F, testing methodology (#510)
* fixing dependency issue

* fixing more dependencies

* including fraction of state AMI

* wip

* nitpick whitespace

* etl working now

* wip on scoring

* fix rename error

* reducing metrics

* fixing score f

* fixing readme

* adding dependency

* passing tests;

* linting/black

* removing unnecessary sample

* fixing error

* adding verify flag on etl/base

Co-authored-by: Jorge Escobar <jorge.e.escobar@omb.eop.gov>
2021-08-24 16:40:54 -04:00
Jorge Escobar
c24e13c930
Update GHA to push only client changes to S3 (#543) 2021-08-16 17:00:43 -04:00
Shelby Switzer
2c79396550
Initial draft for data provenance addition to README (#528)
* Initial draft for data provenance

We want to make the data usable/available at every step of our data
pipeline. This starts the addition to the README that spells out the data
provenance and where each version of the data as it goes through our
pipeline lives.

* Update README with placeholders for next steps in data provenance

* Add coming soon placeholders for remaining data locations

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
2021-08-16 16:45:54 -04:00
Jorge Escobar
c19cd3ee55
hotfix on float cols (#526) 2021-08-13 15:48:31 -04:00
Vim
1dbb1018d6
sets column as percentiles (#525)
* sets column as percentiles

* adds trailing comma
2021-08-13 12:01:34 -07:00
Jorge Escobar
773c035493
AWS Sync Public Read (#508)
* adding layer to mvts

* small fix for GHA

* AWS Sync Public Read

* removed temp file

* updated state median income ftp
2021-08-12 14:17:25 -04:00