Commit graph

188 commits

Author SHA1 Message Date
Shelby Switzer
617f41526f Update Census AMI to ETL into tracts, not CBGs (#900)
* Update Census AMI to ETL into tracts, not CBGs

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
2021-11-30 13:49:20 -05:00
Lucas Merrill Brown
537844236a Update FEMA data to be tracts, not block groups (#906) 2021-11-30 13:49:20 -05:00
Shelby Switzer
893758f1d4 Use tract instead of block group when calling census API (#901)
* Use tract instead of block group when calling census API

* fixing merge conflicts

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
2021-11-30 13:49:20 -05:00
Shelby Switzer
0c8b32e679 Move Housing and Transportation Index to tracts (#903)
Update data download URL to use tract as focus, use tract field name,
and move this dataset to the tracts df list in etl_score.

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
2021-11-30 13:49:20 -05:00
Lucas Merrill Brown
776a52595f Switching island territories data to tracts (#879) 2021-11-30 13:49:20 -05:00
Saran Ahluwalia
b0c176daee
Remove inplace argument to prevent SettingWithCopyError (#899)
* removed the inplace argument so that values are not set on a copy of the dataframe and chained assignment does not propagate and raise an exception

* remove superfluous pandas options that affect flake results

* remove (again) the same chained assignment from previous merge

Co-authored-by: Saran Ahluwalia <sarahluw@cisco.com>
2021-11-29 13:27:23 -05:00
Saran Ahluwalia
ec8f3543e5
Remove Index related to FEMA (#917)
Co-authored-by: Saran Ahluwalia <sarahluw@cisco.com>
2021-11-24 16:50:09 -05:00
Lucas Merrill Brown
474d010bf4
Quick fix on island territories directory name (#877) 2021-11-16 14:31:11 -05:00
Jorge Escobar
0a21fc6b12
Add territory boundary data (#885)
* Add territory boundary data

* housing and transp

* lint

* lint

* lint
2021-11-16 10:05:09 -05:00
Lucas Merrill Brown
e8d64df510
Fixing missing FEMA fields (#892) 2021-11-15 11:06:44 -05:00
Lucas Merrill Brown
21834b4a91
Issue 883: Update FEMA risk index measure (#884)
* ETL updated

* Adding three fields to score
2021-11-13 11:32:15 -05:00
Lucas Merrill Brown
05ebf9b48c
Add median house value to Definition L (#882)
* Added house value to ETL

* Adding house value to score formula and comp tool
2021-11-13 10:29:23 -05:00
Vincent La
b0dbc90064
[ISS-723] Load Census Data for 4 Territories (#816)
* Adding census decennial data for island territories
2021-11-09 16:32:46 -05:00
Jorge Escobar
053dde0d40
Display score L on map (#849)
* updates to first docker run

* tile constants

* frontend changes

* updating pickles instructions

* pickles
2021-11-05 16:26:14 -04:00
Lucas Merrill Brown
03e59f2abd
Definition L updates (#862)
* Changing FEMA risk measure 

* Adding "basic stats" feature to comparison tool 

* Tweaking Definition L
2021-11-05 15:43:52 -04:00
Shelby Switzer
a0bf186ee6
Add percentile column for L (#851)
* Add percentile column for L

* Use Definition instead of Score

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
2021-11-04 13:03:56 -04:00
Lucas Merrill Brown
8372b47d42
Various updates to Definition L (#850)
* removing percentiles as separate field names

* adding RMP
2021-11-04 12:17:45 -04:00
Lucas Merrill Brown
1d541be447
Add EJSCREEN Areas of Concern (#843)
* Adding ej screen areas of concern

* Uses it where user has local files, but not otherwise

Co-authored-by: VincentLaUSDS <vincent.la@omb.eop.gov>
2021-11-02 15:38:42 -04:00
Shelby Switzer
7bd1a9e59e
Big ole score refactor (#815)
* WIP

* Create ScoreCalculator

This calculates all the factors for score L for now (with placeholder
formulae because this is a WIP). I think ideally we'll want to
refactor all the score code to be extracted into this or similar
classes.

* Add factor logic for score L

Updated factor logic to match score L factors methodology.
Still need to get the Score L field itself working.

Cleanup needed: Pull field names into constants file, extract all score
calculation into score calculator

* Update thresholds and get score L calc working

* Update header name for consistency and update comparison tool

* Initial move of score to score calculator

* WIP big refactor

* Continued WIP on score refactor

* WIP score refactor

* Get to a working score-run

* Refactor to pass df to score init

This makes it easier to pass df around within a class with multiple
methods that require df.

* Updates from Black

* Updates from linting

* Use named imports instead of wildcard; log more

* Additional refactors

* move more field names to field_names constants file
* import constants without a relative path (would break docker)
* run linting
* raise error if add_columns is not implemented in a child class

* Refactor dict to namedtuple in score c

* Update L to use all percentile field

* change high school ed field in L back

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
2021-11-02 14:12:53 -04:00
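Two items from the refactor above (#815) are raising an error when add_columns is not implemented in a child class, and replacing a dict with a namedtuple. A minimal Python sketch of both follows. The class names Score and ScoreL, the FieldThreshold tuple and its fields, and the sample rows are hypothetical and not taken from the repository. Only add_columns and the df argument are named in the commit message, and df is a plain list of dicts here rather than a pandas dataframe.

```python
from collections import namedtuple

# Hypothetical record type: named fields instead of dict keys.
FieldThreshold = namedtuple("FieldThreshold", ["field_name", "threshold"])


class Score:
    """Base class: holds the data and defines the interface."""

    def __init__(self, df):
        # df is passed once at init so every method can reach it.
        self.df = df

    def add_columns(self):
        # Child classes must override this method.
        raise NotImplementedError(
            f"{type(self).__name__} must implement add_columns()"
        )


class ScoreL(Score):
    """Hypothetical child class that flags rows above a threshold."""

    RULE = FieldThreshold(field_name="poverty_percentile", threshold=0.9)

    def add_columns(self):
        for row in self.df:
            row["score_l"] = row[self.RULE.field_name] > self.RULE.threshold
        return self.df


rows = [{"poverty_percentile": 0.95}, {"poverty_percentile": 0.40}]
scored = ScoreL(rows).add_columns()
```

Calling add_columns() on the base class directly, or on a child that forgets to override it, fails immediately with a message naming the offending class.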
Jorge Escobar
1b17af84c8
Combine + Tilefy (#806)
* init

* score-post

* added score csv s3 download; remove poetry cmds from readme

* working census tile fetch

* PR review

* Github Actions Work
2021-11-01 18:05:05 -04:00
Shelby Switzer
7b87e0ec99
Add Score L (#812)
* Create ScoreCalculator

This calculates all the factors for score L for now (with placeholder
formulae because this is a WIP). I think ideally we'll want to
refactor all the score code to be extracted into this or similar
classes.

* Add factor logic for score L

Updated factor logic to match score L factors methodology.
Still need to get the Score L field itself working.

Cleanup needed: Pull field names into constants file, extract all score
calculation into score calculator

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
2021-10-28 16:07:41 -04:00
Jorge Escobar
a94b8e2761 final census GHA 2021-10-14 13:50:56 -04:00
Jorge Escobar
8ddfc6b305
Update application.py 2021-10-14 13:31:37 -04:00
Jorge Escobar
3b04356fb3
Data sources from S3 (#769)
* Started 535

* Data sources from S3

* lint

* remove breakpoints

* PR comments

* lint

* census data completed

* lint

* renaming data source
2021-10-13 16:00:33 -04:00
Billy Daly
d1273b63c5
Add ETL Contract Checks (#619)
* Adds dev dependencies to requirements.txt and re-runs black on codebase

* Adds test and code for national risk index etl, still in progress

* Removes test_data from .gitignore

* Adds test data to nation_risk_index tests

* Creates tests and ETL class for NRI data

* Adds tests for load() and transform() methods of NationalRiskIndexETL

* Updates README.md with info about the NRI dataset

* Adds to dos

* Moves tests and test data into a tests/ dir in national_risk_index

* Moves tmp_dir for tests into data/tmp/tests/

* Promotes fixtures to conftest and relocates national_risk_index tests:
The relocation of national_risk_index tests is necessary because tests 
can only use fixtures specified in conftests within the same package

* Fixes issue with df.equals() in test_transform()

* Files reformatted by black

* Commit changes to other files after re-running black

* Fixes unused import that caused lint checks to fail

* Moves tests/ directory to app root for data_pipeline

* Adds new methods to ExtractTransformLoad base class:
- __init__() Initializes class attributes
- _get_census_fips_codes() Loads a dataframe with the fips codes for 
census block group and tract
- validate_init() Checks that the class was initialized correctly
- validate_output() Checks that the output was loaded correctly

* Adds test for ExtractTransformLoad.__init__() and base.py

* Fixes failing flake8 test

* Changes geo_col to geoid_col and changes is_dataset to is_census in yaml

* Adds test for validate_output()

* Adds remaining tests

* Removes is_dataset from init method

* Makes CENSUS_CSV a class attribute instead of a class global:
This ensures that CENSUS_CSV is only set when the ETL class is for a 
non-census dataset and removes the need to overwrite the value in 
mock_etl fixture

* Re-formats files with black and fixes broken tox tests
2021-10-13 15:54:15 -04:00
Shelby Switzer
d8c73e6a02
Change downloadable file names (#708)
* Change downloadable file names

* Remove constants because we're dynamically creating these
* Update to "communities" for the descriptor word based on team convo
* Add timestamp in 2020-09-20-0930 format because I personally think
this is the best ^.^

* Add a CLI command to run ETL Score Post so that we don't have to
  run the score generation just to get new downloadable files.
* Also make sure the old downloadable files are cleaned up on the
  run of this command.

* Remove unused library, thanks pylint!

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
2021-10-01 15:04:37 -04:00
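The timestamp format named in the commit above (#708), 2020-09-20-0930, corresponds to year-month-day followed by a 24-hour hour and minute. A minimal Python sketch of building such a file name follows. The function name, the name layout, and the zip extension are assumptions made for illustration; only the timestamp format and the word "communities" come from the commit message.

```python
from datetime import datetime


def timestamped_name(prefix: str, when: datetime, ext: str) -> str:
    """Return '<prefix>-YYYY-MM-DD-HHMM.<ext>' (hypothetical layout)."""
    stamp = when.strftime("%Y-%m-%d-%H%M")
    return f"{prefix}-{stamp}.{ext}"


# 20 September 2020 at 09:30, matching the example in the commit message.
example = timestamped_name("communities", datetime(2020, 9, 20, 9, 30), "zip")
```

Because the fields run from most to least significant and are zero-padded, names built this way sort chronologically when sorted as plain strings.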
Jorge Escobar
2f8f2240b4
added new PDF file (#745) 2021-09-23 13:34:50 -04:00
Lucas Merrill Brown
b1a4d26be8
Adding persistent poverty tracts (#738)
* persistent poverty working

* fixing left-padding

* running black and adding persistent poverty to comp tool

* fixing bug

* running black and fixing linter

* fixing linter

* fixing linter error
2021-09-22 17:57:08 -04:00
Shelby Switzer
d3a18352fc
Add pytest to tox run in CI/CD (#713)
* Add pytest to tox run in CI/CD

* Try fixing tox dependencies for pytest

* update poetry to get ci/cd passing

* Run poetry export with --dev flag to include dev dependencies such as pytest

* WIP updating test fixtures to include PDF

* Remove dev dependencies from reqs and add pytest to envlist to make build faster

* passing score_post tests

* Add pytest tox (#729)

* Fix failing pytest

* Fixes failing tox tests and updates requirements.txt to include dev deps

* pickle protocol 4

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
Co-authored-by: Jorge Escobar <jorge.e.escobar@omb.eop.gov>
Co-authored-by: Billy Daly <williamdaly422@gmail.com>
Co-authored-by: Jorge Escobar <83969469+esfoobar-usds@users.noreply.github.com>
2021-09-22 13:47:37 -04:00
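Pinning the pickle protocol, as in the "pickle protocol 4" item above (#713), can be sketched as follows. Protocol 4 has been available since Python 3.4 and is the default from Python 3.8 onward; passing it explicitly keeps the output format the same across interpreter versions. The fixture contents are hypothetical, and the commit message does not state why the protocol was pinned.

```python
import pickle

# Hypothetical test fixture; the keys and values are made up.
fixture = {"tract": "01001020100", "score": 0.75}

# Serialize with an explicit protocol instead of the interpreter default.
blob = pickle.dumps(fixture, protocol=4)

# Loading does not need the protocol; it is recorded in the stream.
restored = pickle.loads(blob)
```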
Vincent La
7709836a12
Ticket 355: Adding map to Urban vs Rural Census Tracts (#696)
* Adding urban vs rural notebook

* Adding new code

* Adding settings

* Adding usa.csv

* Adding etl

* Adding etl

* Adding to etl_score

* quick changes to notebook

* Ensuring notebook can run

* adding urban to comparison tool

* renaming file

* adding urban rural to more comp tool outputs

* updating requirements and poetry

* Adding ej screen notebook

* removing ej screen notebook since it's in justice40-tool-iss-719

Co-authored-by: La <ryy0@cdc.gov>
Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
2021-09-22 12:31:03 -04:00
Jorge Escobar
cd33f323c8
Revised Columns on Download File + PDF (#701)
* Revised Columns on Download File + PDF

* finishing ticket
2021-09-17 13:11:23 -04:00
Lucas Merrill Brown
a1a988da46
Minor updates to scoring comparison tool (#686)
* Formatting updates for output XLSX
2021-09-16 14:06:33 -05:00
Jorge Escobar
487f6a8e04
Score Indicators (#690)
* Score Indicators

* rounding issue with housing burden column

* switching out score g

* final list of columns

* removing duplicate housing burden percentile fields

* removing duplicate

Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
2021-09-16 10:53:05 -04:00
Lucas Merrill Brown
1c0d87d84b
Add FEMA risk index to score file (#687)
* Add to score file
2021-09-15 13:31:32 -05:00
Lucas Merrill Brown
e94d05882c
Issue 675 & 676: Adding life expectancy and DOE energy burden data (#683)
* Adding two new data sources.
2021-09-15 09:59:28 -05:00
Jorge Escobar
fc5ed37fca
dependabot bump pillow (#681)
* dependabot bump pillow

* updated poetry

* adding encoding to file open
2021-09-14 17:28:59 -04:00
Lucas Merrill Brown
52e70653f0
Prototype H (#682) 2021-09-14 16:16:41 -05:00
Jorge Escobar
5bd63c083b
Run all Census, ETL, Score, Combine and Tilefy in one command (#662)
* Run all Census, ETL, Score, Combine and Tilefy in one command

* docker cmd

* some docker improvements

* feedback updates

* lint
2021-09-14 14:15:34 -04:00
Lucas Merrill Brown
1083e953da
Prototype G (#672)
* wip

* cleanup

* cleanup 2

* fixing import ordering linter error

* updating backend to use score G

* adding percentile to score output

* update tippecanoe compression

Co-authored-by: Jorge Escobar <jorge.e.escobar@omb.eop.gov>
2021-09-14 10:48:11 -04:00
Jorge Escobar
879cb7d022
hotfix wrong score tile csv path (#671)
* hotfix wrong score tile csv path

* updating test

* forcing update

* triggering action
2021-09-14 07:27:48 -04:00
Lucas Merrill Brown
7d13be7651
Ticket 492: Integrate Area Median Income and Poverty measures into ETL (#660)
* Loading AMI and poverty data
2021-09-13 15:36:35 -05:00
Shelby Switzer
d7274888b6
Update downloadable zip file (#659)
* Update downloadable zip file

* Don't use spaces in the name, as per #620
* Add the score D columns, as per #596

* fix paths and directories in etl_score_post

while the tests seemed to be passing, I encountered an error when
running poetry run score, which was caused by us creating a directory
called <name>.csv, instead of creating the parent directory.

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
2021-09-10 16:06:47 -04:00
Nat Hillard
536a35d6a0
Data Unit Tests (#509)
* Fixes #341 -
As a J40 developer, I want to write Unit Tests for the ETL files,
so that tests are run on each commit

* Location bug

* Adding Load tests

* Fixing XLSX filename

* Adding downloadable zip test

* updating pickle

* Fixing pylint warnings

* Update readme to correct some typos and reorganize test content structure

* Removing unused schemas file, adding details to readme around pickles, per PR feedback

* Update test to pass with Score D added to score file; update path in readme

* fix requirements.txt after merge

* fix poetry.lock after merge

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
2021-09-10 14:17:34 -04:00
Shelby Switzer
ac62933d16
Initial refactor for Score ETL (#618)
* WIP refactor

* Extract score calculations into their own methods

* do all initial df prep in single method

* Fix error in docs for running etl for single dataset

* WIP understanding HUD and linguistic iso data

* Add comments from initial group review on PR

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
2021-09-10 10:34:34 -04:00
Jorge Escobar
470c474367
Updated README (#652)
* Updated README

- Added a link to the full score data set on S3
- Some Docker updates

* typo
2021-09-10 10:15:46 -04:00
Jorge Escobar
327e27e713
Add Score D to USA Low (#629)
* added score D

* Adding Score D to usa-low

* rounding score d

* small vscode update

* last couple of vscode changes

* uncommitted vscode changes
2021-09-08 16:44:26 -04:00
Jorge Escobar
1953d2fcd8
Additional VSCode and Poetry tasks added (#624)
* additional tasks added

* Update launch.json
2021-09-08 14:54:38 -04:00
dependabot[bot]
f4ffcc6a53
Bump pillow from 8.3.1 to 8.3.2 in /data/data-pipeline (#625)
Bumps [pillow](https://github.com/python-pillow/Pillow) from 8.3.1 to 8.3.2.
- [Release notes](https://github.com/python-pillow/Pillow/releases)
- [Changelog](https://github.com/python-pillow/Pillow/blob/master/CHANGES.rst)
- [Commits](https://github.com/python-pillow/Pillow/compare/8.3.1...8.3.2)

---
updated-dependencies:
- dependency-name: pillow
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-09-08 13:08:58 -04:00
Billy Daly
f0900f7b69
Adds National Risk Index data to ETL pipeline (#549)
* Adds dev dependencies to requirements.txt and re-runs black on codebase

* Adds test and code for national risk index etl, still in progress

* Removes test_data from .gitignore

* Adds test data to nation_risk_index tests

* Creates tests and ETL class for NRI data

* Adds tests for load() and transform() methods of NationalRiskIndexETL

* Updates README.md with info about the NRI dataset

* Adds to dos

* Moves tests and test data into a tests/ dir in national_risk_index

* Moves tmp_dir for tests into data/tmp/tests/

* Promotes fixtures to conftest and relocates national_risk_index tests:
The relocation of national_risk_index tests is necessary because tests 
can only use fixtures specified in conftests within the same package

* Fixes issue with df.equals() in test_transform()

* Files reformatted by black

* Commit changes to other files after re-running black

* Fixes unused import that caused lint checks to fail

* Moves tests/ directory to app root for data_pipeline
2021-09-07 20:51:34 -04:00
Jorge Escobar
94298635c2
Add to decimal rounding (#623)
* added score D

* forgot to add decimal rounding
2021-09-07 14:30:45 -04:00