Commit graph

83 commits

Author SHA1 Message Date
Jorge Escobar
fc5ed37fca
dependabot bump pillow (#681)
* dependabot bump pillow

* updated poetry

* adding encoding to file open
2021-09-14 17:28:59 -04:00
Lucas Merrill Brown
52e70653f0
Prototype H (#682) 2021-09-14 16:16:41 -05:00
Jorge Escobar
5bd63c083b
Run all Census, ETL, Score, Combine and Tilefy in one command (#662)
* Run all Census, ETL, Score, Combine and Tilefy in one command

* docker cmd

* some docker improvements

* feedback updates

* lint
2021-09-14 14:15:34 -04:00
Lucas Merrill Brown
1083e953da
Prototype G (#672)
* wip

* cleanup

* cleanup 2

* fixing import ordering linter error

* updating backend to use score G

* adding percentile to score output

* update tippecanoe compression

Co-authored-by: Jorge Escobar <jorge.e.escobar@omb.eop.gov>
2021-09-14 10:48:11 -04:00
Jorge Escobar
879cb7d022
hotfix wrong score tile csv path (#671)
* hotfix wrong score tile csv path

* updating test

* forcing update

* triggering action
2021-09-14 07:27:48 -04:00
Lucas Merrill Brown
7d13be7651
Ticket 492: Integrate Area Median Income and Poverty measures into ETL (#660)
* Loading AMI and poverty data
2021-09-13 15:36:35 -05:00
Shelby Switzer
d7274888b6
Update downloadable zip file (#659)
* Update downloadable zip file

* Don't use spaces in the name, as per #620
* Add the score D columns, as per #596

* fix paths and directories in etl_score_post

while the tests seemed to be passing, I encountered an error when
running poetry run score, which was caused by us creating a directory
called <name>.csv, instead of creating the parent directory.

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
2021-09-10 16:06:47 -04:00
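The directory bug described in the message for #659 above (a directory created with the name <name>.csv instead of its parent directory) can be illustrated with a minimal sketch. This is not the repository's code: the helper name `ensure_parent_dir`, the folder name, and the file contents are hypothetical and chosen only for illustration.

```python
import tempfile
from pathlib import Path


def ensure_parent_dir(file_path: Path) -> Path:
    """Create the directory that will contain file_path, not file_path itself.

    Hypothetical helper, not taken from the repository.
    """
    # Calling file_path.mkdir(...) here would reproduce the bug: it would
    # create a directory literally named like the CSV file.
    file_path.parent.mkdir(parents=True, exist_ok=True)
    return file_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical output location
        csv_path = Path(tmp) / "downloadable" / "example.csv"
        ensure_parent_dir(csv_path)
        csv_path.write_text("id,score\n", encoding="utf-8")
        print(csv_path.is_file())
```

Creating only `file_path.parent` leaves the final path free to be written as a regular file.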
Nat Hillard
536a35d6a0
Data Unit Tests (#509)
* Fixes #341 -
As a J40 developer, I want to write Unit Tests for the ETL files,
so that tests are run on each commit

* Location bug

* Adding Load tests

* Fixing XLSX filename

* Adding downloadable zip test

* updating pickle

* Fixing pylint warnings

* Update readme to correct some typos and reorganize test content structure

* Removing unused schemas file, adding details to readme around pickles, per PR feedback

* Update test to pass with Score D added to score file; update path in readme

* fix requirements.txt after merge

* fix poetry.lock after merge

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
2021-09-10 14:17:34 -04:00
Shelby Switzer
ac62933d16
Initial refactor for Score ETL (#618)
* WIP refactor

* Extract score calculations into their own methods

* do all initial df prep in single method

* Fix error in docs for running etl for single dataset

* WIP understanding HUD and linguistic iso data

* Add comments from initial group review on PR

Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>
2021-09-10 10:34:34 -04:00
Jorge Escobar
327e27e713
Add Score D to USA Low (#629)
* added score D

* Adding Score D to usa-low

* rounding score d

* small vscode update

* last couple of vscode changes

* uncommitted vscode changes
2021-09-08 16:44:26 -04:00
Billy Daly
f0900f7b69
Adds National Risk Index data to ETL pipeline (#549)
* Adds dev dependencies to requirements.txt and re-runs black on codebase

* Adds test and code for national risk index etl, still in progress

* Removes test_data from .gitignore

* Adds test data to national_risk_index tests

* Creates tests and ETL class for NRI data

* Adds tests for load() and transform() methods of NationalRiskIndexETL

* Updates README.md with info about the NRI dataset

* Adds to-dos

* Moves tests and test data into a tests/ dir in national_risk_index

* Moves tmp_dir for tests into data/tmp/tests/

* Promotes fixtures to conftest and relocates national_risk_index tests:
The relocation of national_risk_index tests is necessary because tests 
can only use fixtures specified in conftests within the same package

* Fixes issue with df.equals() in test_transform()

* Files reformatted by black

* Commit changes to other files after re-running black

* Fixes unused import that caused lint checks to fail

* Moves tests/ directory to app root for data_pipeline
2021-09-07 20:51:34 -04:00
Jorge Escobar
94298635c2
Add to decimal rounding (#623)
* added score D

* forgot to add decimal rounding
2021-09-07 14:30:45 -04:00
Jorge Escobar
99503a2541
added score D (#621) 2021-09-07 13:37:16 -04:00
Lucas Merrill Brown
65ceb7900f
Score F, testing methodology (#510)
* fixing dependency issue

* fixing more dependencies

* including fraction of state AMI

* wip

* nitpick whitespace

* etl working now

* wip on scoring

* fix rename error

* reducing metrics

* fixing score f

* fixing readme

* adding dependency

* passing tests

* linting/black

* removing unnecessary sample

* fixing error

* adding verify flag on etl/base

Co-authored-by: Jorge Escobar <jorge.e.escobar@omb.eop.gov>
2021-08-24 16:40:54 -04:00
Jorge Escobar
c24e13c930
Update GHA to push only client changes to S3 (#543) 2021-08-16 17:00:43 -04:00
Jorge Escobar
c19cd3ee55
hotfix on float cols (#526) 2021-08-13 15:48:31 -04:00
Vim
1dbb1018d6
sets column as percentiles (#525)
* sets column as percentiles

* adds trailing comma
2021-08-13 12:01:34 -07:00
Jorge Escobar
773c035493
AWS Sync Public Read (#508)
* adding layer to mvts

* small fix for GHA

* AWS Sync Public Read

* removed temp file

* updated state median income ftp
2021-08-12 14:17:25 -04:00
Jorge Escobar
d259d97ba9
adding layer to mvts (#503)
* adding layer to mvts

* small fix for GHA
2021-08-12 10:56:54 -04:00
Jorge Escobar
6dc1283ee2 added comment 2021-08-10 15:37:36 -04:00
Jorge Escobar
3d8dbb293c
Tile-baking columns with floating rounds completed (#491)
* Tile-baking columns with floating rounds completed

* completed

* correction on github workflow

* tiles folder no longer needed

* addressed comments

* updating requirements.txt

* poetry lock update

* adding xlsxwriter

* final poetrylock

* updated requirements.txt

* checkpoint

* removed matplotlib

* ignoring pylint too many statements

* reinstated too many statements

* converting data sync to generate score GHA UI-driven
2021-08-10 15:28:50 -04:00
lucasmbrown-usds
ebe6180f7c wip 2021-08-09 22:24:14 -05:00
lucasmbrown-usds
cf13036d20 clearing output 2021-08-09 21:31:07 -05:00
lucasmbrown-usds
ce5e8c5351 including fraction of state AMI 2021-08-09 21:30:41 -05:00
lucasmbrown-usds
4ae7eff4c4 adding median income field and running black 2021-08-09 20:47:51 -05:00
Nat Hillard
9a9d5fdf7f
Backend change for Zipfile pt. 2 (#469)
* Fixes #303 : adding downloadable zip archive logic
* linter recommendations
* Pushes data directory to AWS. We'll want to move to use AWS for this ASAP, but this works for now
* updating pattern
2021-08-09 10:39:59 -04:00
Nat Hillard
ec19d86f6f
Adding back census to list of potential datasets, but separating out from standard list (#484)
Error this addresses:
  File "/Users/lucas/Documents/usds/repos/justice40-tool/data/data-pipeline/data_pipeline/etl/runner.py", line 71, in etl_runner
    f"data_pipeline.etl.sources.{dataset['module_dir']}.etl"
TypeError: 'NoneType' object is not subscriptable
2021-08-09 09:52:06 -04:00
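The TypeError quoted in the message for #484 above arises when a dataset lookup returns None and the result is then indexed with ['module_dir']. The sketch below shows one way to surface a clearer error in that situation. It is an assumption-laden illustration, not the fix made in the commit (which restored census to the list of datasets): `DATASET_LIST`, `find_dataset`, `module_path_for`, and the dataset names are hypothetical, and only the module path pattern comes from the traceback.

```python
from typing import Dict, List, Optional

# Hypothetical dataset registry; the entries are illustrative only.
DATASET_LIST: List[Dict[str, str]] = [
    {"name": "example_a", "module_dir": "example_a"},
    {"name": "example_b", "module_dir": "example_b"},
]


def find_dataset(name: str) -> Optional[Dict[str, str]]:
    """Return the registry entry with the given name, or None if absent."""
    return next((d for d in DATASET_LIST if d["name"] == name), None)


def module_path_for(name: str) -> str:
    """Build the ETL module path, failing loudly for unknown datasets."""
    dataset = find_dataset(name)
    if dataset is None:
        # Without this check, dataset['module_dir'] raises
        # TypeError: 'NoneType' object is not subscriptable
        raise ValueError(f"Unknown dataset: {name!r}")
    return f"data_pipeline.etl.sources.{dataset['module_dir']}.etl"
```

With this guard, an unregistered name fails with a message naming the missing dataset rather than a TypeError deep in the runner.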
Jorge Escobar
f51b0d69d9
Poetry updates for application (#483) 2021-08-06 16:24:30 -04:00
Nat Hillard
6fb36ded9c
adding additional missed import (#477) 2021-08-06 11:48:11 -04:00
Nat Hillard
9d962eb5d9
Moving from relative imports to absolute to enable poetry run python data-pipeline/application.py [command] (#476) 2021-08-06 11:41:28 -04:00
Nat Hillard
45a8b1c026
Census ETL should use standard ETL form (#474)
* Fixes #473
Census ETL should use standard ETL form

* linter fixes
2021-08-06 11:01:51 -04:00
Nat Hillard
9f3b2f056b
Fixes #467: (#470)
If the census download task is run more than once,
us.csv doubles in size and all data is removed from dataframe
2021-08-05 16:20:18 -04:00
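The symptom reported in #467 above (us.csv doubling in size when the download task runs twice) is typical of a task that appends to its output instead of replacing it. The root cause is not stated in the commit message, so the following is only a sketch under that assumption; `write_csv` and the row values are hypothetical.

```python
import csv
from pathlib import Path
from typing import List


def write_csv(csv_path: Path, rows: List[List[str]]) -> None:
    """Write rows to csv_path, replacing any previous contents.

    Hypothetical helper: mode "w" truncates the file, so running the
    task again yields the same file rather than a doubled one.
    """
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
```

Opening the file in mode "a" instead would append the same rows on each run, which matches the doubling described in the message.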
Nat Hillard
c1568e87c0
Data directory should adopt standard Poetry-suggested python package structure (#457)
* Fixes #456 - Our data directory should adopt standard python package structure
* a few missed references
* updating readme
* updating requirements
* Running Black
* Fixes for flake8
* updating pylint
2021-08-05 15:35:54 -04:00