j40-cejst-2

mirror of https://github.com/DOI-DO/j40-cejst-2.git synced 2025-02-23 01:54:18 -08:00

Author	SHA1	Message	Date
Nat Hillard	536a35d6a0	Data Unit Tests (#509 ) * Fixes #341 - As a J40 developer, I want to write Unit Tests for the ETL files, so that tests are run on each commit * Location bug * Adding Load tests * Fixing XLSX filename * Adding downloadable zip test * updating pickle * Fixing pylint warnings * Updte readme to correct some typos and reorganize test content structure * Removing unused schemas file, adding details to readme around pickles, per PR feedback * Update test to pass with Score D added to score file; update path in readme * fix requirements.txt after merge * fix poetry.lock after merge Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>	2021-09-10 14:17:34 -04:00
Shelby Switzer	ac62933d16	Initial refactor for Score ETL (#618 ) * WIP refactor * Exract score calculations into their own methods * do all initial df prep in single method * Fix error in docs for running etl for single dataset * WIP understanding HUD and linguistic iso data * Add comments from initial group review on PR Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>	2021-09-10 10:34:34 -04:00
Jorge Escobar	470c474367	Updated README (#652 ) * Updated README - Added a link to the full score data set on S3 - Some Docker updates * typo	2021-09-10 10:15:46 -04:00
Jorge Escobar	327e27e713	Add Score D to USA Low (#629 ) * added score D * Adding Score D to usa-low * rounding score d * small vscode update * last couple of vscode changes * uncommited bscode changes	2021-09-08 16:44:26 -04:00
Jorge Escobar	1953d2fcd8	Additional VSCode and Poetry tasks added (#624 ) * additional tasks added * Update launch.json	2021-09-08 14:54:38 -04:00
dependabot[bot]	f4ffcc6a53	Bump pillow from 8.3.1 to 8.3.2 in /data/data-pipeline (#625 ) Bumps [pillow](https://github.com/python-pillow/Pillow) from 8.3.1 to 8.3.2. - [Release notes](https://github.com/python-pillow/Pillow/releases) - [Changelog](https://github.com/python-pillow/Pillow/blob/master/CHANGES.rst) - [Commits](https://github.com/python-pillow/Pillow/compare/8.3.1...8.3.2) --- updated-dependencies: - dependency-name: pillow dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2021-09-08 13:08:58 -04:00
Billy Daly	f0900f7b69	Adds National Risk Index data to ETL pipeline (#549 ) * Adds dev dependencies to requirements.txt and re-runs black on codebase * Adds test and code for national risk index etl, still in progress * Removes test_data from .gitignore * Adds test data to nation_risk_index tests * Creates tests and ETL class for NRI data * Adds tests for load() and transform() methods of NationalRiskIndexETL * Updates README.md with info about the NRI dataset * Adds to dos * Moves tests and test data into a tests/ dir in national_risk_index * Moves tmp_dir for tests into data/tmp/tests/ * Promotes fixtures to conftest and relocates national_risk_index tests: The relocation of national_risk_index tests is necessary because tests can only use fixtures specified in conftests within the same package * Fixes issue with df.equals() in test_transform() * Files reformatted by black * Commit changes to other files after re-running black * Fixes unused import that caused lint checks to fail * Moves tests/ directory to app root for data_pipeline	2021-09-07 20:51:34 -04:00
Jorge Escobar	94298635c2	Add to decimal rounding (#623 ) * added score D * forgot to add decimal rounding	2021-09-07 14:30:45 -04:00
Jorge Escobar	99503a2541	added score D (#621 )	2021-09-07 13:37:16 -04:00
Jorge Escobar	f5ba63977a	Hotfix for Readme and ACS File name (#563 )	2021-08-24 17:01:12 -04:00
Lucas Merrill Brown	65ceb7900f	Score F, testing methodology (#510 ) * fixing dependency issue * fixing more dependencies * including fraction of state AMI * wip * nitpick whitespace * etl working now * wip on scoring * fix rename error * reducing metrics * fixing score f * fixing readme * adding dependency * passing tests; * linting/black * removing unnecessary sample * fixing error * adding verify flag on etl/base Co-authored-by: Jorge Escobar <jorge.e.escobar@omb.eop.gov>	2021-08-24 16:40:54 -04:00
Jorge Escobar	c24e13c930	Update GHA to push only client changes to S3 (#543 )	2021-08-16 17:00:43 -04:00
Shelby Switzer	2c79396550	Initial draft for data provenance addition to README (#528 ) * Initial draft for data provenance We want to make the data usable/available at every step of our data pipeline. This starts te addition to the README that spells out the data provenance and where each version of the data as it goes through our pipeline lives. * Update README with placeholders for next steps in data provenance * Add coming soon placeholders for remaining data locations Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>	2021-08-16 16:45:54 -04:00
Jorge Escobar	c19cd3ee55	hotfix on float cols (#526 )	2021-08-13 15:48:31 -04:00
Vim	1dbb1018d6	sets column as percentiles (#525 ) * sets column as percentiles * adds trailing comma	2021-08-13 12:01:34 -07:00
Jorge Escobar	773c035493	AWS Sync Public Read (#508 ) * adding layer to mvts * small fix for GHA * AWS Sync Public Read * removed temp file * updated state media income ftp	2021-08-12 14:17:25 -04:00
Jorge Escobar	d259d97ba9	adding layer to mvts (#503 ) * adding layer to mvts * small fix for GHA	2021-08-12 10:56:54 -04:00
Jorge Escobar	6dc1283ee2	added comment	2021-08-10 15:37:36 -04:00
Jorge Escobar	3d8dbb293c	Tile-baking columns with floating rounds completed (#491 ) * Tile-baking columns with floating rounds completed * completed * correction on github workflow * tiles folder no longer needed * addressed comments * updating requirements.txt * poetry lock update * adding xlswriter * final poetrylock * updated requirements.txt * checkpoint * removed matplotlib * ignoring pylint too many statements * reinstated too many statements * converting data sync to generate score GHA UI-driven	2021-08-10 15:28:50 -04:00
lucasmbrown-usds	ebe6180f7c	wip	2021-08-09 22:24:14 -05:00
lucasmbrown-usds	cf13036d20	clearing output	2021-08-09 21:31:07 -05:00
lucasmbrown-usds	ce5e8c5351	including fraction of state AMI	2021-08-09 21:30:41 -05:00
lucasmbrown-usds	4ae7eff4c4	adding median income field and running black	2021-08-09 20:47:51 -05:00
Nat Hillard	6c986adfe4	Check in VSCode config for easier local debug (#487 ) * Fixes #466 - Task: Check in VSCode config for easier local debug	2021-08-09 14:55:13 -04:00
Nat Hillard	9a9d5fdf7f	Backend change for Zipfile pt. 2 (#469 ) * Fixes #303 : adding downloadable zip archive logic * linter recommendations * Pushes data directory to AWS. We'll want to move to use AWS for this ASAP, but this works for now * updating pattern	2021-08-09 10:39:59 -04:00
Nat Hillard	ec19d86f6f	Adding back census to list of potential datasets, but separating out from standard list (#484 ) Error this addresses: File "/Users/lucas/Documents/usds/repos/justice40-tool/data/data-pipeline/data_pipeline/etl/runner.py", line 71, in etl_runner f"data_pipeline.etl.sources.{dataset['module_dir']}.etl" TypeError: 'NoneType' object is not subscriptable	2021-08-09 09:52:06 -04:00
Jorge Escobar	f51b0d69d9	Poetry updates for application (#483 )	2021-08-06 16:24:30 -04:00
Nat Hillard	6fb36ded9c	adding additional missed import (#477 )	2021-08-06 11:48:11 -04:00
Nat Hillard	9d962eb5d9	Moving from relative imports to absolute to enable poetry run python data-pipeline/application.py [command] (#476 )	2021-08-06 11:41:28 -04:00
Nat Hillard	45a8b1c026	Census ETL should use standard ETL form (#474 ) * Fixes #473 Census ETL should use standard ETL form * linter fixes	2021-08-06 11:01:51 -04:00
Nat Hillard	9f3b2f056b	Fixes #467 : (#470 ) If the census download task is run more than once, us.csv doubles in size and all data is removed from dataframe	2021-08-05 16:20:18 -04:00
Nat Hillard	c1568e87c0	Data directory should adopt standard Poetry-suggested python package structure (#457 ) * Fixes #456 - Our data directory should adopt standard python package structure * a few missed references * updating readme * updating requirements * Running Black * Fixes for flake8 * updating pylint	2021-08-05 15:35:54 -04:00
Jorge Escobar	4d7465c833	Hotfix for fips zip download location + added full-score-run command (#465 ) * Hotfix for S3 locations of data sources * updated README * lint failures Co-authored-by: Nat Hillard <Nathaniel.K.Hillard@omb.eop.gov>	2021-08-05 12:55:21 -04:00
Jorge Escobar	5cb00ef0ce	Tile Generation Script (#433 )	2021-08-03 18:23:57 -04:00
Billy Daly	5504528fdf	Issue 308 python linting (#443 ) * Adds flake8, pylint, liccheck, flake8 to dependencies for data-pipeline * Sets up and runs black autoformatting * Adds flake8 to tox linting * Fixes flake8 error F541 f string missing placeholders * Fixes flake8 E501 line too long * Fixes flake8 F401 imported but not used * Adds pylint to tox and disables the following pylint errors: - C0114: module docstrings - R0201: method could have been a function - R0903: too few public methods - C0103: name case styling - W0511: fix me - W1203: f-string interpolation in logging * Adds utils.py to tox.ini linting, runs black on utils.py * Fixes import related pylint errors: C0411 and C0412 * Fixes or ignores remaining pylint errors (for discussion later) * Adds safety and liccheck to tox.ini	2021-08-02 12:16:38 -04:00
Billy Daly	55dabb2b57	Issue 379 tox setup (#405 ) * Adds tox as a dev dependency to data/data-pipeline/pyproject.toml: Also updates poetry.lock and requirements.txt * Adds tox.ini to test build of data/data-pipeline * Sets up GitHub actions workflow for data/ directory * Tries to get Data Checks GitHub action to run * Fixes error with GitHub action * Migrates data/data-roadmap from setuptools to poetry * Sets up tox file for data/data-roadmap * Adds github action for data/data-roadmap * Fixes syntax error in data-checks.yml * Second attempt at fixing data-checks.yml * Export poetry requirements to requirements.txt * Revert "Migrates data/data-roadmap from setuptools to poetry" This reverts commit e8367652d43c1c9beee500f792c8f41e1c1fc462. * Removes pyproject.toml and reverts requirements.txt as well	2021-07-29 14:00:20 -04:00
Shelby Switzer	387ee3a382	Update data documentation and some data steps (#407 ) * Minor documentation updates, plus calenvironscreen S3 URL fix * Update score comparison docs and code * Add steps for running the comparison tool * Update HUD recap ETL to ensure GEOID is imported as a string (if it is imported as an interger by default it will strip the beginning "0" from many IDs) * Add note about execution time * Move step from paragraph to list * Update output dir in README for comp tool Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>	2021-07-29 10:28:52 -04:00
Jorge Escobar	b404fdcc43	Generate Geo-aware scores for all zoom levels (#391 ) * generate Geo-aware scores for all zoom levels * usa high progress * testing dissolve * checkpoint * changing type * removing breakpoint * validation notebooks * quick update * score validation * fixes for county merge * code completed	2021-07-28 16:07:28 -04:00
Lucas Merrill Brown	67b39475f7	Analysis by region (#385 ) * Adding regional comparisons * Small ETL fixes	2021-07-26 10:02:25 -05:00
Rohit Musti	81290ce672	adding tree equity score to the data pipeline (#398 ) * adding tree equity score to the downloading pipeline so it can be easily compared as a reference index! * removed redundant dependencies	2021-07-26 08:00:57 -04:00
Nat Hillard	a7cdf1c021	Adding notebook to create score dissolve (#333 )	2021-07-21 16:10:32 -04:00
Jorge Escobar	543d147e61	Data folder restructuring in preparation for 361 (#376 ) * initial checkin * gitignore and docker-compose update * readme update and error on hud * encoding issue * one more small README change * data roadmap re-strcuture * pyproject sort * small update to score output folders * checkpoint * couple of last fixes	2021-07-20 14:55:39 -04:00

1 2 3 4 5

242 commits