j40-cejst-2

mirror of https://github.com/DOI-DO/j40-cejst-2.git synced 2025-02-23 10:04:18 -08:00

Author	SHA1	Message	Date
matt bowen	a54ab3cc0f	Update deps safety says are vulnerable (#1937)	2022-09-28 10:39:26 -04:00
Matt Bowen	48d961bb5f	Bump just jupyterlab (#1930)	2022-09-27 15:13:06 -04:00
Emma Nechamkin	9c0e1993f6	Pipeline tile tests (#1864) * temp update * updating with fips check * adding check on pfs * updating with pfs test * Update test_tiles_smoketests.py * Fix lint errors (#1848) * Add column names test (#1848) * Mark tests as smoketests (#1848) * Move to other score-related tests (#1848) * Recast Total threshold criteria exceeded to int (#1848) In writing tests to verify the output of the tiles csv matches the final score CSV, I noticed TC/Total threshold criteria exceeded was getting cast from an int64 to a float64 in the process of PostScoreETL. I tracked it down to the line where we merge the score dataframe with constants.DATA_CENSUS_CSV_FILE_PATH --- there were > 100 tracts in the national census CSV that don't exist in the score, so those ended up with a Total threshold count of np.nan, which is a float, and thereby cast those columns to float. For the moment I just cast it back. * No need for low memory (#1848) * Add additional tests of tiles.csv (#1848) * Drop pre-2010 rows before computing score (#1848) Note this is probably NOT the optimal place for this change; it might make more sense for each source to filter its own tracts down to the acceptable tract list. However, that would be a pretty invasive change, where this is central and plenty of other things are happening in score transform that could be moved to sources, so for today, here's where the change will live. * Fix typo (#1848) * Switch from filter to inner join (#1848) * Remove no-op lines from tiles (#1848) * Apply feedback from review, linter (#1848) * Check the values of everything in the frame (#1848) * Refactor checker class (#1848) * Add test for state names (#1848) * cleanup from reviewing my own code (#1848) * Fix lint error (#1858) * Apply Emma's feedback from review (#1848) * Remove refs to national_df (#1848) * Account for new, fake nullable bools in tiles (#1848) To handle a geojson limitation, Emma converted some nullable boolean columns to float64 in the tiles export with the values {0.0, 1.0, nan}, giving us the same expressiveness. Sadly, this broke my assumption that all columns between the score and tiles csvs would have the same dtypes, so I need to account for these new, fake bools in my test. * Use equals instead of my worse version (#1848) * Missed a spot where we called _create_score_data (#1848) * Update per safety (#1848) Co-authored-by: matt bowen <matthew.r.bowen@omb.eop.gov>	2022-09-01 13:07:14 -04:00
Emma Nechamkin	ebac552d75	Adding DOT composite to travel score (#1820) This adds the DOT dataset to the ETL and to the score. Note that currently we take a percentile of an average of percentiles.	2022-08-16 14:44:39 -04:00
Matt Bowen	d5fbb802e8	Add FUDS ETL (#1817) * Add spatial join method (#1871) Since we'll need to figure out the tracts for a large number of points in future tickets, add a utility to handle grabbing the tract geometries and adding tract data to a point dataset. * Add FUDS, also jupyter lab (#1871) * Add YAML configs for FUDS (#1871) * Allow input geoid to be optional (#1871) * Add FUDS ETL, tests, test data, notebook (#1871) This adds the ETL class for Formerly Used Defense Sites (FUDS). This is different from most other ETLs since these FUDS are not provided by tract, but instead by geographic point, so we need to assign FUDS to tracts and then do calculations from there. * Floats -> Ints, as I intended (#1871) * Floats -> Ints, as I intended (#1871) * Formatting fixes (#1871) * Add test false positive GEOIDs (#1871) * Add gdal binaries (#1871) * Refactor pandas code to be more idiomatic (#1871) Per Emma, the more pandas-y way of doing my counts is using np.where to add the values I need, then groupby and size. It is definitely more compact, and also I think more correct! * Update configs per Emma suggestions (#1871) * Type fixed! (#1871) * Remove spurious import from vscode (#1871) * Snapshot update after changing col name (#1871) * Move up GDAL (#1871) * Adjust geojson strategy (#1871) * Try running census separately first (#1871) * Fix import order (#1871) * Cleanup cache strategy (#1871) * Download census data from S3 instead of re-calculating (#1871) * Clarify pandas code per Emma (#1871)	2022-08-16 13:28:39 -04:00
Emma Nechamkin	481a2a05f7	updated to fix linting errors (#1818) Cleans and updates base branch	2022-08-11 16:34:56 -04:00
Emma Nechamkin	f047ca9d83	Imputing income using geographic neighbors (#1559) Imputes income field with a light refactor. Needs more refactor and more tests (I spotchecked). Next ticket will check and address but a lot of "narwhal" architecture is here.	2022-08-11 12:33:45 -04:00
dependabot[bot]	2992f8df0b	Bump notebook from 6.4.10 to 6.4.12 in /data/data-pipeline (#1685) Bumps [notebook](http://jupyter.org) from 6.4.10 to 6.4.12. --- updated-dependencies: - dependency-name: notebook dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-07-07 17:10:03 -04:00
dependabot[bot]	0555d896fd	Bump lxml from 4.8.0 to 4.9.1 in /data/data-pipeline (#1719) Bumps [lxml](https://github.com/lxml/lxml) from 4.8.0 to 4.9.1. - [Release notes](https://github.com/lxml/lxml/releases) - [Changelog](https://github.com/lxml/lxml/blob/master/CHANGES.txt) - [Commits](https://github.com/lxml/lxml/compare/lxml-4.8.0...lxml-4.9.1) --- updated-dependencies: - dependency-name: lxml dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-07-07 17:09:49 -04:00
Emma Nechamkin	2ce4cfe80e	updated with codebook (#1573)	2022-04-18 18:12:18 -04:00
Jorge Escobar	859177a877	Marshmallow Schemas for YAML files (#1497) * Marshmallow Schemas for YAML files * completed ticket * passing tests * lint * click dep * staging BE map * Pr review	2022-03-31 13:56:10 -04:00
Emma Nechamkin	cb963cff5f	Updating comparison tool to be easier for pairwise comparisons (#1400) Creating pairwise comparison tool to compare two lists of prioritized tracts to each other.	2022-03-30 14:02:06 -04:00
Emma Nechamkin	0c07cdac55	Adding category count to BE signals (#1486) Added category count to downloadable data and backend signals.	2022-03-29 17:11:57 -04:00
Jorge Escobar	7b05ee9c76	S3 Parallel Upload and Deletions (#1410) * installation step * trigger action * installing to home dir * dry-run * pyenv * py 2.8 * trying s4cmd * removing pyenv * poetry s4cmd * num-threads * public read * poetry cache * s4cmd all around * poetry cache * poetry cache * install poetry packages * poetry echo * let's do this * s4cmd install on run * s4cmd * add aws back * add aws back * testing census api key and poetry caching * census api key * census api * census api key #3 * 250 * poetry update * poetry change * check census api key * force flag * update score gen and tilefy; remove cached fips * small gdal update * invalidation * missing cache ids	2022-03-17 23:19:23 -04:00
Emma Nechamkin	9d920d4db4	Updating testing to include pytest-snapshot (#1355) In this commit, we slightly change the testing to use `pytest-snapshot`. This is for `ETL`s only.	2022-03-11 21:34:07 -05:00
Emma Nechamkin	917b84dc2e	WY tracts are not showing up until zoom >7 (#1342) In order to solve an issue where states with few census tracts appear to have no DACs, we change the low-zoom for states with under some threshold of tracts to be the high-zoom for those states. Thus, WY now has DACs even in low zoom. Yay!	2022-03-08 17:33:11 -05:00
Emma Nechamkin	aea49cbb5a	Cleaning up quick code (#1349) Did some quick, mostly cosmetic changes and updates to the quick launch changes. This mostly entailed changing strings to constants and cleaning up some code to make it neater. Changes -- PR AMI, updating ag loss, and dropping pr from some threshold counts.	2022-03-02 16:50:04 -05:00
Jorge Escobar	1d399d3ca9	Tox Security Fix (#1242) * checkpoint * safety ignore * update python matrix for data checks * downloading census once	2022-02-03 17:05:51 -05:00
dependabot[bot]	8b72f743e3	Bump pillow from 8.4.0 to 9.0.0 in /data/data-pipeline (#1136) * Bump pillow from 8.4.0 to 9.0.0 in /data/data-pipeline Bumps [pillow](https://github.com/python-pillow/Pillow) from 8.4.0 to 9.0.0. - [Release notes](https://github.com/python-pillow/Pillow/releases) - [Changelog](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst) - [Commits](https://github.com/python-pillow/Pillow/compare/8.4.0...9.0.0) --- updated-dependencies: - dependency-name: pillow dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> * pillow bump Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Jorge Escobar <83969469+esfoobar-usds@users.noreply.github.com>	2022-01-27 18:19:49 -05:00
dependabot[bot]	4a83ae458e	Bump ipython from 7.28.0 to 7.31.1 in /data/data-pipeline (#1169) Bumps [ipython](https://github.com/ipython/ipython) from 7.28.0 to 7.31.1. - [Release notes](https://github.com/ipython/ipython/releases) - [Commits](https://github.com/ipython/ipython/compare/7.28.0...7.31.1) --- updated-dependencies: - dependency-name: ipython dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-01-27 17:36:14 -05:00
Shaun Verch	4cec1bb37e	Install and run pandas-vet (#1119) * Install and run pandas-vet This doesn't fix the errors, but it can give us a starting point for the discussion of which of these errors we care about. * Ignore the errors for now * Ignore eeoc.gov in link checker Sometimes it seems down from the perspective of github actions.	2022-01-13 13:17:30 -05:00
Shaun Verch	0abf04d6c2	Remove requirements.txt as a dependency (#1111) * Remove requirements.txt as a dependency This converts both docker and tox to use poetry, eliminating usage of requirements.txt in both flows. - In tox, uses the tox-poetry package which installs dependencies from the lockfile. - In docker, uses https://stackoverflow.com/questions/53835198/integrating-python-poetry-with-docker as a reference. * Don't copy pyproject.toml * Remove obsoleted docs about requirements.txt * Add --full-trace option to pytest * Fix liccheck liccheck works with requirements.txt, not with poetry, so there needs to be an extra translation step. * TEMP: Add WIP fix for pandas issue This is just to see if the github actions would pass once this fix gets merged, but it's being reviewed separately. * Revert "TEMP: Add WIP fix for pandas issue" This reverts commit 06e38e8cc77f5f3105c6e7a9449901db67aa1c82.	2022-01-10 16:43:56 -05:00
dependabot[bot]	9dc70d48a4	Bump lxml from 4.6.3 to 4.6.5 in /data/data-pipeline (#1043) Bumps [lxml](https://github.com/lxml/lxml) from 4.6.3 to 4.6.5. - [Release notes](https://github.com/lxml/lxml/releases) - [Changelog](https://github.com/lxml/lxml/blob/master/CHANGES.txt) - [Commits](https://github.com/lxml/lxml/compare/lxml-4.6.3...lxml-4.6.5) --- updated-dependencies: - dependency-name: lxml dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2021-12-13 16:41:50 -05:00
Vincent La	b0dbc90064	[ISS-723] Load Census Data for 4 Territories (#816) * Adding census decennial data for island territories	2021-11-09 16:32:46 -05:00
Jorge Escobar	1b17af84c8	Combine + Tilefy (#806) * init * score-post * added score csv s3 download; remove poetry cmds from readme * working census tile fetch * PR review * Github Actions Work	2021-11-01 18:05:05 -04:00
Shelby Switzer	d3a18352fc	Add pytest to tox run in CI/CD (#713) * Add pytest to tox run in CI/CD * Try fixing tox dependencies for pytest * update poetry to get ci/cd passing * Run poetry export with --dev flag to include dev dependencies such as pytest * WIP updating test fixtures to include PDF * Remove dev dependencies from reqs and add pytest to envlist to make build faster * passing score_post tests * Add pytest tox (#729) * Fix failing pytest * Fixes failing tox tests and updates requirements.txt to include dev deps * pickle protocol 4 Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov> Co-authored-by: Jorge Escobar <jorge.e.escobar@omb.eop.gov> Co-authored-by: Billy Daly <williamdaly422@gmail.com> Co-authored-by: Jorge Escobar <83969469+esfoobar-usds@users.noreply.github.com>	2021-09-22 13:47:37 -04:00
Vincent La	7709836a12	Ticket 355: Adding map to Urban vs Rural Census Tracts (#696) * Adding urban vs rural notebook * Adding new code * Adding settings * Adding usa.csv * Adding etl * Adding etl * Adding to etl_score * quick changes to notebook * Ensuring notebook can run * Adding urban vs rural notebook * Adding new code * Adding settings * Adding usa.csv * Adding etl * Adding etl * Adding to etl_score * quick changes to notebook * Ensuring notebook can run * adding urban to comparison tool * renaming file * adding urban rural to more comp tool outputs * updating requirements and poetry * Adding ej screen notebook * removing ej screen notebook since it's in justice40-tool-iss-719 Co-authored-by: La <ryy0@cdc.gov> Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>	2021-09-22 12:31:03 -04:00
Jorge Escobar	fc5ed37fca	dependabot bump pillow (#681) * dependabot bump pillow * updated poetry * adding encoding to file open	2021-09-14 17:28:59 -04:00
Nat Hillard	536a35d6a0	Data Unit Tests (#509) * Fixes #341 - As a J40 developer, I want to write Unit Tests for the ETL files, so that tests are run on each commit * Location bug * Adding Load tests * Fixing XLSX filename * Adding downloadable zip test * updating pickle * Fixing pylint warnings * Update readme to correct some typos and reorganize test content structure * Removing unused schemas file, adding details to readme around pickles, per PR feedback * Update test to pass with Score D added to score file; update path in readme * fix requirements.txt after merge * fix poetry.lock after merge Co-authored-by: Shelby Switzer <shelby.switzer@cms.hhs.gov>	2021-09-10 14:17:34 -04:00
dependabot[bot]	f4ffcc6a53	Bump pillow from 8.3.1 to 8.3.2 in /data/data-pipeline (#625) Bumps [pillow](https://github.com/python-pillow/Pillow) from 8.3.1 to 8.3.2. - [Release notes](https://github.com/python-pillow/Pillow/releases) - [Changelog](https://github.com/python-pillow/Pillow/blob/master/CHANGES.rst) - [Commits](https://github.com/python-pillow/Pillow/compare/8.3.1...8.3.2) --- updated-dependencies: - dependency-name: pillow dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2021-09-08 13:08:58 -04:00
Lucas Merrill Brown	65ceb7900f	Score F, testing methodology (#510) * fixing dependency issue * fixing more dependencies * including fraction of state AMI * wip * nitpick whitespace * etl working now * wip on scoring * fix rename error * reducing metrics * fixing score f * fixing readme * adding dependency * passing tests; * linting/black * removing unnecessary sample * fixing error * adding verify flag on etl/base Co-authored-by: Jorge Escobar <jorge.e.escobar@omb.eop.gov>	2021-08-24 16:40:54 -04:00
Jorge Escobar	3d8dbb293c	Tile-baking columns with floating rounds completed (#491) * Tile-baking columns with floating rounds completed * completed * correction on github workflow * tiles folder no longer needed * addressed comments * updating requirements.txt * poetry lock update * adding xlswriter * final poetrylock * updated requirements.txt * checkpoint * removed matplotlib * ignoring pylint too many statements * reinstated too many statements * converting data sync to generate score GHA UI-driven	2021-08-10 15:28:50 -04:00
Nat Hillard	9a9d5fdf7f	Backend change for Zipfile pt. 2 (#469) * Fixes #303 : adding downloadable zip archive logic * linter recommendations * Pushes data directory to AWS. We'll want to move to use AWS for this ASAP, but this works for now * updating pattern	2021-08-09 10:39:59 -04:00
Nat Hillard	c1568e87c0	Data directory should adopt standard Poetry-suggested python package structure (#457) * Fixes #456 - Our data directory should adopt standard python package structure * a few missed references * updating readme * updating requirements * Running Black * Fixes for flake8 * updating pylint	2021-08-05 15:35:54 -04:00
Billy Daly	5504528fdf	Issue 308 python linting (#443) * Adds flake8, pylint, liccheck, flake8 to dependencies for data-pipeline * Sets up and runs black autoformatting * Adds flake8 to tox linting * Fixes flake8 error F541 f string missing placeholders * Fixes flake8 E501 line too long * Fixes flake8 F401 imported but not used * Adds pylint to tox and disables the following pylint errors: - C0114: module docstrings - R0201: method could have been a function - R0903: too few public methods - C0103: name case styling - W0511: fix me - W1203: f-string interpolation in logging * Adds utils.py to tox.ini linting, runs black on utils.py * Fixes import related pylint errors: C0411 and C0412 * Fixes or ignores remaining pylint errors (for discussion later) * Adds safety and liccheck to tox.ini	2021-08-02 12:16:38 -04:00
Billy Daly	55dabb2b57	Issue 379 tox setup (#405) * Adds tox as a dev dependency to data/data-pipeline/pyproject.toml: Also updates poetry.lock and requirements.txt * Adds tox.ini to test build of data/data-pipeline * Sets up GitHub actions workflow for data/ directory * Tries to get Data Checks GitHub action to run * Fixes error with GitHub action * Migrates data/data-roadmap from setuptools to poetry * Sets up tox file for data/data-roadmap * Adds github action for data/data-roadmap * Fixes syntax error in data-checks.yml * Second attempt at fixing data-checks.yml * Export poetry requirements to requirements.txt * Revert "Migrates data/data-roadmap from setuptools to poetry" This reverts commit e8367652d43c1c9beee500f792c8f41e1c1fc462. * Removes pyproject.toml and reverts requirements.txt as well	2021-07-29 14:00:20 -04:00
Nat Hillard	a7cdf1c021	Adding notebook to create score dissolve (#333)	2021-07-21 16:10:32 -04:00
Jorge Escobar	543d147e61	Data folder restructuring in preparation for 361 (#376) * initial checkin * gitignore and docker-compose update * readme update and error on hud * encoding issue * one more small README change * data roadmap restructure * pyproject sort * small update to score output folders * checkpoint * couple of last fixes	2021-07-20 14:55:39 -04:00

38 commits