Commit graph

31 commits

Author SHA1 Message Date
Jorge Escobar
ccd72e2cce
tribal tiles fix (#1874)
* Alaska tribal points fix (#1821)

* tribal tiles fix

* disabling child opportunity

* lint

* removing COI

* removing commented out code
2022-09-01 10:19:13 -04:00
Lucas Merrill Brown
4bf7773797
Issue 1827: Add demographics to tiles and download files (#1833)
* Adding demographics for use in sidebar and download files
2022-08-22 10:05:23 -04:00
Emma Nechamkin
7d89d41e49
Adding NLCD data (#1826)
Adding NLCD's natural space indicator end to end to the score.
2022-08-17 14:21:28 -04:00
Matt Bowen
49623e4da0
Add abandoned mine lands data (#1824)
* Add notebook to generate test data (#1780)

* Add Abandoned Mine Land data (#1780)

Using a similar structure but simpler apporach compared to FUDs, add an
indicator for whether a tract has an abandonded mine.

* Adding some detail to dataset readmes

Just a thought!

* Apply feedback from revieiw (#1780)

* Fixup bad string that broke test (#1780)

* Update a string that I should have renamed (#1780)

* Reduce number of threads to reduce memory pressure (#1780)

* Try not running geo data (#1780)

* Run the high-memory sets separately (#1780)

* Actually deduplicate (#1780)

* Add flag for memory intensive ETLs (#1780)

* Document new flag for datasets (#1780)

* Add flag for new datasets fro rebase (#1780)

Co-authored-by: Emma Nechamkin <97977170+emma-nechamkin@users.noreply.github.com>
2022-08-17 11:33:59 -04:00
Emma Nechamkin
5e378aea81
Adding first street foundation data (#1823)
Adding FSF flood and wildfire risk datasets to the score.
2022-08-17 10:14:23 -04:00
Emma Nechamkin
ebac552d75
Adding DOT composite to travel score (#1820)
This adds the DOT dataset to the ETL and to the score. Note that currently we take a percentile of an average of percentiles.
2022-08-16 14:44:39 -04:00
Matt Bowen
d5fbb802e8
Add FUDS ETL (#1817)
* Add spatial join method (#1871)

Since we'll need to figure out the tracts for a large number of points
in future tickets, add a utility to handle grabbing the tract geometries
and adding tract data to a point dataset.

* Add FUDS, also jupyter lab (#1871)

* Add YAML configs for FUDS (#1871)

* Allow input geoid to be optional (#1871)

* Add FUDS ETL, tests, test-datae noteobook (#1871)

This adds the ETL class for Formerly Used Defense Sites (FUDS). This is
different from most other ETLs since these FUDS are not provided by
tract, but instead by geographic point, so we need to assign FUDS to
tracts and then do calculations from there.

* Floats -> Ints, as I intended (#1871)

* Floats -> Ints, as I intended (#1871)

* Formatting fixes (#1871)

* Add test false positive GEOIDs (#1871)

* Add gdal binaries (#1871)

* Refactor pandas code to be more idiomatic (#1871)

Per Emma, the more pandas-y way of doing my counts is using np.where to
add the values i need, then groupby and size. It is definitely more
compact, and also I think more correct!

* Update configs per Emma suggestions (#1871)

* Type fixed! (#1871)

* Remove spurious import from vscode (#1871)

* Snapshot update after changing col name (#1871)

* Move up GDAL (#1871)

* Adjust geojson strategy (#1871)

* Try running census separately first (#1871)

* Fix import order (#1871)

* Cleanup cache strategy (#1871)

* Download census data from S3 instead of re-calculating (#1871)

* Clarify pandas code per Emma (#1871)
2022-08-16 13:28:39 -04:00
Emma Nechamkin
1782d022a9 Adding HOLC indicator (#1579)
Added HOLC indicator (Historic Redlining Score) from NCRC work; included 3.25 cutoff and low income as part of the housing burden category.
2022-08-11 12:33:46 -04:00
Emma Nechamkin
f047ca9d83 Imputing income using geographic neighbors (#1559)
Imputes income field with a light refactor. Needs more refactor and more tests (I spotchecked). Next ticket will check and address but a lot of "narwhal" architecture is here.
2022-08-11 12:33:45 -04:00
Jorge Escobar
8149ac31c5
Starting Tribal Boundaries Work (#1736)
* starting tribal pr

* further pipeline work

* bia merge working

* alaska villages and tribal geo generate

* tribal folders

* adding data full run

* tile generation

* tribal tile deploy
2022-07-30 01:13:10 -04:00
Lucas Merrill Brown
a0d6e55f0a
Run ETL processes in parallel (#1253)
* WIP on parallelizing

* switching to get_tmp_path for nri

* switching to get_tmp_path everywhere necessary

* fixing linter errors

* moving heavy ETLs to front of line

* add hold

* moving cdc places up

* removing unnecessary print

* moving h&t up

* adding parallel to geo post

* better census labels

* switching to concurrent futures

* fixing output
2022-02-11 14:04:53 -05:00
Emma Nechamkin
6a00b29f5d
Adding VA and CO ETL from mapping for environmental justice (#1177)
Adding the mapping for environmental justice data, which contains information about VA and CO, to the ETL pipeline.
2022-02-04 10:00:41 -05:00
Lucas Merrill Brown
18f299c5f8
Issue 1141: Definition M (#1151) 2022-01-18 14:56:55 -05:00
Saran Ahluwalia
87e08f5fe1
CDC SVI Index: Additions to data-pipeline and comparison tool (#1096)
* wip

* working

* working

* rename

* documentation

* add link

* add readme

* update fieldnames

* typo

* add comparison tool

* revise wording

* variable change for FIPS

* workding

* wording in readme

* cleanup wording

* update comparison tool

* final tune up

* grammar and punctuation in the documentation

* period

* cleanup comments

* added revisions

* parallelism

* PR feedback from Lucas

* remove extraneous fields from comparison tool

* style

* updates

* remove themes

* formatting

* remove referenes to percentile rank

* remove referenes to percentile rank

* typo in fieldnames

* updates based on feedback from Lucas

* fieldnames formatting

* fix broken markdown link

Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
2022-01-14 14:52:37 -05:00
Saran Ahluwalia
95a14adb35
Added Census Tract Aggregated Micro-data from EPA Risk-Screening Environmental Indicators (RSEI) model (#1101)
* added initial source code - todo is comparison tool

* added values

* rename fields

* check geoid

* added black

* added revisions

* added clean up to comments

* more comments

* formatting

* cleanup and address PR feedback

* fix changes

* final path changes

* style

* PR feedback

* added final PR comment

* fix flake 8

* add revisions
2022-01-14 13:50:49 -05:00
Saran Ahluwalia
a98ea35f74
Maryland EJSCREEN Addition to comparison tool (#1143)
* finalized

* cleanup notebook

* cleanup

* run black
2022-01-14 13:26:48 -05:00
Saran Ahluwalia
a4137fdc98
Add Michigan EJ Screen into data-pipeline's ETL and provide automated scoring and statistics outputs (#1091)
* draft wip

* initial commit

* clear output from notebook

* revert to 65ceb7900f

* draft wip

* initial commit

* clear output from notebook

* revert to 65ceb7900f

* make michigan prefix for readable

* standardize Michigan names and move all constants from class into field names module

* standardize Michigan names and move all constants from class into field names module

* include only pertinent columns for scoring comparison tool

* michigan EJSCREEN standardization

* final PR feedback

* added exposition and summary of Michigan EJSCREEN

* added exposition and summary of Michigan EJSCREEN

* fix typo

Co-authored-by: Saran Ahluwalia <ahlusar.ahluwalia@gmail.com>
2021-12-31 15:38:52 -05:00
Lucas Merrill Brown
beb0eea5cc
Alternative definition of DACs for comparison (#1068)
* Alternative energy-related definition of DACs
2021-12-27 12:05:59 -05:00
Lucas Merrill Brown
5a6d6d8557
Issue 954: Add various data sources from Child Opportunity Index (#986)
* Adds four fields:
    * Summer days above 90F
    * Percent low access to healthy food
    * Percent impenetrable surface areas
    * Low third grade reading proficiency

* Each of these four gets added into Definition L in various factors.

* Additionally, I add college attendance fields to the ETL for Census ACS.

* This PR also introduces the notion of "reverse percentiles", relevant to ticket #970.
2021-12-07 11:33:49 -05:00
Lucas Merrill Brown
c5dff6e5f7
Issue 242: Add HOLC Grades to data inputs (#978)
* Add mapping inequality data to data inputs

* Add mapping inequality data to comparison tool
2021-12-04 12:23:01 -05:00
Lucas Merrill Brown
1d101c93d2
Issue 844: Add island areas to Definition L (#957)
This ended up being a pretty large task. Here's what this PR does:

1. Pulls in Vincent's data from island areas into the score ETL. This is from the 2010 decennial census, the last census of any kind in the island areas.
2. Grabs a few new fields from 2010 island areas decennial census.
3. Calculates area median income for island areas.
4. Stops using EJSCREEN as the source of our high school education data and directly pulls that from census (this was related to this project so I went ahead and fixed it).
5. Grabs a bunch of data from the 2010 ACS in the states/Puerto Rico/DC, so that we can create percentiles comparing apples-to-apples (ish) from 2010 island areas decennial census data to 2010 ACS data. This required creating a new class because all the ACS fields are different between 2010 and 2019, so it wasn't as simple as looping over a year parameter.
6. Creates a combined population field of island areas and mainland so we can use those stats in our comparison tool, and updates the comparison tool accordingly.
2021-12-03 15:46:10 -05:00
Lucas Merrill Brown
a4108d24c0 Issue 919: Fix too many tracts issue (#922)
* Some cleanup, adding error warning to merge function

* Error handling around tract merge
2021-11-30 13:49:20 -05:00
Vincent La
b0dbc90064
[ISS-723] Load Census Data for 4 Territories (#816)
* Adding census decennial data for island territories
2021-11-09 16:32:46 -05:00
Lucas Merrill Brown
1d541be447
Add EJSCREEN Areas of Concern (#843)
* Adding ej screen areas of concern

* Uses it where user has local files, but not otherwise

Co-authored-by: VincentLaUSDS <vincent.la@omb.eop.gov>
2021-11-02 15:38:42 -04:00
Lucas Merrill Brown
b1a4d26be8
Adding persistent poverty tracts (#738)
* persistent poverty working

* fixing left-padding

* running black and adding persistent poverty to comp tool

* fixing bug

* running black and fixing linter

* fixing linter

* fixing linter error
2021-09-22 17:57:08 -04:00
Vincent La
7709836a12
Ticket 355: Adding map to Urban vs Rural Census Tracts (#696)
* Adding urban vs rural notebook

* Adding new code

* Adding settings

* Adding usa.csv

* Adding etl

* Adding etl

* Adding to etl_score

* quick changes to notebook

* Ensuring notebook can run

* Adding urban vs rural notebook

* Adding new code

* Adding settings

* Adding usa.csv

* Adding etl

* Adding etl

* Adding to etl_score

* quick changes to notebook

* Ensuring notebook can run

* adding urban to comparison tool

* renaming file

* adding urban rural to more comp tool outputs

* updating requirements and poetry

* Adding ej screen notebook

* removing ej screen notebook since it's in justice40-tool-iss-719

Co-authored-by: La <ryy0@cdc.gov>
Co-authored-by: lucasmbrown-usds <lucas.m.brown@omb.eop.gov>
2021-09-22 12:31:03 -04:00
Lucas Merrill Brown
e94d05882c
Issue 675 & 676: Adding life expectancy and DOE energy burden data (#683)
* Adding two new data sources.
2021-09-15 09:59:28 -05:00
Lucas Merrill Brown
7d13be7651
Ticket 492: Integrate Area Median Income and Poverty measures into ETL (#660)
* Loading AMI and poverty data
2021-09-13 15:36:35 -05:00
Billy Daly
f0900f7b69
Adds National Risk Index data to ETL pipeline (#549)
* Adds dev dependencies to requirements.txt and re-runs black on codebase

* Adds test and code for national risk index etl, still in progress

* Removes test_data from .gitignore

* Adds test data to nation_risk_index tests

* Creates tests and ETL class for NRI data

* Adds tests for load() and transform() methods of NationalRiskIndexETL

* Updates README.md with info about the NRI dataset

* Adds to dos

* Moves tests and test data into a tests/ dir in national_risk_index

* Moves tmp_dir for tests into data/tmp/tests/

* Promotes fixtures to conftest and relocates national_risk_index tests:
The relocation of national_risk_index tests is necessary because tests 
can only use fixtures specified in conftests within the same package

* Fixes issue with df.equals() in test_transform()

* Files reformatted by black

* Commit changes to other files after re-running black

* Fixes unused import that caused lint checks to fail

* Moves tests/ directory to app root for data_pipeline
2021-09-07 20:51:34 -04:00
Lucas Merrill Brown
65ceb7900f
Score F, testing methodology (#510)
* fixing dependency issue

* fixing more dependencies

* including fraction of state AMI

* wip

* nitpick whitespace

* etl working now

* wip on scoring

* fix rename error

* reducing metrics

* fixing score f

* fixing readme

* adding dependency

* passing tests;

* linting/black

* removing unnecessary sample

* fixing error

* adding verify flag on etl/base

Co-authored-by: Jorge Escobar <jorge.e.escobar@omb.eop.gov>
2021-08-24 16:40:54 -04:00
Nat Hillard
ec19d86f6f
Adding back census to list of potential datasets, but separating out from standard list (#484)
Error this addresses:
  File "/Users/lucas/Documents/usds/repos/justice40-tool/data/data-pipeline/data_pipeline/etl/runner.py", line 71, in etl_runner
    f"data_pipeline.etl.sources.{dataset['module_dir']}.etl"
TypeError: 'NoneType' object is not subscriptable
2021-08-09 09:52:06 -04:00