mirror of
https://github.com/DOI-DO/j40-cejst-2.git
synced 2025-02-20 08:41:26 -08:00
[SPIKE] Improve backend documentation (#2177)
* Update code owners to include new folks and remove the departed ones
* Update maintainers to reflect the current personnel
* Update contributing with the latest, and make small changes to readme to make it easier to read
* Update maintainers with Lucas Brown
* Update installation guide to refine instructions and make them easier to follow
* Try emojis to make notes stand out more
* Experiment with note
* Moved installation of data pipeline into a new file (contents TBD), and redid most of the data pipeline README for clarity and readability
* Add mermaid diagram
* Fix table
* Update readme for clarity and correctness
* Update TOC
* Fix comparator doc
* Add section on internal score comparison
* Move tox information from installation to testing
* Update installation for data pipeline
* Add emojis to make picking out platform-specific instructions easier
* Fix Git caps
* Update for readability
* Add direct link to VS Code instructions
* Fix broken link and improve readability
* Update installation for clarity and proper case
* Update python text
* Clean up information about poetry and poetry lockfiles
* Remove duplicate paragraph
* Fix case
* Update date table
* Re-adjust table to put links at the end
* Fix a few minor typos

---------

Co-authored-by: Sam Powers <121890478+sampowers-usds@users.noreply.github.com>
parent 79c223b646
commit c3a68cb251

13 changed files with 498 additions and 481 deletions
.github/CODEOWNERS (vendored, 2 changes)

@@ -1 +1 @@
-* @esfoobar-usds @vim-usds @emma-nechamkin @mattbowen-usds
+* @vim-usds @travis-newby @sampowers-usds @mattbowen-usds
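For context on the ownership change above, each line of a CODEOWNERS file maps a path pattern to one or more default reviewers, and the `*` pattern assigns owners to every file. A sketch with hypothetical handles (not this repo's actual reviewers):

```
# CODEOWNERS syntax sketch: pattern on the left, owners on the right.
# The handles below are placeholders for illustration only.
*        @default-reviewer-1 @default-reviewer-2
/docs/   @docs-team-handle
```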
CONTRIBUTING-es.md

@@ -1,6 +1,6 @@
# Colaboraciones con la herramienta Justice40

-*[Read this in English!](CONTRIBUTING.md)*
+_[Read this in English!](CONTRIBUTING.md)_

🎉 Primeramente, agradecemos el tiempo que dedica a su colaboración. 🎉

@@ -9,6 +9,7 @@ Siga las instrucciones siguientes para colaborar con la herramienta Justice40 qu
Antes de hacer su aportación, recomendamos que lea además nuestros archivos [LICENCIA](LICENSE-es.md) y [LÉAME](README-es.md), que también se incluyen aquí. Si el contenido de este repositorio no responde a sus preguntas, no dude en [contactar con nosotros](mailto:justice40open@usds.gov).

## Dominio público

Este proyecto es de dominio público en los Estados Unidos, y se renuncia a los derechos de autor y derechos conexos sobre la obra en todo el mundo por medio de la [CC0 1.0 Universal, Dedicación de dominio público](https://creativecommons.org/publicdomain/zero/1.0/).

Todas las aportaciones que se hagan a este proyecto se divulgarán conforme a la dedicación CC0. Al presentar una solicitud de incorporación de cambios, usted acepta cumplir con esta renuncia a los intereses de derechos de autor.

@@ -16,18 +17,21 @@ Todas las aportaciones que se hagan a este proyecto se divulgarán conforme a la
## ¿Cómo puedo colaborar?

### Informe de un error

Si cree que encontró un error en la herramienta Justice40, consulte nuestra lista de problemas en GitHub para saber si ya se abrió uno similar.

Cuando informe de un error, siga estas instrucciones:

-* **Use la plantilla de informe de error**, Bug Report ([aquí](https://github.com/usds/justice40-tool/issues/new/choose)), para comunicar el problema. La plantilla se llena con la información adecuada.
-* **Use un título claro y descriptivo del problema** para definirlo.
-* **Describa los pasos exactos para reproducir el problema** con todos los detalles posibles. Por ejemplo, comience con una explicación de cómo llegó a la página donde encontró el error.
-* **Describa el comportamiento que observó después de seguir esos pasos** e indique exactamente cuál es el problema que causa ese comportamiento.
-* **Explique cuál era el comportamiento que esperaba ver y por qué.**
-* **Incluya capturas de pantalla y GIF animados** si es posible, en los que se vean los pasos descritos que siguió y muestren claramente el problema.
-* **Si el problema no lo desencadenó una acción específica**, describa lo que estaba haciendo cuando se presentó el problema.
+- **Use la plantilla de informe de error**, Bug Report ([aquí](https://github.com/usds/justice40-tool/issues/new/choose)), para comunicar el problema. La plantilla se llena con la información adecuada.
+- **Use un título claro y descriptivo del problema** para definirlo.
+- **Describa los pasos exactos para reproducir el problema** con todos los detalles posibles. Por ejemplo, comience con una explicación de cómo llegó a la página donde encontró el error.
+- **Describa el comportamiento que observó después de seguir esos pasos** e indique exactamente cuál es el problema que causa ese comportamiento.
+- **Explique cuál era el comportamiento que esperaba ver y por qué.**
+- **Incluya capturas de pantalla y GIF animados** si es posible, en los que se vean los pasos descritos que siguió y muestren claramente el problema.
+- **Si el problema no lo desencadenó una acción específica**, describa lo que estaba haciendo cuando se presentó el problema.

### Sugiera un mejoramiento

Si desea sugerir un cambio, solicitar una función o tratar un asunto, pero no tiene el lenguaje o el código específico para enviar su solicitud, puede abrir un problema en este repositorio.

Abra [aquí](https://github.com/usds/justice40-tool/issues/new/choose) un problema de tipo "Solicitud de función".

@@ -35,11 +39,14 @@ Abra [aquí](https://github.com/usds/justice40-tool/issues/new/choose) un proble
En el problema, describa la función que desea, por qué la necesita y cómo debería funcionar esta. Los integrantes del equipo responderán a este lo antes posible.

### Colaboraciones con código

<!-- markdown-link-check-disable -->

-Si desea colaborar con alguna parte del código base, bifurque el repositorio siguiendo la [siguiente metodología de GitHub para hacer la bifurcación](https://docs.github.com/es/get-started/quickstart/fork-a-repo) *(en inglés)*. Luego, haga los cambios en el código de su propia copia del repositorio (incluya pruebas, si corresponde) y, por último, [envíe una solicitud para incorporar los cambios en el repositorio precedente](https://docs.github.com/es/github/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork) *(en inglés)*. Para poder fusionar una solicitud de incorporación de cambios, las siguientes opciones en este repositorio deben estar habilitadas:
+Si desea colaborar con alguna parte del código base, bifurque el repositorio siguiendo la [siguiente metodología de GitHub para hacer la bifurcación](https://docs.github.com/es/get-started/quickstart/fork-a-repo) _(en inglés)_. Luego, haga los cambios en el código de su propia copia del repositorio (incluya pruebas, si corresponde) y, por último, [envíe una solicitud para incorporar los cambios en el repositorio precedente](https://docs.github.com/es/github/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork) _(en inglés)_. Para poder fusionar una solicitud de incorporación de cambios, las siguientes opciones en este repositorio deben estar habilitadas:

-* No se permiten las fusiones al proyecto principal `main`; abra una solicitud de incorporación de cambios desde una rama.
-* Al menos un revisor autorizado debe aprobar la confirmación (en [CODEOWNERS](https://github.com/usds/justice40-tool/tree/main/.github/CODEOWNERS), en inglés, consulte la lista más reciente de estos revisores).
-* Todas las verificaciones de estado obligatorias deben ser aprobadas.
+- No se permiten las fusiones al proyecto principal `main`; abra una solicitud de incorporación de cambios desde una rama.
+- Al menos un revisor autorizado debe aprobar la confirmación (en [CODEOWNERS](https://github.com/usds/justice40-tool/tree/main/.github/CODEOWNERS), en inglés, consulte la lista más reciente de estos revisores).
+- Todas las verificaciones de estado obligatorias deben ser aprobadas.

Si hay un desacuerdo importante entre los integrantes del equipo, se organizará una reunión con el fin de determinar el plan de acción para la solicitud de incorporación de cambios.
CONTRIBUTING.md

@@ -1,12 +1,12 @@
# Contributing to the Justice40 Tool

-*[¡Lea esto en español!](CONTRIBUTING-es.md)*
+_[¡Lea esto en español!](CONTRIBUTING-es.md)_

🎉 First off, thanks for taking the time to contribute! 🎉

-The following is a set of guidelines for contributing to the Justice40 Tool that lives in this repository.
+The following is a set of guidelines for contributing to the Justice40 Climate and Economic Justice Screening Tool (CEJST) that lives in this repository.

-Before contributing, we encourage you to also read our [LICENSE](LICENSE.md) and [README](README.md) files, also found in this repository. If you have any questions not answered by the content of this repository, please don't hesitate to [contact us](mailto:justice40open@usds.gov).
+Before contributing, we encourage you to read our [LICENSE](LICENSE.md) and [README](README.md) files. If you have any questions not answered by the content of this repository, please [contact us](mailto:justice40open@usds.gov).

## Public Domain

@@ -16,32 +16,34 @@ All contributions to this project will be released under the CC0 dedication. By

## How Can I Contribute?

-### Report a bug
+### Report a Bug

-If you think you have found a bug in the Justice40 tool, search our issues list on GitHub in case a similar issue has already been opened.
+If you think you have found a bug in the Justice40 tool, search our issues list on GitHub for any similar bugs. If you find a similar bug, please update that issue with your details.

-When reporting the bug, please follow these guidelines:
+If you do not find your bug in our issues list, file a bug report. When reporting the bug, please follow these guidelines:

- **Please use the `Bug Report` issue template** ([here](https://github.com/usds/justice40-tool/issues/new/choose)). This is populated with the right information.
- **Use a clear and descriptive issue title** for the issue to identify the problem.
-- **Describe the exact steps which reproduce the problem** in as many details as possible. For example, start by explaining how you got to the page where you encountered the bug.
+- **Describe the exact steps to reproduce the problem** in as much detail as possible. For example, start by explaining how you got to the page where you encountered the bug.
- **Describe the behavior you observed after following the steps** and point out what exactly is the problem with that behavior.
- **Explain which behavior you expected to see instead and why.**
- **Include screenshots and animated GIFs** if possible, which show you following the described steps and clearly demonstrate the problem.
- **If the problem wasn't triggered by a specific action**, describe what you were doing before the problem happened.

-### Suggest an enhancement
+### Suggest an Enhancement

-If you don't have specific language or code to submit but would like to suggest a change, request a feature,
-or have something addressed, you can open an issue in this repository.
+If you don't have specific language or code to submit but would like to suggest a change, request a feature, or have something addressed, you can open an issue in this repository.

-Please open an issue of type "Feature request" [here](https://github.com/usds/justice40-tool/issues/new/choose).
+Please open an issue of type `Feature request` [here](https://github.com/usds/justice40-tool/issues/new/choose).

-In the issue, please describe the feature you would like to see, why you need it, and how it should work. Team members will respond to the issue as soon as possible.
+In this issue, please describe the feature you would like to see, why you need it, and how it should work. Team members will respond to the Feature request as soon as possible.

-### Code contributions
+### Contribute to the Code

<!-- markdown-link-check-disable -->

-If you would like to contribute to any part of the codebase, please fork the repository [following the Github forking methodology](https://docs.github.com/en/github/getting-started-with-github/quickstart/fork-a-repo). Then, make changes to the code in your own copy of the repository, including tests if applicable, and finally [submit a pull request against the upstream repo.](https://docs.github.com/en/github/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork) In order for us to merge a pull request, the following checks are enabled within this repo:
+If you would like to contribute to any part of the codebase, please fork the repository [following the Github forking methodology](https://docs.github.com/en/github/getting-started-with-github/quickstart/fork-a-repo). Make changes to the code in your own copy of the repository – including tests if applicable – and [submit a pull request against the upstream repo.](https://docs.github.com/en/github/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork) In order for us to merge a pull request, the following checks are enabled within this repo:

<!-- markdown-link-check-enable -->

- Merges to `main` are prohibited - please open a pull request from a branch
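The contribution checks above require work to happen on a branch rather than `main`. A minimal local sketch of that branch-based flow; the repository and branch names are illustrative, and the fork and pull-request steps themselves require GitHub access:

```shell
# Create a scratch repository and do work on a feature branch, never on main
git init -q demo-checkout
cd demo-checkout
git checkout -b my-feature-branch    # all changes are committed here
git branch --show-current            # prints the active branch name
```

From a real fork, the same `git checkout -b` step precedes committing, pushing, and opening the pull request against the upstream repository.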
DATASETS.md (48 changes)

@@ -1,16 +1,38 @@
# Justice40 Datasets

-This page contains web links to the datasets that are uploaded as part of our data pipeline, if you want to access them directly. Note that this is just a quick reference and the [Data Pipeline README](/data/data-pipeline/README.md) has the comprehensive documentation on all these datasets.
+Below is a table of all datasets that feed the CEJST application, including access links and contacts.
+
+This page contains web links to the datasets that are uploaded as part of our data pipeline if you want to access them directly. Note that this is just a quick reference and the [Data Pipeline README](/data/data-pipeline/README.md) has comprehensive documentation on all these datasets.

> Note: These currently aren't updated on any specific cadence, so be aware of this if you know that the dataset you are using might change frequently.

- Census data, generated by the [Generate Census Github Action](https://github.com/usds/justice40-tool/blob/main/.github/workflows/generate-census.yml): <https://justice40-data.s3.us-east-1.amazonaws.com/data-sources/census.zip>
- Score data, generated by the [Generate Score Github Action](https://github.com/usds/justice40-tool/blob/main/.github/workflows/generate-score.yml): <https://justice40-data.s3.us-east-1.amazonaws.com/data-pipeline/data/score/csv/full/usa.csv>
- GeoJSON data: <https://justice40-data.s3.amazonaws.com/data-pipeline/data/score/geojson/usa-high.json>
- Shapefile data: <https://justice40-data.s3.amazonaws.com/data-pipeline/data/score/shapefile/usa.zip>
- EJScreen: <https://justice40-data.s3.amazonaws.com/data-pipeline/data/dataset/ejscreen_2019/usa.csv>
- Census ACS 2019: <https://justice40-data.s3.amazonaws.com/data-pipeline/data/dataset/census_acs_2019/usa.csv>
- Housing and Transportation Index: <https://justice40-data.s3.amazonaws.com/data-pipeline/data/dataset/housing_and_transportation_index/usa.csv>
- HUD Housing: <https://justice40-data.s3.amazonaws.com/data-pipeline/data/dataset/hud_housing/usa.csv>

| **Indicator Group** | **Indicator** | **Description** | **Notes** | **Publisher** | **Year(s)** | **Source** | **Geography** | **Geographies available** | **Can be updated to 2020 Census Tracts?** | **Contact** | **Current Data Download** |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **Climate Change** | Expected Agriculture Loss Rate | Economic loss to agricultural value resulting from natural hazards each year | | Federal Emergency Management Agency | 2014-2021 | National Risk Index | US & District of Columbia | 2010 Census Tract | Data to be released End of March 2023 | Casey Zuzak, NHRAP Senior Risk Analyst (Casey.Zuzak@fema.dhs.gov); Karen Villatoro (karen.villatoro@fema.dhs.gov); Jesse Rozelle (Jesse.Rozelle@fema.dhs.gov); Sean McNabb (Sean.McNabb@fema.dhs.gov); Charles Carson (charles.carson@fema.dhs.gov) | https://hazards.fema.gov/nri/data-resources |
| **Climate Change** | Expected Building Loss Rate | Economic loss to building value resulting from natural hazards each year | | Federal Emergency Management Agency | 2014-2021 | National Risk Index | US & District of Columbia | 2010 Census Tract | Data to be released End of March 2023 | Casey Zuzak, NHRAP Senior Risk Analyst (Casey.Zuzak@fema.dhs.gov); Karen Villatoro (karen.villatoro@fema.dhs.gov); Jesse Rozelle (Jesse.Rozelle@fema.dhs.gov); Sean McNabb (Sean.McNabb@fema.dhs.gov); Charles Carson (charles.carson@fema.dhs.gov) | https://hazards.fema.gov/nri/data-resources |
| **Climate Change** | Expected Population Loss Rate | fatalities and injuries resulting from natural hazards each year | this burden only applies for census tracts with populations greater than 20 people | Federal Emergency Management Agency | 2014-2021 | National Risk Index | US & District of Columbia | 2010 Census Tract | Data to be released End of March 2023 | Casey Zuzak, NHRAP Senior Risk Analyst (Casey.Zuzak@fema.dhs.gov); Karen Villatoro (karen.villatoro@fema.dhs.gov); Jesse Rozelle (Jesse.Rozelle@fema.dhs.gov); Sean McNabb (Sean.McNabb@fema.dhs.gov); Charles Carson (charles.carson@fema.dhs.gov) | https://hazards.fema.gov/nri/data-resources |
| **Climate Change** | Projected Flood Risk | projected risk to properties from projected floods, from tides, rain, riverine and storm surges within 30 years | these were emailed to J40 initially | First Street Foundation | projecting 2022-2052. Released in 2020 | | 50 states, DC, PR | 2010 Census Tract (we think, but documentation does not say) | Update is available | Ed Kearns, Chief Data Officer of First Street Foundation (ed@firststreet.org) | https://aws.amazon.com/marketplace/pp/prodview-r36lzzzjacd32?sr=0-1&ref_=beagle&applicationId=AWSMPContessa#overview |
| **Climate Change** | Projected Wildfire Risk | projected risk to properties from wildfire from fire fuels, weather, humans, and fire movement in 30 years | these were emailed to J40 initially | First Street Foundation | projecting 2022-2052. Released in 2020 | | 50 states, DC, PR | 2010 Census Tract (we think, but documentation does not say) | Update is available | Ed Kearns, Chief Data Officer of First Street Foundation (ed@firststreet.org) | https://aws.amazon.com/marketplace/pp/prodview-r36lzzzjacd32?sr=0-1&ref_=beagle&applicationId=AWSMPContessa#overview |
| **Energy** | Energy Cost | Average annual energy costs divided by household income | | DOE | 2018 | LEAD Tool | 50 states, DC, PR | Census 2010 | Yes, in March 2023 | Aaron Vimont, developer at National Renewable Energy Laboratory (aaron.vimont@nrel.gov); Tony Reames (tony.reames@hq.doe.gov) | https://data.openei.org/submissions/573; https://www.energy.gov/scep/slsc/lead-tool |
| **Energy** | PM2.5 in the air | level of inhalable particles, 2.5 micrometers or smaller | | Environmental Protection Agency (EPA) Office of Air and Radiation (OAR) | 2017 | EJ Screen | 50 states, DC + Islands | Census 2010 & Census 2020 | Yes | | https://gaftp.epa.gov/ejscreen/ |
| **Health** | Asthma | Share of people who have been told they have asthma | New Data Source: https://chronicdata.cdc.gov/500-Cities-Places/PLACES-Local-Data-for-Better-Health-Census-Tract-D/cwsq-ngmh | CDC | 2016-2019 | PLACES data | 50 States + DC | Census 2010 Tracts, Census 2010 & 2020 Counties | Not until December 2024 | T.J. Pierce (pwc2@cdc.gov); Sharunda Buchanan (sdb4@cdc.gov); Andrew Dent (aed5@cdc.gov) Angela Werner (myo6@cdc.gov) | https://chronicdata.cdc.gov/500-Cities-Places/PLACES-Census-Tract-Data-GIS-Friendly-Format-2021-/mb5y-ytti |
| **Health** | Diabetes | Share of people ages 18+ who have diabetes other than diabetes during pregnancy | New Data Source: https://chronicdata.cdc.gov/500-Cities-Places/PLACES-Local-Data-for-Better-Health-Census-Tract-D/cwsq-ngmh | CDC | 2016-2019 | PLACES data | 50 States + DC | Census 2010 Tracts, Census 2010 & 2020 Counties | Not until December 2024 | T.J. Pierce (pwc2@cdc.gov); Sharunda Buchanan (sdb4@cdc.gov); Andrew Dent (aed5@cdc.gov) Angela Werner (myo6@cdc.gov) | https://chronicdata.cdc.gov/500-Cities-Places/PLACES-Census-Tract-Data-GIS-Friendly-Format-2021-/mb5y-ytti |
| **Health** | Heart Disease | Share of people ages 18+ who have been told they have heart disease | New Data Source: https://chronicdata.cdc.gov/500-Cities-Places/PLACES-Local-Data-for-Better-Health-Census-Tract-D/cwsq-ngmh | CDC | 2016-2019 | PLACES data | 50 States + DC | Census 2010 Tracts, Census 2010 & 2020 Counties | Not until December 2024 | T.J. Pierce (pwc2@cdc.gov); Sharunda Buchanan (sdb4@cdc.gov); Andrew Dent (aed5@cdc.gov) Angela Werner (myo6@cdc.gov) | https://chronicdata.cdc.gov/500-Cities-Places/PLACES-Census-Tract-Data-GIS-Friendly-Format-2021-/mb5y-ytti |
| **Health** | Low life expectancy | Average number of years a person can expect to live | | CDC | 2010-2015 | US Small Area Life Expectancy Estimates Project | 50 States + DC | Census 2010 | 2025 | Elizabeth Arias (efa3@cdc.gov) | https://www.cdc.gov/nchs/nvss/usaleep/usaleep.html#life-expectancy |
| **Housing** | Historic Underinvestment | Census tracts that experienced historic underinvestment based on redlining maps created by the federal government’s Home Owners’ Loan Corporation (HOLC) between 1935 and 1940. | | National Community Reinvestment Coalition (NCRC) | | Home Owners Loan Corporation | 50 States + DC | Census 2010 & 2020 | Yes | | https://www.openicpsr.org/openicpsr/project/141121/version/V2/view |
| **Housing** | Housing Cost | Share of households making less than 80% of the AMI and spending more than 30% of income on housing | maybe could be found in ACS | Department of Housing and Urban Development (HUD) | 2014-2018 | Comprehensive housing affordability strategy dataset | 50 States + DC + PR | Census 2010 | Early Summer 2023 | Blair Russell, Office of Policy Development and Research; HUD (Blair.D.Russell@hud.gov) | https://www.huduser.gov/portal/datasets/cp.html#2006-2019_data |
| **Housing** | Lack of Green Space | Amount of land, not including crop land, that is covered with artificial materials like concrete or pavement | | Multi-Resolution Land Characteristics Consortium | 2019 | National Land Cover Database (USGS) | 48 States + DC | Possibly not bound to geographies because its raster data. TPL imputed to census 2010 for us, I think. | Maybe? Use same data but pre process to Census 2020 | | Was provided by the trust for public land but you can also get it here as image data https://www.sciencebase.gov/catalog/item/5f21cef582cef313ed940043 |
| **Housing** | Lack of Indoor Plumbing | Share of homes without indoor kitchens or plumbing | maybe could be found in ACS | Department of Housing and Urban Development (HUD) | 2014-2018 | Comprehensive housing affordability strategy dataset | 50 States + DC + PR | Census 2010 | Early Summer 2023 | Blair Russell, Office of Policy Development and Research; HUD (Blair.D.Russell@hud.gov) | https://www.huduser.gov/portal/datasets/cp.html#2006-2019_data |
| **Housing** | Lead paint | Share of homes that are likely to have lead paint | Share of homes built before 1960, which indicates potential lead paint exposure. Tracts with extremely high home values (i.e. median home values above the 90th percentile) that are less likely to face health risks from lead paint exposure are not included. | US Census | 2015-2019 | American Community Survey | 50 States + DC + PR | Census 2010, Census 2020 | Yes, can update to ACS 2017-2021 | |
| **Legacy Pollution** | Abandoned Mine Land | Presence of one or more abandoned mine land within the tract | | Department of the Interior, Office of Surface Mining Reclamation and Enforcement | 2017 | Abandoned Mine Land Inventory System | 50 States + DC | Point Data | Yes, points can be mapped to any geography | | https://www.osmre.gov/programs/e-amlis |
| **Legacy Pollution** | Formerly used Defense Site | Presence of one or more formerly used defense site within the tract | | US Army Corps of Engineers | 2019 | Formerly Used Defense Sites | 50 States + DC | Point Data | Yes, points can be mapped to any geography | | https://www.usace.army.mil/Missions/Environmental/Formerly-Used-Defense-Sites/ |
| **Legacy Pollution** | Proximity to Hazardous Waste Facilities | count of hazardous waste facilities within 5 km | | EPA | 2020 | EJ Screen | 50 states, DC + Islands | Census 2010 & Census 2020 | Yes | | https://gaftp.epa.gov/ejscreen/ |
| **Legacy Pollution** | Proximity to Risk Management Plan Facilities | count of risk management plan facilities within 5 kilometers | | EPA | 2020 | EJ Screen | 50 states, DC + Islands | Census 2010 & Census 2020 | Yes | | https://gaftp.epa.gov/ejscreen/ |
| **Legacy Pollution** | Proximity to Superfund Sites | count of proposed or listed superfund or national priorities list sites within 5 km | | EPA | 2021 | EJ Screen | 50 states, DC + Islands | Census 2010 & Census 2020 | Yes | | https://gaftp.epa.gov/ejscreen/ |
| **Transportation** | Diesel particulate matter exposure | amount of diesel exhaust in the air | | EPA | 2014 | EJ Screen | 50 states, DC + Islands | Census 2010 & Census 2020 | Yes | | https://gaftp.epa.gov/ejscreen/ |
| **Transportation** | transportation barriers | average of relative cost and time spent on transportation | | DOT | 2022 | Transportation Access Disadvantage | 50 States + DC | Census 2020 | Yes | | https://www.transportation.gov/equity-Justice40#:~:text=Transportation%20access%20disadvantage%20identifies%20communities%20and%20places%20that%20spend%20more%2C%20and%20take%20longer%2C%20to%20get%20where%20they%20need%20to%20go.%20(4) |
| **Transportation** | traffic proximity and volume | count of vehicles at major roads within 500 meters | | DOT (via EPA) | 2017 | EJ Screen | 50 states, DC + Islands | Census 2010 & Census 2020 | Yes | | https://gaftp.epa.gov/ejscreen/ |
| **Water and Wastewater** | underground storage tanks and releases | formula of the density of leaking underground storage tanks and number of all active underground storage tanks within 1500 feet of the census tract boundaries | | EPA /UST Finder | 2021 | EJ Screen | 50 states, DC + Islands | Census 2010 & Census 2020 | Yes | | https://gaftp.epa.gov/ejscreen/ |
| **Water and Wastewater** | wastewater discharge | modeled toxic concentrations at parts of streams within 500 meters | | EPA Risk Screening Environmental Indicators | 2020 | EJ Screen | 50 states, DC + Islands | Census 2010 & Census 2020 | Yes | | https://gaftp.epa.gov/ejscreen/ |
| **Workforce Development** | Linguistic isolation | Share of households where no one over age 14 speaks English very well | | US Census | 2015-2019 | American Community Survey | 50 States + DC + PR | Census 2010, Census 2020 | Yes, can update to ACS 2017-2021 | |
| **Workforce Development** | low median income | comparison of median income in the tract to the median incomes in the area | | US Census | 2015-2019 | American Community Survey | 50 States + DC + PR | Census 2010, Census 2020 | Yes, can update to ACS 2017-2021 | |
| **Workforce Development** | poverty | share of people in households where income is at or below 100% of the federal poverty level | | US Census | 2015-2019 | American Community Survey | 50 States + DC + PR | Census 2010, Census 2020 | Yes, can update to ACS 2017-2021 | |
| **Workforce Development** | Unemployment | number of unemployed people as a share of the labor force | | US Census | 2015-2019 | American Community Survey | 50 States + DC + PR | Census 2010, Census 2020 | Yes, can update to ACS 2017-2021 | |
| **Workforce Development** | High school Education | Percent of people ages 25 or older whose education is less than a high school diploma | | US Census | 2015-2019 | American Community Survey | 50 States + DC + PR | Census 2010, Census 2020 | Yes, can update to ACS 2017-2021 | |
| **Multiple Factors** | Low income | People in households where income is less than or equal to twice the federal poverty level, not including students enrolled in higher ed | | US Census | 2015-2019 | American Community Survey | 50 States + DC + PR | Census 2010, Census 2020 | Yes, can update to ACS 2017-2021 | |
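The S3 download links in this section share a common prefix. A small sketch composing the full score CSV URL from that prefix; the actual download (for example via `curl -O`) requires network access and is left commented out:

```shell
# Compose the score CSV URL from the shared S3 prefix used on this page
BASE="https://justice40-data.s3.us-east-1.amazonaws.com/data-pipeline/data"
SCORE_CSV="${BASE}/score/csv/full/usa.csv"
echo "Score CSV: ${SCORE_CSV}"
# curl -O "${SCORE_CSV}"   # uncomment to download the file
```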
116
INSTALLATION.md
116
INSTALLATION.md
|
@ -2,85 +2,111 @@
|
|||
|
||||
_[¡Lea esto en español!](INSTALLATION-es.md)_

This page documents the installation steps for some of the prerequisite software needed to work with this project. It covers steps for macOS and Win10. If you are not on either of those platforms, install the software using steps appropriate for your operating system and device.

> :bulb: **NOTE**
> If all you want to do is try out the application locally, visit [`QUICKSTART.md`](QUICKSTART.md).

## Prerequisites

There are several prerequisites for downloading the source code and creating the environment needed to run both the Frontend Client and the Data Pipeline and Scoring Application.

### 1. Install Git

Our project is hosted on GitHub and can be forked using Git. You can use Git via the command line or any number of first- or third-party visual clients (the scope of which is beyond these instructions).

#### macOS :apple:

To install Git on macOS,

1. Open the terminal, type `git`, and press return
2. If dev tools are not installed, a window will prompt you to install dev tools. Follow those instructions to complete the installation
3. Once dev tools are installed, open the terminal, type `git --version`, and press return
4. Validate that a version number is returned (e.g. `git version 2.37.1`). If a version number is returned, Git is properly installed

#### Win10 :window:

On Win10, download and install Git following the instructions on [git-scm.com](https://git-scm.com/download/win).
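
On either platform, a quick terminal check confirms that the install went through:

```shell
# Confirm Git is installed and on your PATH.
# A successful install prints something like "git version 2.37.1".
git --version
```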

---

### 2. Install Homebrew (macOS :apple: only)

Homebrew is an easy way to manage software packages on macOS. Homebrew is _not_ a requirement. However, we recommend it, and our installation instructions will assume you have Homebrew installed.

1. Open your terminal and run `brew -v` to determine whether you have Homebrew installed. If you get a response that looks something like `Homebrew 3.1.9`, you've already got it! If you get nothing back, or an error, continue with these instructions.
2. Follow [the instructions on the Homebrew home page](https://brew.sh) to install Homebrew on your machine.
3. Validate installation by typing `brew -v` in the terminal; ensure a version number – like in step 1 – is shown.

Don't forget to regularly run `brew update` and `brew doctor` to make sure your packages are up to date and in good condition.

---

### 3. Install Node

Node version manager (nvm) allows you to install, manage, and use different Node.js versions on your machine. It's our preferred method to install Node.js.

Follow [these instructions](https://medium.com/@nodesource/installing-node-js-tutorial-using-nvm-5c6ff5925dd8) to install nvm. Be sure to read through all of the instructions to find the sections within each step relevant to you (e.g. if you're using Homebrew, when you get to Step 2 look for the section titled _Install NVM with Homebrew_).

> :bulb: **NOTE**
> If you install nvm using Homebrew, make sure to read the terminal output. There are additional installation instructions you must follow (such as adding lines to your bash or zsh profile).
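
For reference, the profile lines Homebrew asks you to add typically look something like the sketch below. The exact paths are assumptions that vary by machine, so always copy the lines your own terminal prints rather than this example.

```shell
# Example only (assumed paths) – copy the exact lines from Homebrew's output.
export NVM_DIR="$HOME/.nvm"
[ -s "/opt/homebrew/opt/nvm/nvm.sh" ] && \. "/opt/homebrew/opt/nvm/nvm.sh"
```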

Once you've completed the nvm installation, use nvm to install Node.js version 14.

`nvm install 14`

You should then be able to switch to that version of Node.js through the command:

`nvm use 14`

To validate you are using Node.js 14, type:

`node -v`

This should return something like `v14.x.x`.

---

### 4. Set Up Your IDE (Optional)

While any IDE can be used to contribute to this project, many of our developers use Visual Studio Code (VS Code). Because of this, we've included a few [VS Code configurations to make it easier to develop the data pipeline](data/data-pipeline/INSTALLATION.md#visual-studio-code).

1. On macOS :apple:, open the terminal, type `brew install --cask visual-studio-code`, and press return. If this doesn't work – or you're using Win10 :window: – you can [download and install VS Code](https://code.visualstudio.com/) directly.
2. After [forking this repo](https://github.com/usds/justice40-tool/blob/main/CONTRIBUTING.md#code-contributions), you can clone your forked repo into VS Code.

<!-- TODO: this belongs in the client readme -->

To use the client in VS Code,

1. Open the terminal and navigate to the `client` directory
2. Type `npm install` to load dependencies
3. Type `gatsby develop` to spin up the app
4. Navigate to `localhost:8000` to view the app
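
Put together, those client steps are just a few terminal commands (a sketch, assuming you've already cloned the repo and are using Node.js 14):

```
$ cd client
$ npm install
$ gatsby develop
```

Once `gatsby develop` reports that it's listening, open `http://localhost:8000` in your browser.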

We recommend the following VS Code Extensions:

<!-- markdown-link-check-disable -->

1. [Browser Preview](https://github.com/auchenberg/vscode-browser-preview)
2. [Live Server](https://github.com/ritwickdey/vscode-live-server)
3. [Live Share](https://github.com/MicrosoftDocs/live-share)
4. [Live Share Audio](https://github.com/MicrosoftDocs/live-share)
5. [Live Share Extension Pack](https://github.com/MicrosoftDocs/live-share)

<!-- markdown-link-check-enable -->

---

### 5. Install Docker (Optional)

Using Docker is optional; the Frontend Client and the Data Pipeline and Scoring Application can be run without Docker. If you wish to install Docker, follow the [Docker installation instructions](https://docs.docker.com/get-docker/) for your platform.

## Next Steps

After you've completed the prerequisites, you can continue on to specific installation steps for the part of the platform you're interested in exploring.

| Platform | Instructions |
| ------------------------------------- | ---------------------------------------------------------- |
| Frontend Client | [Frontend Client Instructions](client/README.md) |
| Data Pipeline and Scoring Application | [Data Pipeline Instructions](data/data-pipeline/README.md) |
| Deployment Process | [GitHub Workflows README](.github/workflows/README.md) |

@@ -1,23 +1,28 @@
# Mantenedores del proyecto Justice40

_[Read this in English!](MAINTAINERS.md)_

## Dirección del proyecto

- Kameron Kerger

## Ingeniería

Consultar la lista definitiva en [archivo CODEOWNERS](/.github/CODEOWNERS).

- Vim Shah
- Travis Newby
- Sam Powers
- Matt Bowen

## Producto

- Katherine Mlika

## Diseño

- Katrina Langer

## Liderazgo de Proyectos Emérito

- Lucas Brown
@@ -1,23 +1,28 @@
# Justice40 Project Maintainers

_[¡Lea esto en español!](MAINTAINERS-es.md)_

## Project Leadership

- Kameron Kerger

## Engineering

See [CODEOWNERS](/.github/CODEOWNERS) for the definitive list.

- Vim Shah
- Travis Newby
- Sam Powers
- Matt Bowen

## Product

- Katherine Mlika

## Design

- Katrina Langer

## Project Leadership Emeritus

- Lucas Brown
@@ -12,7 +12,7 @@ $ cd justice40-tool

Install [`docker`](https://docs.docker.com/get-docker/). See [Install Docker](INSTALLATION.md#install-docker).

> _Important_: To be able to run the entire application, you may need to increase the memory allocated to Docker to at least 8096 MB. See [this post](https://stackoverflow.com/a/44533437) for more details.

Use `docker-compose` to run the application:

@@ -20,6 +20,6 @@ Use `docker-compose` to run the application:
$ docker-compose up
```

> Note: This may take a while – possibly even an hour or two – since it has to build the containers and then download and process all the data.
After it initializes, you should be able to open the application in your browser at [http://localhost:8000](http://localhost:8000).
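
If you're not sure how much memory Docker currently has available, one way to check (a sketch assuming a recent Docker version; the value is reported in bytes) is:

```
$ docker info --format '{{.MemTotal}}'
```

If the number printed is well below 8 GB, raise the memory limit in Docker Desktop's resource settings.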

README-es.md
@@ -1,38 +1,45 @@
# Herramienta Justice40

[](https://github.com/usds/justice40-tool/blob/main/LICENSE.md)

_[Read this in English!](README.md)_

Le damos la bienvenida a la comunidad de código abierto de Justice40. Este repositorio contiene el código, los procesos y la documentación que activa los datos y la tecnología de la Herramienta Justice40 para la Vigilancia del Clima y la Justicia Económica (CEJST, por sus siglas en inglés).

## Antecedentes

En enero de 2021, una [orden ejecutiva](https://www.whitehouse.gov/briefing-room/presidential-actions/2021/01/27/executive-order-on-tackling-the-climate-crisis-at-home-and-abroad/) _(en inglés)_ anunció la iniciativa Justice40 y la herramienta de vigilancia con el objetivo de presentar un producto mínimo viable (MVP, por sus siglas en inglés) de la herramienta para el 27 de julio de 2021. La herramienta incluirá mapas interactivos y la versión preliminar de un informe de evaluación para que, al asignar los beneficios de sus programas, las dependencias federales puedan dar prioridad a las comunidades más desatendidas y agobiadas.

## Equipo central

El equipo central de Justice40 a cargo de la creación de esta herramienta consiste en un grupo pequeño de diseñadores, desarrolladores y coordinadores de productos del Servicio Digital de los EE. UU. en asociación con el Consejo de Calidad Ambiental (CEQ, por sus siglas en inglés).

En [MAINTAINERS-es.md](MAINTAINERS-es.md), se publica la lista actualizada de los integrantes del equipo central. La lista de los ingenieros del equipo central a cargo del mantenimiento del código en este repositorio se puede consultar en [.github/CODEOWNERS](.github/CODEOWNERS) _(en inglés)_.

## Comunidad

En la realización de esta herramienta, el método de código abierto que usa el equipo de Justice40 se dirige primeramente a la comunidad. Consideramos que el software del gobierno se debería dar a conocer, y que se debe crear y autorizar de manera que cualquier persona pueda acceder al código, ejecutarlo sin necesidad de pagar a terceros ni de emplear software propietario, y usarlo como desee.

Sabemos que podemos aprender de comunidades muy distintas (incluidas las que usarán la herramienta o se verán afectadas por esta) que tienen gran experiencia en la ciencia o la tecnología de datos, o en el trabajo en favor del clima y de la justicia económica o ambiental. Nos dedicamos a crear foros para facilitar la conversación y los comentarios continuos con objeto de conformar el diseño y el desarrollo de la herramienta.

Asimismo, reconocemos que la generación de capacidad es indispensable para incluir a una comunidad diversa de código abierto. Estamos procurando usar lenguaje accesible, proporcionar documentación técnica y sobre procesos en varios idiomas, y ofrecer asistencia, sea directamente o en forma de chats de grupo y de capacitación, a los miembros de nuestra comunidad que poseen gran variedad de conocimientos y habilidades. Si tiene ideas acerca de cómo podemos mejorar o ampliar nuestro trabajo para generar capacidad y de métodos para atraer a los usuarios a nuestra comunidad, comuníquese con nosotros en el [Grupo de Google](https://groups.google.com/u/4/g/justice40-open-source) _(en inglés)_ o por correo electrónico en justice40open@usds.gov.

### Chats de la comunidad

Cada dos semanas, llevamos a cabo sesiones de chat para la comunidad de código abierto los lunes de 5 a 6 p. m. hora del Este. Nuestro [Grupo de Google](https://groups.google.com/u/4/g/justice40-open-source) _(en inglés)_ proporciona información acerca de los temas que se tratarán en esas sesiones y de cómo puede participar en estas.

Se invita a los miembros de la comunidad a informar de novedades o a proponer temas de conversación para los chats de la comunidad, por medio del Grupo de Google.

### Grupo de Google

Nuestro [Grupo de Google](https://groups.google.com/u/4/g/justice40-open-source) _(en inglés)_ está abierto a quienes deseen participar y comunicar ahí sus conocimientos o experiencias, y hacer preguntas al equipo central de Justice40 o a la comunidad en conjunto.

El equipo central usa el grupo para publicar la información más reciente sobre el programa y problemas técnicos y de datos. También comunica ahí la agenda y solicita la participación de la comunidad en el chat.

¿Tiene una pregunta y no sabe si debe plantearla aquí como un problema de GitHub o en el Grupo de Google? La regla general es que los problemas son para los temas que requieren acciones o procesos relacionados con la herramienta o los datos (por ejemplo, preguntas acerca de conjuntos específicos de datos en uso, o sugerencias para una nueva función de la herramienta), mientras que el Grupo de Google es para temas o preguntas más generales. Si está indeciso, use el Grupo de Google y hablaremos ahí antes de pasar su pregunta a GitHub, si corresponde.

## Colaboraciones

Las colaboraciones son siempre bien recibidas. Nos agradan las aportaciones en forma de conversación sobre los temas de este repositorio y las solicitudes para incorporación de cambios en la documentación y el código.

En [CONTRIBUTING-es.md](CONTRIBUTING-es.md), consulte la manera de empezar a participar.

@@ -41,7 +48,9 @@ En [CONTRIBUTING-es.md](CONTRIBUTING-es.md), consulte la manera de empezar a par

La instalación es una instalación típica de Gatsby y los detalles se pueden encontrar en [INSTALLATION-es.md](INSTALLATION-es.md).

## Glosario

¿Tiene duda acerca de un término? ¿Encontró un acrónimo y no sabe a qué se refiere? Consulte [nuestro glosario](docs/glossary.md) _(en inglés)_.

## Comentarios

Si tiene preguntas o comentarios, escríbanos a justice40open@usds.gov.
README.md
45
README.md
|
@ -1,65 +1,72 @@
|
|||
# Justice40 Tool
|
||||
|
||||
[](https://github.com/usds/justice40-tool/blob/main/LICENSE.md)
|
||||
|
||||
*[¡Lea esto en español!](README-es.md)*
|
||||
_[¡Lea esto en español!](README-es.md)_
|
||||
|
||||
Welcome to the Justice40 Open Source Community! This repo contains the code, processes, and documentation for the data and tech powering the Justice40 [Climate and Economic Justice Screening Tool (CEJST)](https://screeningtool.geoplatform.gov).
|
||||
|
||||
## Background
|
||||
The Justice40 initiative and screening tool were announced in an [Executive Order](https://www.whitehouse.gov/briefing-room/presidential-actions/2021/01/27/executive-order-on-tackling-the-climate-crisis-at-home-and-abroad/) in January 2021. This tool will include interactive maps and an initial draft scorecard which federal agencies can use to prioritize historically overburdened and underserved communities for benefits in their programs.
|
||||
|
||||
Please see our [Open Source Community Orientation](docs/Justice40_Open_Source_Community_Orientation.pptx) deck for more information on the Justice40 initiative, our team, this project, and ways to participate.
|
||||
The Justice40 initiative and CEJST were announced in the [Executive Order on Tackling the Climate Crisis at Home and Abroad](https://www.whitehouse.gov/briefing-room/presidential-actions/2021/01/27/executive-order-on-tackling-the-climate-crisis-at-home-and-abroad/) in January 2021. The CEJST includes interactive maps and an initial draft scorecard which federal agencies can use to prioritize historically overburdened and underserved communities for benefits in their programs.
|
||||
|
||||
## Core team
|
||||
The core Justice40 team building this tool is a small group of designers, developers, and product managers from the US Digital Service in partnership with the Council on Environmental Quality (CEQ).
|
||||
Please visit our [Open Source Community Orientation](docs/Justice40_Open_Source_Community_Orientation.pptx) deck for more information on the Justice40 initiative, our team, this project, and ways to participate.
|
||||
|
||||
## Core Team
|
||||
|
||||
The core Justice40 team building this tool is a small group of designers, developers, and product managers from the [US Digital Service](https://www.usds.gov) in partnership with the [Council on Environmental Quality (CEQ)](https://www.whitehouse.gov/ceq/).
|
||||
|
||||
An up-to-date list of core team members can be found in [MAINTAINERS.md](MAINTAINERS.md). The engineering members of the core team who maintain the code in this repo are listed in [.github/CODEOWNERS](.github/CODEOWNERS).
|
||||
|
||||
## Community
|
||||
The Justice40 team is taking a community-first and open source approach to the product development of this tool. We believe government software should be made in the open and be built and licensed such that anyone can take the code, run it themselves without paying money to third parties or using proprietary software, and use it as they will.
|
||||
|
||||
We know that we can learn from a wide variety of communities, including those who will use or will be impacted by the tool, who are experts in data science or technology, or who have experience in climate, economic,or environmental justice work. We are dedicated to creating forums for continuous conversation and feedback to help shape the design and development of the tool.
|
||||
The Justice40 team is taking a community-first and open source approach to the product development of this tool. We believe government software should be made in the open and be built and licensed such that anyone can download the code, run it themselves without paying money to third parties or using proprietary software, and use it as they will.
|
||||
|
||||
We also recognize capacity building as a key part of involving a diverse open source community. We are doing our best to use accessible language, provide technical and process documents in multiple languages, and offer support to our community members of a wide variety of backgrounds and skillsets, directly or in the form of group chats and training. If you have ideas for how we can improve or add to our capacity building efforts and methods for welcoming folks into our community, please let us know in the [Google Group](https://groups.google.com/u/4/g/justice40-open-source) or email us at justice40open@usds.gov.
|
||||
We know that we can learn from a wide variety of communities, including those who will use or will be impacted by the tool, who are experts in data science or technology, or who have experience in climate, economic, or environmental justice work. We are dedicated to creating forums for continuous conversation and feedback to help shape the design and development of the tool.
|
||||
|
||||
We also recognize capacity building as a key part of involving a diverse open source community. We are doing our best to use accessible language, provide technical and process documents in multiple languages, and offer support – both directly and in the form of group chats and training – to community members with a wide variety of backgrounds and skillsets. If you have ideas for how we can improve or add to our capacity building efforts and methods for welcoming people into our community, please let us know in the [Google Group](https://groups.google.com/u/4/g/justice40-open-source) or email us at justice40open@usds.gov.
|
||||
|
||||
### Community Guidelines
|
||||
|
||||
Principles and guidelines for participating in our open source community are available [here](COMMUNITY_GUIDELINES.md). Please read them before joining or starting a conversation in this repo or one of the channels listed below.
|
||||
|
||||
### Community Chats
|
||||
We host open source community chats every third Monday of the month at 5-6pm ET. You can find information about the agenda and how to participate in our [Google Group](https://groups.google.com/u/4/g/justice40-open-source).
|
||||
|
||||
Community members are welcome to share updates or propose topics for discussion in community chats. Please do so in the Google Group.
|
||||
|
||||
### Google Group
|
||||
|
||||
Our [Google Group](https://groups.google.com/u/4/g/justice40-open-source) is open to anyone to join and share their knowledge or experiences, as well as to ask questions of the core Justice40 team or the wider community.
|
||||
|
||||
The core team uses the group to post updates on the program and tech/data issues, and to share the agenda and call for community participation in the community chat.
|
||||
|
||||
Curious about whether to ask a question here as a Github issue or in the Google Group? The general rule of thumb is that issues are for actionable topics related to the tool or data itself (e.g. questions about a specific data set in use, or suggestion for a new tool feature), and the Google Group is for more general topics or questions. If you can't decide, use the google group and we'll discuss it there before moving to Github if appropriate!
|
||||
Curious about whether to ask a question here as a Github issue or in the Google Group? Github issues are for actionable topics related to the tool or data itself (e.g. questions about a specific data set in use, or suggestion for a new tool feature), and the Google Group is for general topics or questions. If you can't decide where your question fits, use the Google Group and we'll discuss it there before moving to Github if appropriate!
|
||||
|
||||
### Open Source Community Chats
|
||||
|
||||
We host open source community chats every third Monday of the month at 5-6pm ET. You can find information about the agenda and how to participate in our [Google Group](https://groups.google.com/u/4/g/justice40-open-source).
|
||||
|
||||
Community members are welcome to use our Google Group to share updates or propose topics for discussion in community chats.
|
||||
|
||||
## Contributing
|
||||
|
||||
Contributions are always welcome! We encourage contributions in the form of discussion on issues in this repo and pull requests of documentation and code.
|
||||
|
||||
See [CONTRIBUTING.md](CONTRIBUTING.md) for ways to get started.
|
||||
Visit [CONTRIBUTING.md](CONTRIBUTING.md) for ways to get started.
|
||||
|
||||
## For Developers and Data Scientists
|
||||
|
||||
### Datasets
|
||||
|
||||
The intermediate steps of the data pipeline and the final output that is consumed by the frontend are all public and can be accessed directly. See [DATASETS.md](DATASETS.md) for these direct download links.
|
||||
The intermediate steps of the data pipeline, the scores, and the final output that is consumed by the frontend are all public and can be accessed directly. Visit [DATASETS.md](DATASETS.md) for these direct download links.
|
||||
|
||||
### Local Quickstart
|
||||
|
||||
If you want to run the entire application locally, see [QUICKSTART.md](QUICKSTART.md).
|
||||
If you want to run the entire application locally, visit [QUICKSTART.md](QUICKSTART.md).
|
||||
|
||||
### Advanced Guides
|
||||
|
||||
If you have software experience or more specific use cases, start at [INSTALLATION.md](INSTALLATION.md) for more in-depth documentation of how to work with this project.
|
||||
If you have software experience or more specific use cases, in-depth documentation of how to work with this project can be found in [INSTALLATION.md](INSTALLATION.md).
|
||||
|
||||
### Project Documentation
|
||||
|
||||
For more general documentation on the project that is not related to getting set up, including architecture diagrams and engineering decision logs, see [docs/](docs/).
|
||||
For more general documentation on the project that is not related to getting set up, including architecture diagrams and engineering decision logs, visit [docs/](docs/).
|
||||
|
||||
## Glossary
|
||||
|
||||
|
|

data/data-pipeline/INSTALLATION.md (new file)
@@ -0,0 +1,135 @@
# Justice40 Data Pipeline and Scoring Application Installation Guide

This page documents the local environment setup steps for the Justice40 Data Pipeline and Scoring Application. It covers steps for macOS and Win10. If you are not on either of those platforms, install the software using instructions appropriate for your operating system and device.

> :warning: **WARNING**
> This guide assumes you've performed all prerequisite steps listed in the [main installation guide](/INSTALLATION.md). If you've not performed those steps, now is a good time.

> :bulb: **NOTE**
> If you've not yet read the [project README](/README.md) or the [data pipeline and scoring application README](README.md) to familiarize yourself with the project, it would be useful to do so before continuing with this installation guide.

## Installation

The Justice40 Data Pipeline and Scoring Application is written in Python. It can be run using Poetry after installing a few third-party tools.

### 1. Install Python

The application is written in Python and requires Python 3.8 or newer (we recommend 3.10).

#### macOS :apple:

There are many ways to install Python on macOS; choose whichever works for your configuration.

One such way is by using [`pyenv`](https://github.com/pyenv/pyenv). `pyenv` allows you to manage multiple Python versions on the same device. To install `pyenv` on your system, follow [these instructions](https://github.com/pyenv/pyenv#installation). Be sure to follow any post-installation steps listed by Homebrew, as well as any extra steps listed in the installation instructions.

Once `pyenv` is installed, you can use it to install Python. Execute the command `pyenv install 3.10.6` to install Python 3.10. After installing Python, navigate to the `justice40-tool` directory and set this Python to be your default by issuing the command `pyenv local 3.10.6`. Run the command `python --version` to make sure this worked.
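Beyond `python --version`, you can sanity-check the interpreter from inside Python itself. This snippet is purely illustrative and not part of the application:

```python
import sys

# The data pipeline requires Python 3.8 or newer; 3.10 is recommended.
MINIMUM = (3, 8)

def meets_minimum(version_info=sys.version_info, minimum=MINIMUM):
    """Return True if the interpreter satisfies the minimum version."""
    return tuple(version_info[:2]) >= minimum

if __name__ == "__main__":
    print("OK" if meets_minimum() else "Python is too old for this project")
```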
|
||||
|
||||
> :warning: **WARNING**
|
||||
> We've had trouble with 3rd party dependencies in Python 3.11 on macOS machines with Apple silicon. In case of odd dependency issues, please use Python 3.10.
|
||||
|
||||
#### Win10 :window:
|
||||
|
||||
Follow the Get Started guide on [python.org](https://www.python.org/) to download and install Python on your Windows system. Alternately, if you wish to manage your Python installations more carefully, you can use [`pyenv-win`](https://github.com/pyenv-win/pyenv-win).
|
||||
|
||||
---
|
||||
|
||||
### 2. Install Poetry
|
||||
|
||||
The Justice40 Data Pipeline and Scoring Application uses [Poetry](https://python-poetry.org/) to manage Python dependencies. Those dependencies are defined in [pyproject.toml](pyproject.toml), and exact versions of all dependencies can be found in [poetry.lock](poetry.lock).
|
||||
|
||||
Once Poetry is installed, you can download project dependencies by navigating to `justice40-tool/data/data-pipeline` and running `poetry install`.
|
||||
|
||||
> :warning: **WARNING**
|
||||
> While it may be tempting to run `poetry update`, this project is built with older versions of some dependencies. Updating all dependencies will likely cause the application to behave in unexpected ways, and may cause the application to crash.
|
||||
|
||||
#### macOS :apple:
|
||||
|
||||
To install Poetry on macOS, follow the [installation instructions](https://python-poetry.org/docs/#installation) on the Poetry site. There are multiple ways to install Poetry; we prefer installing and managing it through [`pipx`](https://pypa.github.io/pipx/installation/) (requires `pipx` installation), but feel free to use whatever works for your configuration.
|
||||
|
||||
#### Win10 :window:
|
||||
|
||||
To install Poetry on Win10, follow the [installation instructions](https://python-poetry.org/docs/#installation) on the Poetry site.
|
||||
|
||||
---
|
||||
|
||||
### 3. Install the 3rd Party Tools
|
||||
|
||||
The application requires the installation of three 3rd party tools.
|
||||
|
||||
| Tool | Purpose | Link |
|
||||
| --------------- | -------------------- | --------------------------------------------------------- |
|
||||
| GDAL | Generate census data | [GDAL library](https://github.com/OSGeo/gdal) |
|
||||
| libspatialindex | Score generation | [libspatialindex](https://libspatialindex.org/en/latest/) |
|
||||
| tippecanoe | Generate map tiles | [Mapbox tippecanoe](https://github.com/mapbox/tippecanoe) |
|
||||
|
||||
#### macOS :apple:
|
||||
|
||||
Use Homebrew to install the three tools.
|
||||
|
||||
- GDAL: `brew install gdal`
|
||||
- libspatialindex: `brew install spatialindex`
|
||||
- tippecanoe: `brew install tippecanoe`
|
||||
|
||||
> :exclamation: **ATTENTION**
|
||||
> For macOS Monterey or Macs with Apple silicon, you may need to follow [these steps](https://stackoverflow.com/a/70880741) to install Scipy.
|
||||
|
||||
#### Win10 :window:
|
||||
|
||||
If you want to run tile generation, please install tippecanoe [following these instructions](https://github.com/GISupportICRC/ArcGIS2Mapbox#installing-tippecanoe-on-windows). You also need some prerequisites for Geopandas (as specified in the Poetry requirements). Please follow [these instructions](https://stackoverflow.com/questions/56958421/pip-install-geopandas-on-windows) to install the Geopandas dependency locally. It's definitely easier if you have access to WSL (Windows Subsystem for Linux) and install these packages using commands similar to our [Dockerfile](https://github.com/usds/justice40-tool/blob/main/data/data-pipeline/Dockerfile).
|
||||
|
||||
---
|
||||
|
||||
### 4. Install Pre-Commit Hooks
|
||||
|
||||
<!-- markdown-link-check-disable -->
|
||||
|
||||
To promote consistent code style and quality, we use Git [pre-commit](https://pre-commit.com) hooks to automatically lint and reformat our code before every commit. This project's pre-commit hooks are defined in [`.pre-commit-config.yaml`](../.pre-commit-config.yaml).
|
||||
|
||||
After following the installation instructions for your platform, navigate to the `justice40-tool/data/data-pipeline` directory and run `pre-commit install` to install the pre-commit hooks used in this repository.
|
||||
|
||||
After installing pre-commit hooks, any time you commit code to the repository the hooks will run on all modified files automatically. You can force a re-run on all files with `pre-commit run --all-files`.
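For orientation, a pre-commit configuration file generally takes the following shape. The repository and hook ids below are generic examples, not necessarily the ones this project uses; see the project's own `.pre-commit-config.yaml` for the real list:

```yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
```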
|
||||
|
||||
<!-- markdown-link-check-enable -->
|
||||
|
||||
#### macOS :apple:
|
||||
|
||||
Follow [the Homebrew installation instructions on the pre-commit website](https://pre-commit.com/#install) to install pre-commit on macOS.
|
||||
|
||||
#### Win10 :window:
|
||||
|
||||
Follow [the instructions on the pre-commit website](https://pre-commit.com/#install) to install pre-commit on Win10.
|
||||
|
||||
#### Conflicts between backend and frontend Git hooks
|
||||
|
||||
In the client part of the codebase (the `justice40-tool/client` folder), we use a different tool, `Husky`, to run pre-commit hooks. It is not possible to run both our `Husky` hooks and `pre-commit` hooks on every commit; either one or the other will run.
|
||||
|
||||
`Husky` is installed every time you run `npm install`. To use the `Husky` front-end hooks during front-end development, simply run `npm install`.
|
||||
|
||||
However, running `npm install` overwrites the backend hooks set up by `pre-commit`. To restore the backend hooks after running `npm install`, do the following:
|
||||
|
||||
1. Run `pre-commit install` while in the `justice40-tool/data/data-pipeline` directory.
|
||||
2. The terminal should respond with an error message such as:
|
||||
|
||||
```
|
||||
[ERROR] Cowardly refusing to install hooks with `core.hooksPath` set.
|
||||
hint: `git config --unset-all core.hooksPath`
|
||||
```
|
||||
|
||||
This error is caused by having previously run `npm install`, which used `Husky` to overwrite the hooks path.
|
||||
|
||||
3. Follow the hint and run `git config --unset-all core.hooksPath`.
|
||||
4. Run `pre-commit install` again.
|
||||
|
||||
Now `pre-commit` and the backend hooks should work.
|
||||
|
||||
## Visual Studio Code
|
||||
|
||||
If you are using VS Code, you can make use of the `.vscode` configurations located at `data/data-pipeline/.vscode`. To do this, open VS Code with the command `code data/data-pipeline`.
|
||||
|
||||
These configurations include:
|
||||
|
||||
1. `launch.json` - launch commands that allow for debugging the various commands in `application.py`. Note that because we are using the otherwise excellent [Click CLI](https://click.palletsprojects.com/en/8.0.x/), and Click in turn uses `console_scripts` to parse and execute command line options, it is necessary to run the equivalent of `python -m data_pipeline.application [command]` within `launch.json` to be able to set and hit breakpoints (this is what is currently implemented); otherwise, you may find that the script times out after 5 seconds. More about this [here](https://stackoverflow.com/questions/64556874/how-can-i-debug-python-console-script-command-line-apps-with-the-vscode-debugger).
|
||||
2. `settings.json` - these ensure that you're using the default linters (`pylint`, `flake8`) and test library (`pytest`).
|
||||
3. `tasks.json` - these enable you to use `Terminal → Run Task` to run our preferred formatters and linters within your project.
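As a concrete illustration of the first item, a minimal `launch.json` entry that debugs a command by running the module directly might look like the following. The configuration name is made up for this sketch; only the `data_pipeline.application` module path and the `score-run` command come from this project:

```json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Debug score-run (illustrative)",
      "type": "python",
      "request": "launch",
      "module": "data_pipeline.application",
      "args": ["score-run"]
    }
  ]
}
```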
|
||||
|
||||
Please only add settings to this file that should be shared across the team (not settings here that only apply to local development environments, such as those that use absolute paths). If you are looking to add something to this file, check in with the rest of the team to ensure the proposed settings should be shared.
|
|
@ -1,199 +1,87 @@
|
|||
# Justice 40 Score application
|
||||
# Justice40 Data Pipeline and Scoring Application
|
||||
|
||||
<details open="open">
|
||||
<summary>Table of Contents</summary>
|
||||
## Table of Contents
|
||||
|
||||
<!-- TOC -->
|
||||
- [About](#about)
|
||||
- [Accessing Data](#accessing-data)
|
||||
- [Installing the Data Pipeline and Scoring Application](#installing-the-data-pipeline-and-scoring-application)
|
||||
- [Running the Data Pipeline and Scoring Application](#running-the-data-pipeline-and-scoring-application)
|
||||
- [How Scoring Works](#how-scoring-works)
|
||||
- [Comparing Scores](#comparing-scores)
|
||||
- [Testing](#testing)
|
||||
|
||||
- [Justice 40 Score application](#justice-40-score-application)
|
||||
- [About this application](#about-this-application)
|
||||
- [Using the data](#using-the-data)
|
||||
- [1. Source data](#1-source-data)
|
||||
- [2. Extract-Transform-Load (ETL) the data](#2-extract-transform-load-etl-the-data)
|
||||
- [3. Combined dataset](#3-combined-dataset)
|
||||
- [4. Tileset](#4-tileset)
|
||||
- [5. Shapefiles](#5-shapefiles)
|
||||
- [Score generation and comparison workflow](#score-generation-and-comparison-workflow)
|
||||
- [Workflow Diagram](#workflow-diagram)
|
||||
- [Step 0: Set up your environment](#step-0-set-up-your-environment)
|
||||
- [Step 1: Run the script to download census data or download from the Justice40 S3 URL](#step-1-run-the-script-to-download-census-data-or-download-from-the-justice40-s3-url)
|
||||
- [Step 2: Run the ETL script for each data source](#step-2-run-the-etl-script-for-each-data-source)
|
||||
- [Table of commands](#table-of-commands)
|
||||
- [ETL steps](#etl-steps)
|
||||
- [Step 3: Calculate the Justice40 score experiments](#step-3-calculate-the-justice40-score-experiments)
|
||||
- [Step 4: Compare the Justice40 score experiments to other indices](#step-4-compare-the-justice40-score-experiments-to-other-indices)
|
||||
- [Data Sources](#data-sources)
|
||||
- [Running using Docker](#running-using-docker)
|
||||
- [Local development](#local-development)
|
||||
- [VSCode](#vscode)
|
||||
- [MacOS](#macos)
|
||||
- [Windows Users](#windows-users)
|
||||
- [Setting up Poetry](#setting-up-poetry)
|
||||
- [Running tox](#running-tox)
|
||||
- [The Application entrypoint](#the-application-entrypoint)
|
||||
- [Downloading Census Block Groups GeoJSON and Generating CBG CSVs (not normally required)](#downloading-census-block-groups-geojson-and-generating-cbg-csvs-not-normally-required)
|
||||
- [Run all ETL, score and map generation processes](#run-all-etl-score-and-map-generation-processes)
|
||||
- [Run both ETL and score generation processes](#run-both-etl-and-score-generation-processes)
|
||||
- [Run all ETL processes](#run-all-etl-processes)
|
||||
- [Generating Map Tiles](#generating-map-tiles)
|
||||
- [Serve the map locally](#serve-the-map-locally)
|
||||
- [Running Jupyter notebooks](#running-jupyter-notebooks)
|
||||
- [Activating variable-enabled Markdown for Jupyter notebooks](#activating-variable-enabled-markdown-for-jupyter-notebooks)
|
||||
- [Testing](#testing)
|
||||
- [Background](#background)
|
||||
- [Score and post-processing tests](#score-and-post-processing-tests)
|
||||
- [Updating Pickles](#updating-pickles)
|
||||
- [Future Enhancements](#future-enhancements)
|
||||
- [Fixtures used in ETL "snapshot tests"](#fixtures-used-in-etl-snapshot-tests)
|
||||
- [Other ETL Unit Tests](#other-etl-unit-tests)
|
||||
- [Extract Tests](#extract-tests)
|
||||
- [Transform Tests](#transform-tests)
|
||||
- [Load Tests](#load-tests)
|
||||
- [Smoketests](#smoketests)
|
||||
## About
|
||||
|
||||
<!-- /TOC -->
|
||||
The Justice40 Data Pipeline and Scoring Application is used to retrieve input data sources, perform Extract-Transform-Load (ETL) operations on those data sources, and ultimately generate the scores and supporting data (e.g. map tiles) consumed by the [Climate and Economic Justice Screening Tool (CEJST) website](https://screeningtool.geoplatform.gov/). This data can also be used to compare experimental versions of the Justice40 score to established environmental justice indices, such as EJSCREEN and CalEnviroScreen.
|
||||
|
||||
</details>
|
||||
> :exclamation: **ATTENTION**
|
||||
> The Council on Environmental Quality (CEQ) [made version 1.0 of the CEJST available in November 2022](https://www.whitehouse.gov/ceq/news-updates/2022/11/22/biden-harris-administration-launches-version-1-0-of-climate-and-economic-justice-screening-tool-key-step-in-implementing-president-bidens-justice40-initiative/). Future versions are in continuous development, and scores are likely to change over time. Only versions made publicly available via the CEJST by CEQ may be used for the Justice40 Initiative.
|
||||
|
||||
## About this application
|
||||
We believe that the entire data pipeline should be open and replicable end-to-end. As part of this, in addition to all code being open, we also strive to make data visible and available for use at every stage of our pipeline. You can follow the installation instructions below to spin up the data pipeline yourself in your own environment; you can also access the data we've already processed.
|
||||
|
||||
This application is used to compare experimental versions of the Justice40 score to established environmental justice indices, such as EJSCREEN, CalEnviroScreen, and so on.
|
||||
## Accessing Data
|
||||
|
||||
_**NOTE:** These scores **do not** represent final versions of the Justice40 scores and are merely used for comparative purposes. As a result, the specific input columns and formulas used to calculate them are likely to change over time._
|
||||
If you wish to access our data without running the Justice40 Data Pipeline and Scoring Application locally, you can do so using the following links.
|
||||
|
||||
### Using the data
|
||||
| Dataset | Location |
|
||||
| ------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| Source Data | You can find the source URLs in the `etl.py` files located within each directory in `data/data-pipeline/etl/sources` |
|
||||
| Version 1.0 Combined Datasets (from all Sources) | [Download](https://static-data-screeningtool.geoplatform.gov/data-versions/1.0/data/score/csv/full/usa.csv) |
|
||||
| Shape Files for Mapping Applications | [Download](https://static-data-screeningtool.geoplatform.gov/data-versions/1.0/data/score/downloadable/1.0-shapefile-codebook.zip) |
|
||||
| Documentation and Other Downloads | [Climate and Economic Justice Screening Tool Downloads](https://screeningtool.geoplatform.gov/en/downloads) |
|
||||
|
||||
One of our primary development principles is that the entire data pipeline should be open and replicable end-to-end. As part of this, in addition to all code being open, we also strive to make data visible and available for use at every stage of our pipeline. You can follow the instructions below in this README to spin up the data pipeline yourself in your own environment; you can also access the data we've already processed on our S3 bucket.
|
||||
## Installing the Data Pipeline and Scoring Application
|
||||
|
||||
In the sub-sections below, we outline what each stage of the data provenance looks like and where you can find the data output by that stage. If you'd like to actually perform each step in your own environment, skip down to [Score generation and comparison workflow](#score-generation-and-comparison-workflow).
|
||||
If you wish to run the Justice40 Data Pipeline and Scoring Application in your own environment, you have the option of using Docker or setting up a local environment. Docker allows you to install and run the application inside a container without setting up a local environment, and is the quickest and easiest option. A local environment requires you to set up your system manually, but provides the ability to make changes and run individual parts of the application without the need for Docker.
|
||||
|
||||
#### 1. Source data
|
||||
With either choice, you'll first need to perform some installation steps.
|
||||
|
||||
If you would like to find and use the raw source data, you can find the source URLs in the `etl.py` files located within each directory in `data/data-pipeline/etl/sources`.
|
||||
### Installing Docker
|
||||
|
||||
#### 2. Extract-Transform-Load (ETL) the data
|
||||
To install Docker, follow these [instructions](https://docs.docker.com/get-docker/). After installation is complete, visit [Running with Docker](#running-with-docker) for more information.
|
||||
|
||||
The first step of processing we perform is a simple ETL process for each of the source datasets. Code is available in `data/data-pipeline/etl/sources`, and the output of this process is a number of CSVs available at the following locations:
|
||||
---
|
||||
|
||||
- EJScreen: <https://justice40-data.s3.amazonaws.com/data-pipeline/data/dataset/ejscreen_2019/usa.csv>
|
||||
- Census ACS 2019: <https://justice40-data.s3.amazonaws.com/data-pipeline/data/dataset/census_acs_2019/usa.csv>
|
||||
- Housing and Transportation Index: <https://justice40-data.s3.amazonaws.com/data-pipeline/data/dataset/housing_and_transportation_index/usa.csv>
|
||||
- HUD Housing: <https://justice40-data.s3.amazonaws.com/data-pipeline/data/dataset/hud_housing/usa.csv>
|
||||
### Installing Your Local Environment
|
||||
|
||||
Each CSV may have a different column name for the census tract or census block group identifier. You can find what the name is in the ETL code. Please note that when you view these files you should make sure that your text editor or spreadsheet software does not remove the initial `0` from this identifier field (many IDs begin with `0`).
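The leading-zero warning can be made concrete. The sketch below uses only the standard library (the hypothetical column name `GEOID10` and the sample rows are illustrative): reading the identifier as text preserves the `0`, while a numeric round-trip silently drops it.

```python
import csv
import io

# GEOIDs often begin with "0"; always treat them as strings, never numbers.
raw = "GEOID10,value\n010010201001,1.5\n060014001001,2.0\n"

rows = list(csv.DictReader(io.StringIO(raw)))
geoids = [row["GEOID10"] for row in rows]

# Reading as text preserves the leading zero...
assert geoids[0] == "010010201001"
# ...while a numeric round-trip would silently drop it.
assert str(int(geoids[0])) == "10010201001"
```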
|
||||
The detailed steps for performing [local environment installation can be found in our guide](INSTALLATION.md). After installation is complete, visit [Running the Application Locally](#running-in-your-local-environment) for more information.
|
||||
|
||||
#### 3. Combined dataset
|
||||
## Running the Data Pipeline and Scoring Application
|
||||
|
||||
The CSV with the combined data from all of these sources [can be accessed here](https://justice40-data.s3.amazonaws.com/data-pipeline/data/score/csv/full/usa.csv).
|
||||
The Justice40 Data Pipeline and Scoring Application is a multistep process that:
|
||||
|
||||
#### 4. Tileset
|
||||
1. Retrieves input data sources (extract), standardizes those input data sources' data into an intermediate format (transform), and saves the results to the file system (load). It performs those steps for each configured input data source (found at [`data_pipeline/etl/sources`](data_pipeline/etl/sources))
|
||||
2. Calculates a score
|
||||
3. Combines the score with geographic data
|
||||
4. Generates map tiles for use in the client website
|
||||
|
||||
Once we have all the data from the previous stages, we convert it to tiles to make it usable on a map. We render the map on the client side which can be seen using `docker-compose up`.
|
||||
```mermaid
|
||||
graph LR
|
||||
A[Run ETL on all External\nData Sources] --> B[Calculate Score]
|
||||
B --> C[Combine Score with\nGeographic Data]
|
||||
C --> D[Generate Map Tiles]
|
||||
```
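The four sequential stages above can be sketched as a simple driver. The real application exposes these stages as CLI commands (`etl-run`, `score-run`, `geo-score`, `generate-map-tiles`); the function names and return values below are placeholders, not the application's actual API:

```python
# Illustrative sketch of the pipeline's four sequential stages.

def etl_all_sources():
    """Extract, transform, and load every configured data source."""
    return {"ejscreen": [], "census_acs": []}  # placeholder datasets

def calculate_score(datasets):
    """Combine standardized datasets into per-tract scores."""
    return [{"geoid": "010010201001", "score": 0.72}]

def combine_with_geography(scores):
    """Join scores to census geometries."""
    return [{**row, "geometry": None} for row in scores]

def generate_map_tiles(geojson):
    """Render map tiles from the combined GeoJSON; returns the output path."""
    return "data/score/tiles"

def run_pipeline():
    datasets = etl_all_sources()
    scores = calculate_score(datasets)
    geojson = combine_with_geography(scores)
    return generate_map_tiles(geojson)
```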
|
||||
|
||||
#### 5. Shapefiles
|
||||
You can perform these steps either using Docker or by running the application in your local environment.
|
||||
|
||||
If you want to use the shapefiles in mapping applications, you can access them [here](https://justice40-data.s3.amazonaws.com/data-pipeline/data/score/shapefile/usa.zip).
|
||||
### Running with Docker
|
||||
|
||||
Docker can be used to run the application inside a container without setting up a local environment.
|
||||
|
||||
### Score generation and comparison workflow
|
||||
> :exclamation: **ATTENTION**
|
||||
> You must increase the memory resource of your container to at least 8096 MB to run this application in Docker.
|
||||
|
||||
The descriptions below provide a more detailed outline of what happens at each step of ETL and score calculation workflow.
|
||||
Before running with Docker, you must build the Docker container. Make sure you're in the root directory of the repository (`/justice40-tool`) and run `docker-compose build --no-cache`.
|
||||
|
||||
#### Workflow Diagram
|
||||
Once you've built the Docker container, run `docker-compose up`. Docker will spin up three containers: the client container, the static server container, and the data container. Once all data is generated, you can see the application by navigating to [http://localhost:8000](http://localhost:8000) in your browser.
|
||||
|
||||
TODO add mermaid diagram
|
||||
<details>
|
||||
<summary>View additional commands</summary>
|
||||
|
||||
#### Step 0: Set up your environment
|
||||
|
||||
1. Choose whether you'd like to run this application using Docker or if you'd like to install the dependencies locally so you can contribute to the project.
|
||||
- **With Docker:** Follow these [installation instructions](https://docs.docker.com/get-docker/) and skip down to the [Running with Docker section](#running-with-docker) for more information
|
||||
- **For Local Development:** Skip down to the [Local Development section](#local-development) for more detailed installation instructions
|
||||
|
||||
#### Step 1: Run the script to download census data or download from the Justice40 S3 URL
|
||||
|
||||
1. Call the `census-data-download` command using the application manager `application.py`. **NOTE:** This may take several minutes to execute.
|
||||
- With Docker: `docker run --rm -it -v ${PWD}/data/data-pipeline/data_pipeline/data:/data_pipeline/data j40_data_pipeline python3 -m data_pipeline.application census-data-download`
|
||||
- With Poetry: `poetry run download_census` (Install GDAL as described [below](#local-development))
|
||||
2. If you have a high-speed internet connection and don't want to generate the census data or install `GDAL` locally, you can download a zip version of the Census file [here](https://justice40-data.s3.amazonaws.com/data-sources/census.zip). Then unzip and move the contents inside the `data/data-pipeline/data_pipeline/data/census/` folder.
|
||||
|
||||
#### Step 2: Run the ETL script for each data source
|
||||
|
||||
##### Table of commands
|
||||
|
||||
| VS Code command | Actual command | Run time | What it does | Where it writes to | Notes |
|
||||
|---------------------------|---------------------|----------|----------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------|-----------------------------------------------------------------------------------------------------|
|
||||
| ETL run | etl-run | | Downloads the data set files | data/dataset | Check if there are any changes in `data_pipeline/etl/sources`; if there are none, this step can be skipped. |
|
||||
| Score run | score-run | 6 mins | Consumes the ETL outputs and combines them into the full score CSV | data/score/csv/full/usa.csv | |
|
||||
| Generate Score post | generate-score-post | 9 mins | 1. Combines the score/csv/full with counties; 2. creates downloadable assets (xls, csv, zip); 3. creates the tiles CSV | data/score/csv/tiles/usa.csv, data/score/downloadable | Check the destination folder to see if files were newly created |
|
||||
| Combine score and GeoJSON | geo-score | 26 mins | 1. Combines data/score/csv/tiles/usa.csv with the Census TIGER GeoJSON data; 2. aggregates into super tracts for the usa-low layer | data/score/geojson (usa high / low) | |
|
||||
| Generate Map Tiles | generate-map-tiles | 35 mins | ogr2ogr PBF/MVT tile generator that consumes the usa-high and usa-low GeoJSON | data/score/tiles/ high or low / {zoomLevel} | |
|
||||
|
||||
##### ETL steps
|
||||
1. Call the `etl-run` command using the application manager `application.py`. **NOTE:** This may take several minutes to execute.
|
||||
- With Docker: `docker run --rm -it -v ${PWD}/data/data-pipeline/data_pipeline/data:/data_pipeline/data j40_data_pipeline python3 -m data_pipeline.application etl-run`
|
||||
- With Poetry: `poetry run python3 data_pipeline/application.py etl-run`
|
||||
2. This command will execute the corresponding ETL script for each data source in `data_pipeline/etl/sources/`. For example, `data_pipeline/etl/sources/ejscreen/etl.py` is the ETL script for EJSCREEN data.
|
||||
3. Each ETL script will extract the data from its original source, then format the data into `.csv` files that get stored in the relevant folder in `data_pipeline/data/dataset/`. For example, HUD Housing data is stored in `data_pipeline/data/dataset/hud_housing/usa.csv`
|
||||
|
||||
_**NOTE:** You have the option to pass the name of a specific data source to the `etl-run` command using the `-d` flag, which will limit the execution of the ETL process to that specific data source._
|
||||
_For example: `poetry run python3 data_pipeline/application.py etl-run -d ejscreen` would only run the ETL process for EJSCREEN data._
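One way the `-d` flag could limit execution to a single data source is a registry mapping dataset names to their ETL entry points. This is an illustrative sketch, not the application's actual dispatch code; only the dataset ids (`ejscreen`, `hud_housing`) come from this document:

```python
# Hypothetical registry dispatch for `etl-run -d <dataset>`.

def etl_ejscreen():
    return "ran ejscreen"

def etl_hud_housing():
    return "ran hud_housing"

ETL_REGISTRY = {
    "ejscreen": etl_ejscreen,
    "hud_housing": etl_hud_housing,
}

def etl_run(dataset=None):
    """Run one dataset's ETL if `dataset` is given, otherwise run all."""
    if dataset is not None:
        return [ETL_REGISTRY[dataset]()]
    return [etl() for etl in ETL_REGISTRY.values()]
```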
|
||||
|
||||
#### Step 3: Calculate the Justice40 score experiments
|
||||
|
||||
1. Call the `score-run` command using the application manager `application.py`. **NOTE:** This may take several minutes to execute.
|
||||
- With Docker: `docker run --rm -it -v ${PWD}/data/data-pipeline/data_pipeline/data:/data_pipeline/data j40_data_pipeline python3 -m data_pipeline.application score-run`
|
||||
- With Poetry: `poetry run python3 data_pipeline/application.py score-run`
|
||||
1. The `score-run` command will execute the `etl/score/etl.py` script, which loads the data from each of the source files added to the `data/dataset/` directory by the ETL scripts in Step 2.
|
||||
1. These data sets are merged into a single dataframe using their Census Block Group GEOID as a common key, and the data in each of the columns is standardized in two ways:
|
||||
- Their [percentile rank](https://en.wikipedia.org/wiki/Percentile_rank) is calculated, which tells us what percentage of other Census Block Groups have a lower value for that particular column.
|
||||
- They are normalized using [min-max normalization](https://en.wikipedia.org/wiki/Feature_scaling), which adjusts the scale of the data so that the Census Block Group with the highest value for that column is set to 1, the Census Block Group with the lowest value is set to 0, and all of the other values are adjusted to fit within that range based on how close they were to the highest or lowest value.
|
||||
1. The standardized columns are then used to calculate each of the Justice40 score experiments described in greater detail below, and the results are exported to a `.csv` file in [`data_pipeline/data/score/csv`](data_pipeline/data/score/csv)
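The two standardizations above can be written out directly. A minimal sketch follows; note that percentile-rank definitions vary (this one counts strictly lower values), so treat it as illustrative rather than the pipeline's exact formula:

```python
def percentile_rank(values):
    """For each value, the share of values strictly below it."""
    n = len(values)
    return [sum(v < x for v in values) / n for x in values]

def min_max_normalize(values):
    """Rescale so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]
```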
|
||||
|
||||
#### Step 4: Compare the Justice40 score experiments to other indices
|
||||
|
||||
We are building a comparison tool to enable easy (or at least straightforward) comparison of the Justice40 score with other existing indices. The goal is that as we experiment and iterate on a scoring methodology, we can understand how our score overlaps with or differs from other indices that communities, nonprofits, and governments use to inform decision making.
|
||||
|
||||
Right now, our comparison tool exists simply as a Python notebook in `data/data-pipeline/data_pipeline/ipython/scoring_comparison.ipynb`.
|
||||
|
||||
To run this comparison tool:
|
||||
|
||||
1. Make sure you've gone through the above steps to run the data ETL and score generation.
|
||||
1. From the package directory (`data/data-pipeline/data_pipeline/`), navigate to the `ipython` directory: `cd ipython`.
|
||||
1. Ensure you have `pandoc` installed on your computer. If you're on a Mac, run `brew install pandoc`; for other OSes, see pandoc's [installation guide](https://pandoc.org/installing.html).
|
||||
1. Start the notebooks: `jupyter notebook`
|
||||
1. In your browser, navigate to one of the URLs returned by the above command.
|
||||
1. Select `scoring_comparison.ipynb` from the options in your browser.
|
||||
1. Run through the steps in the notebook. You can step through them one at a time by clicking the "Run" button for each cell, or open the "Cell" menu and click "Run all" to run them all at once.
|
||||
1. Reports and spreadsheets generated by the comparison tool will be available in `data/data-pipeline/data_pipeline/data/comparison_outputs`.
|
||||
|
||||
_NOTE:_ This may take several minutes or over an hour to fully execute and generate the reports.
|
||||
|
||||
### Data Sources
|
||||
|
||||
- **[EJSCREEN](data_pipeline/etl/sources/ejscreen):** TODO Add description of data source
|
||||
- **[Census](data_pipeline/etl/sources/census):** TODO Add description of data source
|
||||
- **[American Communities Survey](data_pipeline/etl/sources/census_acs):** TODO Add description of data source
|
||||
- **[Housing and Transportation](data_pipeline/etl/sources/housing_and_transportation):** TODO Add description of data source
|
||||
- **[HUD Housing](data_pipeline/etl/sources/hud_housing):** TODO Add description of data source
|
||||
- **[HUD Recap](data_pipeline/etl/sources/hud_recap):** TODO Add description of data source
|
||||
- **[CalEnviroScreen](data_pipeline/etl/sources/calenviroscreen):** TODO Add description of data source
|
||||
|
||||
## Running using Docker
|
||||
|
||||
We use Docker to install the necessary libraries in a container that can be run in any operating system.
|
||||
|
||||
_Important_: To be able to run the data Docker containers, you need to increase the memory resource of your container to at least 8096 MB.
|
||||
|
||||
To build the docker container the first time, make sure you're in the root directory of the repository and run `docker-compose build --no-cache`.
|
||||
|
||||
Once completed, run `docker-compose up`. Docker will spin up three containers: the client container, the static server container, and the data container. Once all data is generated, you can see the application by navigating to `http://localhost:8000` in your browser.
|
||||
|
||||
If you want to run specific data tasks, you can open a terminal window, navigate to the root folder for this repository and then execute any command for the application using this format:
|
||||
If you want to run specific data tasks, you can open a terminal window, navigate to the root folder for this repository, and execute any command for the application using this format:
|
||||
|
||||
`docker run --rm -it -v ${PWD}/data/data-pipeline/data_pipeline/data:/data_pipeline/data j40_data_pipeline python3 -m data_pipeline.application [command]`
|
||||
|
||||
Here's a list of commands:

- Get help: `docker run --rm -it -v ${PWD}/data/data-pipeline/data_pipeline/data:/data_pipeline/data j40_data_pipeline python3 -m data_pipeline.application --help`
- Generate census data: `docker run --rm -it -v ${PWD}/data/data-pipeline/data_pipeline/data:/data_pipeline/data j40_data_pipeline python3 -m data_pipeline.application census-data-download`
- Run all ETL and generate score: `docker run --rm -it -v ${PWD}/data/data-pipeline/data_pipeline/data:/data_pipeline/data j40_data_pipeline python3 -m data_pipeline.application score-full-run`

@@ -203,196 +91,120 @@ Here's a list of commands:

- Combine score with GeoJSON and generate high and low zoom map tile sets: `docker run --rm -it -v ${PWD}/data/data-pipeline/data_pipeline/data:/data_pipeline/data j40_data_pipeline python3 -m data_pipeline.application geo-score`
- Generate map tiles: `docker run --rm -it -v ${PWD}/data/data-pipeline/data_pipeline/data:/data_pipeline/data j40_data_pipeline python3 -m data_pipeline.application generate-map-tiles`

## Local development

To learn more about these commands and when they should be run, refer to [Running for Local Development](#running-for-local-development).

You can run the Python code locally without Docker to develop, using Poetry. However, to generate the census data you will need the [GDAL library](https://github.com/OSGeo/gdal) installed locally. For score generation, you will need [libspatialindex](https://libspatialindex.org/en/latest/). And to generate tiles for a local map, you will need [Mapbox tippecanoe](https://github.com/mapbox/tippecanoe). Please refer to those repos for specific instructions for your OS.

</details>

### VSCode

---

If you are using VSCode, you can make use of the `.vscode` folder checked in under `data/data-pipeline/.vscode`. To do this, open this directory with `code data/data-pipeline`.

Here's what's included:

1. `launch.json` - launch commands that allow for debugging the various commands in `application.py`. Note that because we are using the otherwise excellent [Click CLI](https://click.palletsprojects.com/en/8.0.x/), and Click in turn uses `console_scripts` to parse and execute command line options, it is necessary to run the equivalent of `python -m data_pipeline.application [command]` within `launch.json` to be able to set and hit breakpoints (this is what is currently implemented). Otherwise, you may find that the script times out after 5 seconds. More about this [here](https://stackoverflow.com/questions/64556874/how-can-i-debug-python-console-script-command-line-apps-with-the-vscode-debugger).
2. `settings.json` - these ensure that you're using the default linter (`pylint`), formatter (`flake8`), and test library (`pytest`) that the team is using.
3. `tasks.json` - these enable you to use `Terminal->Run Task` to run our preferred formatters and linters within your project.

Only add settings to this file that should be shared across the team; do not add settings that apply only to local development environments (particularly full absolute paths, which can differ between setups). If you are looking to add something to this file, check in with the rest of the team to ensure the proposed settings should be shared.

### Running in Your Local Environment

When running in your local environment, each step of the application can be run individually or as a group.

> :bulb: **NOTE**
> This section only describes the steps necessary to run the Justice40 Data Pipeline and Scoring Application. If you'd like to run the client application, visit the [client README](/client/README.md). Please note that the client application does not use the data locally generated by the application by default.

Start by familiarizing yourself with the available commands. To do this, navigate to `justice40-tool/data/data-pipeline` and run `poetry run python3 data_pipeline/application.py --help`. You'll see a list of commands and what those commands do. You can also request help on any individual command to get more information about command options (e.g. `poetry run python3 data_pipeline/application.py etl-run --help`).

> :exclamation: **ATTENTION**
> Some commands fetch large amounts of data from remote data sources or run resource-intensive calculations. They may take a long time to complete (e.g. `generate-map-tiles` can take over 30 minutes). Those that fetch data from remote data sources (e.g. `etl-run`) should not be run too often; if they are, you may get throttled or eventually blocked by the sites serving the data.

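The Click quirk noted for `launch.json` (debugging via `python -m …` instead of the installed `console_scripts` entrypoint) can be illustrated with a minimal Click app. This is a hedged sketch, not the project's actual `application.py` — the group, command, and option names here are invented for illustration.

```python
import click


@click.group()
def cli():
    """Toy stand-in for data_pipeline/application.py (illustrative only)."""


@cli.command(help="Print a greeting to verify the CLI wiring works.")
@click.option("-n", "--name", default="Justice40", help="Name to greet.")
def hello(name):
    click.echo(f"Hello, {name}!")


if __name__ == "__main__":
    # Running `python -m this_module hello` reaches this entrypoint directly,
    # which is what lets a debugger attach and stop on breakpoints, instead of
    # going through the console_scripts shim that Poetry installs.
    cli()
```

Invoking the group function directly (as the `__main__` block does) is exactly what a `launch.json` configuration equivalent to `python -m data_pipeline.application [command]` achieves.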
#### Download Census Data

Begin the process of running the application in your local environment by downloading census data.

> :bulb: **NOTE**
> You'll only need to do this once (unless you clean your census data folder)! Subsequent runs will use the data you've already downloaded.

To download census data, run the command `poetry run python3 data_pipeline/application.py census-data-download`.

If you have a high-speed internet connection and don't want to generate the census data or install `GDAL` locally, you can download [a zip version of the Census file](https://justice40-data.s3.amazonaws.com/data-sources/census.zip). Unzip and move the contents inside the `data/data-pipeline/data_pipeline/data/census` folder.

### MacOS

To install the above-named executables:

- gdal: `brew install gdal`
- Tippecanoe: `brew install tippecanoe`
- spatialindex: `brew install spatialindex`

Note: For MacOS Monterey or M1 Macs, [you might need to follow these steps](https://stackoverflow.com/a/70880741) to install Scipy.

### Windows Users

If you want to run tile generation, please install Tippecanoe [following these instructions](https://github.com/GISupportICRC/ArcGIS2Mapbox#installing-tippecanoe-on-windows). You also need some prerequisites for Geopandas, as specified in the Poetry requirements. Please follow [these instructions](https://stackoverflow.com/questions/56958421/pip-install-geopandas-on-windows) to install the Geopandas dependency locally. It's definitely easier if you have access to WSL (Windows Subsystem for Linux), and install these packages using commands similar to our [Dockerfile](https://github.com/usds/justice40-tool/blob/main/data/data-pipeline/Dockerfile).

### Setting up Poetry

- Start a terminal
- Change to this directory (`/data/data-pipeline/`)
- Make sure you have at least Python 3.8 installed: `python -V` or `python3 -V`
- We use [Poetry](https://python-poetry.org/) for managing dependencies and building the application. Please follow the instructions on their site to download.
- Install Poetry requirements with `poetry install`

#### Run the Application

Running the application in your local environment allows the most flexibility. You can pick and choose which commands you run, and test parts of the application individually or as a group. While we can't anticipate all of your individual development scenarios, we can give you the steps you'll need to run the application from start to finish.

Once you've downloaded the census data, run the following commands – in order – to exercise the entire Data Pipeline and Scoring Application. The commands can be run from `justice40-tool/data/data-pipeline` in the form `poetry run python3 data_pipeline/application.py insert-name-of-command-here`.

| Step | Command               | Description                                                                                                                                                           | Example Output                                            |
| ---- | --------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------- |
| 1    | `etl-run`             | Performs the ETL steps on all external data sources, and saves the resulting intermediate file                                                                        | `data/dataset`                                            |
| 2    | `score-run`           | Generates and stores the score                                                                                                                                        | `data/score/csv/full/usa.csv`                             |
| 3    | `generate-score-post` | Performs a host of post-score activities, including adding county and state data to the score, shortening column names, and generating a downloadable package of data | `data/score/csv/tiles/usa.csv`, `data/score/downloadable` |
| 4    | `geo-score`           | Merges geoJSON data with score data, creating both high and low resolution results                                                                                    | `data/score/geojson[usa high or low]`                     |
| 5    | `generate-map-tiles`  | Generates map tiles for use in client website                                                                                                                         | `data/score/tiles/ high or low / {zoomLevel}`             |

Many commands have options. For example, you can run a single dataset with `etl-run` by passing the command line parameter `-d name-of-dataset-to-run`. Please use the `--help` option to find out more.

### Running tox

Our full test and check suite is run using tox. This can be run using commands such as `poetry run tox`.

Each run can take a while to build the whole environment. If you'd like to save time, you can use the previously built environment by running `poetry run tox -e lint`, which will drastically speed up the linting process.

## How Scoring Works

Scores are generated by running the `score-run` command via Poetry or Docker. This command executes [`data_pipeline/etl/score/etl_score.py`](data_pipeline/etl/score/etl_score.py). During execution,

1. Source files from the [`data_pipeline/data/dataset`](data_pipeline/data/dataset) directory are loaded into memory (these source files were generated by the `etl-run` command)
2. These data sets are merged into a single dataframe using their Census Block Group GEOID as a common key, and the data in each of the columns is standardized in two ways:
   - Their [percentile rank](https://en.wikipedia.org/wiki/Percentile_rank) is calculated, which tells us what percentage of other Census Block Groups have a lower value for that particular column.
   - They are normalized using [min-max normalization](https://en.wikipedia.org/wiki/Feature_scaling), which adjusts the scale of the data so that the Census Block Group with the highest value for that column is set to 1, the Census Block Group with the lowest value is set to 0, and all of the other values are adjusted to fit within that range based on how close they were to the highest or lowest value.
3. The standardized columns are then used to calculate each of the Justice40 scores, and the results are exported to `data_pipeline/data/score/csv/full/usa.csv`. Different versions of the scoring algorithm – including the current version – can be found in [`data_pipeline/score`](data_pipeline/score).
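The two standardization steps above can be sketched in a few lines of pandas. This is an illustration of the technique, not the pipeline's actual code — the GEOIDs and the `pm25` indicator column are invented for the example.

```python
import pandas as pd

# Toy frame: one indicator column keyed by a (fake) Census Block Group GEOID.
df = pd.DataFrame(
    {
        "GEOID10": ["010010201001", "010010201002", "010010202001", "010010202002"],
        "pm25": [5.0, 7.5, 10.0, 2.5],
    }
)

# Percentile rank: the fraction of block groups whose value is at or below this one.
df["pm25_percentile"] = df["pm25"].rank(pct=True)

# Min-max normalization: rescale so the lowest value maps to 0 and the highest to 1.
low, high = df["pm25"].min(), df["pm25"].max()
df["pm25_normalized"] = (df["pm25"] - low) / (high - low)

print(df[["pm25", "pm25_percentile", "pm25_normalized"]])
```

After this, every standardized column lives on a common 0-to-1 scale, which is what makes the columns combinable into a single score.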

### Configuring pre-commit hooks

<!-- markdown-link-check-disable -->
To promote consistent code style and quality, we use Git pre-commit hooks to automatically lint and reformat our code before every commit we make to the codebase. Pre-commit hooks are defined in the file [`.pre-commit-config.yaml`](../.pre-commit-config.yaml).
<!-- markdown-link-check-enable -->

1. First, install [`pre-commit`](https://pre-commit.com/) globally:

   `$ brew install pre-commit`

2. While in the `data/data-pipeline` directory, run `pre-commit install` to install the specific Git hooks used in this repository.

Now, any time you commit code to the repository, the hooks will run on all modified files automatically. If you wish, you can force a re-run on all files with `pre-commit run --all-files`.

## Comparing Scores

Scores can be compared to both internally calculated scores and scores calculated by other existing indices.

### Internal Comparison

Locally calculated scores can be easily compared with the score in production by using the [Score Comparator](data_pipeline/comparator.py). The Score Comparator compares the number and name of the columns, the number of census tracts (rows), and the score values (if the columns and census tracts line up).

The Score Comparator runs on every GitHub Pull Request, but can be run manually with `poetry run python3 data_pipeline/comparator.py compare-score` from the `justice40-tool/data/data-pipeline` directory.
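The checks the Score Comparator performs can be sketched as a small function over two dataframes. This is a drastically simplified illustration — the real logic lives in `data_pipeline/comparator.py`, and the column names below are invented.

```python
import pandas as pd


def compare_frames(prod: pd.DataFrame, local: pd.DataFrame) -> list:
    """Report the kinds of differences the comparator looks for (simplified)."""
    findings = []
    prod_cols, local_cols = set(prod.columns), set(local.columns)
    if prod_cols != local_cols:
        findings.append(f"column mismatch: {sorted(prod_cols ^ local_cols)}")
    if len(prod) != len(local):
        findings.append(f"row count mismatch: {len(prod)} vs {len(local)}")
    if not findings:  # only compare values once columns and rows line up
        for col in sorted(prod_cols):
            if not prod[col].equals(local[col]):
                findings.append(f"values differ in column: {col}")
    return findings


prod = pd.DataFrame({"GEOID10_TRACT": ["01", "02"], "score": [0.1, 0.9]})
local = pd.DataFrame({"GEOID10_TRACT": ["01", "02"], "score": [0.1, 0.8]})
print(compare_frames(prod, local))
```

Gating the value comparison behind the structural checks mirrors the "if the columns and census tracts line up" caveat above: comparing values cell-by-cell is only meaningful once the two frames have the same shape.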

#### Conflicts between backend and frontend git hooks

<!-- markdown-link-check-disable -->
In the front-end part of the codebase (the `justice40-tool/client` folder), we use `Husky` to run pre-commit hooks for the front-end. This is different from the `pre-commit` framework we use for the backend. The frontend `Husky` hooks are configured at [client/.husky](client/.husky).
<!-- markdown-link-check-enable -->

It is not possible to run both our `Husky` hooks and `pre-commit` hooks on every commit; either one or the other will run.

`Husky` is installed every time you run `npm install`. To use the `Husky` front-end hooks during front-end development, simply run `npm install`.

However, running `npm install` overwrites the backend hooks set up by `pre-commit`. To restore the backend hooks after running `npm install`, do the following:

1. Run `pre-commit install` while in the `data/data-pipeline` directory.
2. The terminal should respond with an error message such as:

   ```
   [ERROR] Cowardly refusing to install hooks with `core.hooksPath` set.
   hint: `git config --unset-all core.hooksPath`
   ```

   This error is caused by having previously run `npm install`, which used `Husky` to overwrite the hooks path.

3. Follow the hint and run `git config --unset-all core.hooksPath`.
4. Run `pre-commit install` again.

Now `pre-commit` and the backend hooks should take precedence.

### External Comparison

We are building a comparison tool to enable easy (or at least straightforward) comparison of the Justice40 score with other existing indices. The goal of having this is so that as we experiment and iterate with a scoring methodology, we can understand how our score overlaps with or differs from other indices that communities, nonprofits, and governments use to inform decision making.

Right now, our comparison tool exists simply as a Python notebook in `data/data-pipeline/data_pipeline/ipython/scoring_comparison.ipynb`.

To run this comparison tool:

1. Make sure you've gone through the above steps to run the data ETL and score generation.
1. From the package directory (`data/data-pipeline/data_pipeline/`), navigate to the `ipython` directory.
1. Ensure you have `pandoc` installed on your computer. If you're on a Mac, run `brew install pandoc`; for other OSes, see pandoc's [installation guide](https://pandoc.org/installing.html).
1. Start the notebooks: `jupyter notebook`
1. In your browser, navigate to one of the URLs returned by the above command.
1. Select `scoring_comparison.ipynb` from the options in your browser.
1. Run through the steps in the notebook. You can step through them one at a time by clicking the "Run" button for each cell, or open the "Cell" menu and click "Run all" to run them all at once.
1. Reports and spreadsheets generated by the comparison tool will be available in `data/data-pipeline/data_pipeline/data/comparison_outputs`.

### The Application entrypoint

After installing the poetry dependencies, you can see a list of commands with the following steps:

- Start a terminal
- Change to the package directory (i.e., `cd data/data-pipeline/data_pipeline`)
- Then run `poetry run python3 data_pipeline/application.py --help`

### Downloading Census Block Groups GeoJSON and Generating CBG CSVs (not normally required)

- Start a terminal
- Change to the package directory (i.e., `cd data/data-pipeline/data_pipeline`)
- If you want to clear out all data and tiles from all directories, you can run: `poetry run python3 data_pipeline/application.py data-cleanup`.
- Then run `poetry run python3 data_pipeline/application.py census-data-download`

Note: Census files are hosted in the Justice40 S3, and you can skip this step by passing the `-s aws` or `--data-source aws` flag in the scripts below.

### Run all ETL, score and map generation processes

- Start a terminal
- Change to the package directory (i.e., `cd data/data-pipeline/data_pipeline`)
- Then run `poetry run python3 data_pipeline/application.py data-full-run -s aws`
- Note: The `-s` flag is optional if you have generated/downloaded the census data

### Run both ETL and score generation processes

- Start a terminal
- Change to the package directory (i.e., `cd data/data-pipeline/data_pipeline`)
- Then run `poetry run python3 data_pipeline/application.py score-full-run`

### Run all ETL processes

- Start a terminal
- Change to the package directory (i.e., `cd data/data-pipeline/data_pipeline`)
- Then run `poetry run python3 data_pipeline/application.py etl-run`

### Generating Map Tiles

- Start a terminal
- Change to the package directory (i.e., `cd data/data-pipeline/data_pipeline`)
- Then run `poetry run python3 data_pipeline/application.py generate-map-tiles -s aws`
- If you have S3 keys, you can sync to the dev repo by doing `aws s3 sync ./data_pipeline/data/score/tiles/ s3://justice40-data/data-pipeline/data/score/tiles --acl public-read --delete`
- Note: The `-s` flag is optional if you have generated/downloaded the score data

### Serve the map locally

- Start a terminal
- Change to the package directory (i.e., `cd data/data-pipeline/data_pipeline`)
- For USA high zoom: `docker run --rm -it -v ${PWD}/data/score/tiles/high:/data -p 8080:80 maptiler/tileserver-gl`

### Running Jupyter notebooks

- Start a terminal
- Change to the package directory (i.e., `cd data/data-pipeline/data_pipeline`)
- Run `poetry run jupyter notebook`. Your browser should open with a Jupyter Notebook tab

### Activating variable-enabled Markdown for Jupyter notebooks

- Change to the package directory (i.e., `cd data/data-pipeline/data_pipeline`)
- Activate a Poetry Shell (see above)
- Run `jupyter contrib nbextension install --user`
- Run `jupyter nbextension enable python-markdown/main`
- Make sure you've loaded the Jupyter notebook in a "Trusted" state. (See button near top right of Notebook screen.)

For more information, see the [nbextensions docs](https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/install.html) and the [python-markdown docs](https://github.com/ipython-contrib/jupyter_contrib_nbextensions/tree/master/src/jupyter_contrib_nbextensions/nbextensions/python-markdown).

> :exclamation: **ATTENTION**
> This may take over an hour to fully execute and generate the reports.

## Testing

### Background

<!-- markdown-link-check-disable -->

For this project, we make use of [pytest](https://docs.pytest.org/en/latest/) for testing purposes.

<!-- markdown-link-check-enable-->

To run tests, simply run `poetry run pytest` in this directory (`justice40-tool/data/data-pipeline`).

Test data is configured via [fixtures](https://docs.pytest.org/en/latest/explanation/fixtures.html).

### Running the Full Suite

Our _full_ test and check suite – including security and code format checks – is configured using [`tox`](tox.ini). This suite can be run using the command `poetry run tox` from the `justice40-tool/data/data-pipeline` directory.

Each run takes a while to build the environment from scratch. If you'd like to save time, you can use the previously built environment by running `poetry run tox -e lint`.

### Score and Post-Processing Tests

The fixtures used in the score post-processing tests are slightly different. These fixtures use [pickle files](https://docs.python.org/3/library/pickle.html) to store dataframes to disk. This is ultimately because if you assert equality on two dataframes, even if column values have the same _visible_ value, if their types are mismatching they will be counted as not being equal.

In a bit more detail:

1. Pandas dataframes are typed, and by default, types are inferred when you create one from scratch. If you create a dataframe using the `DataFrame` [constructors](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame), there is no guarantee that types will be correct without explicit `dtype` annotations. Explicit `dtype` annotations are possible, but this leads us to point #2:

2. Our transformations/dataframes in the source code under test don't always require specific types, and it is often sufficient in the code itself to just rely on the `object` type. I attempted adding explicit typing based on the "logical" type of given columns, but in practice it resulted in non-matching dataframes that _actually_ had the same value – in particular, it was very common to have one dataframe column of type `string` and another of type `object` that carried the same values. So, that is to say, even if we did create a "correctly" typed dataframe (according to our logical assumptions about what types should be), they were still counted as mismatched against the dataframes that are actually used in our program. To fix this "the right way", it is necessary to explicitly annotate types at the point of the `read_csv` call, which definitely has other potential unintended side effects and would need to be done carefully.

3. For larger dataframes (some of these have 150+ values), it was initially deemed too difficult/time-consuming to manually annotate all types and, further, to modify those type annotations based on what is expected in the source code under test.

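The dtype pitfall described above is easy to reproduce: two dataframes whose values look identical fail a strict equality assertion when one column is `string`-typed and the other is the default `object`. A small, self-contained demonstration (not project code — the column name is invented):

```python
import pandas as pd
import pandas.testing as pdt

# Same visible values, different dtypes: `string` vs the default `object`.
typed = pd.DataFrame({"GEOID10_TRACT": pd.array(["01001", "01003"], dtype="string")})
untyped = pd.DataFrame({"GEOID10_TRACT": ["01001", "01003"]})

# The values themselves are identical...
assert typed["GEOID10_TRACT"].tolist() == untyped["GEOID10_TRACT"].tolist()

# ...but the strict frame comparison fails on the dtype alone.
try:
    pdt.assert_frame_equal(typed, untyped)
    dtype_mismatch_detected = False
except AssertionError:
    dtype_mismatch_detected = True
print("dtype mismatch detected:", dtype_mismatch_detected)
```

Pickling sidesteps this entirely: a pickle round trip preserves dtypes exactly, so the fixture compares equal to the dataframe that produced it.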

#### Updating Pickles

@@ -414,19 +226,23 @@ score_initial_df = pd.read_csv(score_csv_path, dtype={"GEOID10_TRACT": "string"}
score_initial_df.to_csv(data_path / "data_pipeline" / "etl" / "score" / "tests" / "sample_data" / "score_data_initial.csv", index=False)
```

Now you can move on to updating individual pickles for the tests.

> :bulb: **NOTE**
> It is helpful to perform the steps in VS Code, and in this order.

We have four pickle files that correspond to expected files:

| Pickle                           | Purpose                                                                                                                                                                      |
| -------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `score_data_expected.pkl`        | Initial score without counties                                                                                                                                               |
| `score_transformed_expected.pkl` | Intermediate score with `etl._extract_score` and `etl._transform_score` applied. There's no file for this intermediate process, so we need to capture the pickle mid-process. |
| `tile_data_expected.pkl`         | Score with columns to be baked into tiles                                                                                                                                    |
| `downloadable_data_expected.pk1` | Downloadable csv                                                                                                                                                             |


To update the pickles, go one by one:

For the `score_transformed_expected.pkl`, put a breakpoint on [this line](https://github.com/usds/justice40-tool/blob/main/data/data-pipeline/data_pipeline/etl/score/tests/test_score_post.py#L62), before the `pdt.assert_frame_equal`, and run:
`pytest data_pipeline/etl/score/tests/test_score_post.py::test_transform_score`

Once on the breakpoint, capture the df to a pickle as follows:
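The capture itself boils down to a single `to_pickle` call in the debug console. The sketch below round-trips a throwaway frame through a temp file to show the mechanics; the real target path (under the score tests' expected-data folder) is not shown here, so the filename below is illustrative only.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in for the dataframe you are paused on at the breakpoint.
df = pd.DataFrame({"GEOID10_TRACT": ["01001"], "score": [0.5]})

with tempfile.TemporaryDirectory() as tmp:
    # In the debugger you would point this at the expected-pickle path instead.
    pkl_path = Path(tmp) / "score_transformed_expected.pkl"
    df.to_pickle(pkl_path)               # what you run at the breakpoint
    restored = pd.read_pickle(pkl_path)  # what the test fixture later loads

# Dtypes survive the round trip, which is the whole point of using pickles here.
pd.testing.assert_frame_equal(df, restored)
print("round trip ok")
```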

@@ -495,48 +311,31 @@ In the future, we could adopt any of the below strategies to work around this:

1. We could use [pytest-snapshot](https://pypi.org/project/pytest-snapshot/) to automatically store the output of each test as data changes. This would make it so that you could avoid having to generate a pickle for each method – instead, you would only need to call `generate` once, and only when the dataframe had changed.

<!-- markdown-link-check-disable -->

Additionally, you could use a pandas type schema annotation such as [pandera](https://pandera.readthedocs.io/en/stable/schema_models.html?highlight=inputschema#basic-usage) to annotate input/output schemas for given functions, and your unit tests could use these to validate explicitly. This could be of very high value for annotating expectations.

<!-- markdown-link-check-enable-->

Alternatively, or in conjunction, you could move toward using a more strictly-typed container format for read/writes such as SQL/SQLite, and use something like [SQLModel](https://github.com/tiangolo/sqlmodel) to handle more explicit type guarantees.

### Fixtures used in ETL "Snapshot Tests"

ETLs are tested for the results of their extract, transform, and load steps by borrowing the concept of "snapshot testing" from the world of front-end development.

Snapshots are easy to update and demonstrate the results of a series of changes to the code base. They are good for making sure no results have changed if you don't expect them to change, and they are good when you expect the results to significantly change in a way that would be tedious to update in traditional unit tests.

However, snapshot tests are also dangerous. An unthinking developer may update the snapshot fixtures and unknowingly encode a bug into the supposed intended output of the test.

To update the snapshot fixtures of an ETL class, follow these steps:

1. If you need to manually update the fixtures, update the "furthest upstream" source that is called by `_setup_etl_instance_and_run_extract`. For instance, this may involve creating a new zip file that imitates the source data. (e.g., for the National Risk Index test, update `data_pipeline/tests/sources/national_risk_index/data/NRI_Table_CensusTracts.zip`, which is a 64kb imitation of the 405MB source NRI data.)
2. Run `pytest . -rsx --update_snapshots` to update snapshots for all files, or you can pass a specific file name to pytest to be more precise (e.g., `pytest data_pipeline/tests/sources/national_risk_index/test_etl.py -rsx --update_snapshots`).
3. Re-run pytest without the `update_snapshots` flag (e.g., `pytest . -rsx`) to ensure the tests now pass.
4. Carefully check the `git diff` for the updates to all test fixtures to make sure these are as expected. This part is very important. For instance, if you changed a column name, you would only expect the column name to change in the output. If you modified the calculation of some data, spot check the results to see if the numbers in the updated fixtures are as expected.
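The update-then-verify cycle above can be illustrated without any framework. This sketch is not the project's snapshot machinery — it just shows the idea behind an `--update_snapshots`-style switch, using a JSON file as the stored snapshot:

```python
import json
from pathlib import Path


def check_snapshot(actual: dict, snapshot_path: Path, update: bool = False) -> bool:
    """Compare `actual` against a stored snapshot; rewrite the snapshot when updating."""
    if update or not snapshot_path.exists():
        # Update mode (or first run): persist the current output as the new truth.
        snapshot_path.parent.mkdir(parents=True, exist_ok=True)
        snapshot_path.write_text(json.dumps(actual, sort_keys=True, indent=2))
        return True  # snapshot (re)written; nothing to compare against yet
    # Normal mode: the test passes only if output matches the stored snapshot.
    return json.loads(snapshot_path.read_text()) == actual
```

On a real run, step 2's `--update_snapshots` flag plays the role of `update=True`, and step 3's plain `pytest` run is the `update=False` comparison — which is exactly why step 4's careful `git diff` review matters: the update path will happily record a buggy output as the new expectation.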

### Other ETL Unit Tests

Outside of the ETL snapshot tests discussed above, ETL unit tests are typically organized into three buckets:

- Extract Tests
- Transform Tests, and
@@ -25,7 +25,7 @@ def cli():

@cli.command(
    help="Compare score stored in the AWS production environment to the locally generated score. Defaults to checking against version 1.0.",
)
@click.option(
    "-v",