Update workflow documentation

2025-07-31 12:21:16 -07:00 · 2024-12-18 10:18:59 -08:00 · 2024-12-18 10:18:59 -08:00 · fd587c5b99
commit fd587c5b99
parent 5531374ca6
3 changed files with 134 additions and 0 deletions
--- a/.github/workflows/ENVIRONMENT_VARIABLES.md
+++ b/.github/workflows/ENVIRONMENT_VARIABLES.md
@@ -0,0 +1,56 @@
+# J40 Workflow Environment Variables and Secrets
+
+## Summary
+The Github Action workflows used to build and deploy the Justice40 data pipeline and website depend on some environment variables. Non-sensitive values are stored in the Github repo as [environment variables](https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/store-information-in-variables). Sensitive values that should not be exposed publicly are stored in the repo as [secrets](https://docs.github.com/en/actions/security-for-github-actions/security-guides/using-secrets-in-github-actions).
+
+## List of Environment Variables
+
+### DESTINATION_FOLDER
+This is a local environment variable in the Deploy Frontend Main workflow. It is derived from the branch name and is used to name the deploy directory.
+
+### SCORE_VERSION
+The version of the scoring to be deployed. The current version is "2.0".
+
+## List of Secrets
+
+### CENSUS_API_KEY
+The key used to access US Census datasets via its [APIs](https://www.census.gov/data/developers/data-sets.html). A new key can be requested for free [here](https://api.census.gov/data/key_signup.html).
+
+### CLIENT_DEV_AWS_ACCESS_KEY_ID
+The AWS access key id used to add files to and remove files from the S3_WEBSITE_BUCKET, and to invalidate the Cloudfront distribution belonging to WEB_CDN_ID. This access key requires read/write access to the S3 bucket, and full access to the Cloudfront distribution.
+
+### CLIENT_DEV_AWS_SECRET_ACCESS_KEY
+The AWS secret access key belonging to CLIENT_DEV_AWS_ACCESS_KEY_ID.
+
+### DATA_CDN_ID
+The ID of the AWS Cloudfront distribution for the S3_DATA_BUCKET.
+
+### DATA_DEV_AWS_ACCESS_KEY_ID
+The AWS access key id used to add files to and remove files from the S3_DATA_BUCKET, and to invalidate the Cloudfront distribution belonging to DATA_CDN_ID. This access key requires read/write access to the S3 bucket, and full access to the Cloudfront distribution.
+
+### DATA_DEV_AWS_SECRET_ACCESS_KEY
+The AWS secret access key belonging to DATA_DEV_AWS_ACCESS_KEY_ID.
+
+### DATA_SOURCE
+Local variable that determines whether the website points to a local directory or uses the production AWS CDN for backend data. Value can be set to `cdn` or `local`.
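A minimal sketch of how a `DATA_SOURCE` switch like this could be resolved, not the project's actual client logic. The CDN default is the `DATA_URL` value quoted in this document; the local address, the fallback to `cdn` when the variable is unset, and the function name are assumptions made for illustration.

```python
import os

# The production CDN address quoted for DATA_URL in this document.
DEFAULT_CDN_URL = "https://static-data-screeningtool.geoplatform.gov"
# Hypothetical local address; the source does not say where local data is served.
DEFAULT_LOCAL_URL = "http://localhost:5000"


def resolve_data_base_url(env=None):
    """Return the backend data base URL selected by DATA_SOURCE."""
    env = os.environ if env is None else env
    # Assumption: an unset DATA_SOURCE falls back to the CDN.
    source = env.get("DATA_SOURCE", "cdn").strip().lower()
    if source == "cdn":
        return env.get("DATA_URL", DEFAULT_CDN_URL).rstrip("/")
    if source == "local":
        return DEFAULT_LOCAL_URL
    raise ValueError(f"DATA_SOURCE must be 'cdn' or 'local', got {source!r}")
```

Rejecting any other value up front keeps a typo in the variable from silently pointing the site at the wrong data.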
+
+### DATA_URL
+The full address of the backend data files hostname, currently [https://static-data-screeningtool.geoplatform.gov](https://static-data-screeningtool.geoplatform.gov). This information is public so technically it could be changed to be a non-secret environment variable.
+
+### J40_TOOL_MONITORING_SLACK_ALERTS
+The [Slack webhook](https://api.slack.com/messaging/webhooks) address used by the Ping Check workflow to send failure alerts.
+
+### SITE_URL
+The full address of the Justice40 Website hostname, currently [https://screeningtool.geoplatform.gov](https://screeningtool.geoplatform.gov). This information is public so technically it could be changed to be a non-secret environment variable.
+
+### S3_DATA_BUCKET
+The name of the AWS S3 bucket hosting the files created by the data pipeline application.
+
+### S3_WEBSITE_BUCKET
+The name of the AWS S3 bucket hosting the static website files.
+
+### WEB_CDN_ID
+The ID of the AWS Cloudfront distribution for the S3_WEBSITE_BUCKET.
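As a rough illustration, a pre-flight check along these lines could confirm that the values listed above are present before a deploy step runs. The names come from the lists above; treating every one of them as required, and the helper name `missing_settings`, are assumptions. Only names are reported, never values, so secrets are not echoed into logs.

```python
import os

# Names taken from the lists above. Treating all of them as required is an assumption.
REQUIRED_NAMES = (
    "SCORE_VERSION",
    "CENSUS_API_KEY",
    "CLIENT_DEV_AWS_ACCESS_KEY_ID",
    "CLIENT_DEV_AWS_SECRET_ACCESS_KEY",
    "DATA_CDN_ID",
    "DATA_DEV_AWS_ACCESS_KEY_ID",
    "DATA_DEV_AWS_SECRET_ACCESS_KEY",
    "DATA_URL",
    "J40_TOOL_MONITORING_SLACK_ALERTS",
    "SITE_URL",
    "S3_DATA_BUCKET",
    "S3_WEBSITE_BUCKET",
    "WEB_CDN_ID",
)


def missing_settings(env=None, required=REQUIRED_NAMES):
    """Return the required names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        # Report names only so that no secret value reaches the log.
        raise SystemExit("Missing settings: " + ", ".join(missing))
```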
+
+## Future Improvements
+To improve security, a few items should be addressed. The use of AWS access keys should be replaced by a more secure solution such as [OpenID Connect (OIDC)](https://aws.amazon.com/blogs/security/use-iam-roles-to-connect-github-actions-to-actions-in-aws/). If continuing to use AWS access keys, then key rotation should be implemented using a process such as the one documented [here](https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/automatically-rotate-iam-user-access-keys-at-scale-with-aws-organizations-and-aws-secrets-manager.html). The CENSUS_API_KEY could be rotated, but it would have to be a manual process as there is no programmatic way to generate a new key.
--- a/.github/workflows/INFRASTRUCTURE.md
+++ b/.github/workflows/INFRASTRUCTURE.md
@@ -0,0 +1,50 @@
+# Justice40 Website Infrastructure
+
+## Summary
+The infrastructure setup to deploy the Justice40 website consists of two AWS S3 buckets, one for the backend data and one for the frontend web client. Each S3 bucket has an AWS Cloudfront distribution set up to serve the bucket contents. Each Cloudfront distribution has a .gov DNS hostname that points to it.
+
+### Data Bucket
+The Data Bucket contains two main directories: `data-sources` and `data-versions`. The `data-sources` folder contains cached copies of the data sets used by the ETL pipeline. Unprocessed data sets are in individual directories under a sub-directory called `raw-data-sources`. The `data-versions` folder contains the uploaded output of the ETL pipeline. This includes scoring files and map tile files. The files are uploaded to a subfolder named for the version of the scoring used for file creation. The current version is `2.0`.
+
+```
+  .
+  ├── data-sources
+  │   ├── raw-data-sources
+  │   │   └── ...
+  │   └── ...
+  └── data-versions
+      ├── 1.0
+      │   └── ...
+      ├── 2.0
+      │   └── data
+      │       ├── csv
+      │       │   └── ...
+      │       ├── downloadable
+      │       │   └── ...
+      │       ├── geojson
+      │       │   └── ...
+      │       ├── search
+      │       │   └── ...
+      │       ├── shapefile
+      │       │   └── ...
+      │       └── tiles
+      │           └── ...
+      └── ...
+```
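The layout above implies a predictable object key for each pipeline output file. A small sketch of composing such a key follows. The folder names are the ones shown in the tree; the function name and the example file name are hypothetical.

```python
# Output folders shown under data-versions/<version>/data in the tree above.
KNOWN_CATEGORIES = ("csv", "downloadable", "geojson", "search", "shapefile", "tiles")


def data_version_key(score_version, category, filename):
    """Build the bucket key for one pipeline output file."""
    if category not in KNOWN_CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    if not filename or filename.startswith("/"):
        raise ValueError("filename must be a relative, non-empty path")
    return "/".join(["data-versions", score_version, "data", category, filename])
```

For example, `data_version_key("2.0", "csv", "example.csv")` returns `data-versions/2.0/data/csv/example.csv`, where `example.csv` is a placeholder rather than a real pipeline file.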
+
+### Data Cloudfront Distribution
+The Data Cloudfront Distribution is the CDN for the Data Bucket. The data bucket should be used as the origin for the Cloudfront distribution.
+
+### Website Bucket
+The Website Bucket contains the static website files. Instead of deploying to the top level of the bucket, files are deployed under a folder called `justice40-tool`. In order to support multiple environments, files are deployed to a folder named for the environment or branch in the Github Repo that deployed the files. Currently the production website is deployed to a folder called `main`. A staging environment could be deployed in parallel to a folder named `staging`.
+
+```
+  .
+  └── justice40-tool
+      ├── main
+      │   └── ...
+      └── ...
+```
+
+### Website Cloudfront Distribution
+The Website Cloudfront Distribution is the CDN for the Website Bucket. The website bucket should be used as the origin for the Cloudfront distribution. Furthermore, the origin path for the distribution should be set to the file path of the uploaded static website files; otherwise the website will not display properly. Currently the production origin path is `/justice40-tool/main`.
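To make the relationship between the deploy folder and the origin path concrete, here is a sketch that derives one from the other. It assumes the origin path is always the `justice40-tool` root plus the environment folder, as in the production example; the function name is hypothetical.

```python
def website_origin_path(environment, root_folder="justice40-tool"):
    """Return the Cloudfront origin path for one deployed environment."""
    name = environment.strip()
    # Assumption: environment folders are single path segments such as "main".
    if not name or "/" in name:
        raise ValueError(f"invalid environment folder: {environment!r}")
    return f"/{root_folder}/{name}"
```

`website_origin_path("main")` gives `/justice40-tool/main`, matching the production value above, and `website_origin_path("staging")` gives the path a parallel staging distribution would need.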
--- a/.github/workflows/README.md
+++ b/.github/workflows/README.md
@@ -7,3 +7,31 @@ This project is deployed on an AWS account managed by [GeoPlatform.gov](https://
 The names of the Github Actions stages in the yaml files should describe what each step does, so it is best to refer there to understand the latest of what everything is doing.

 To mitigate the risk of having this README quickly become outdated as the project evolves, avoid documenting anything application or data pipeline specific here. Instead, go back up to the [top level project README](/README.md) and refer to the documentation on those components directly.
+
+## List of Current Workflows
+
+### Check Markdown Links
+Runs Linkspector with Reviewdog on pull requests to identify and report on dead hyperlinks within the code.
+
+### CodeQL
+Runs Github's CodeQL engine against the code to check for security vulnerabilities.
+
+### Compile Mermaid to MD
+Compiles mermaid markdown into images. This action should be deprecated because it is no longer supported.
+
+### Deploy Backend Main
+Builds and deploys the backend data pipeline to AWS. This workflow is set to be triggered manually.
+
+### Deploy Frontend Main
+Builds and deploys the frontend web client to AWS when changes to the ./client directory are merged into main.
+
+### pages-build-deployment
+
+### Ping Check
+Checks that the J40 website returns an HTTP 200 status.
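A status check of this kind can be sketched with the Python standard library. This is an illustration of the idea, not the workflow's actual implementation: the function names, the timeout, and the alert text are assumptions. The `{"text": ...}` JSON body is the basic message shape accepted by Slack incoming webhooks.

```python
import json
import urllib.request


def site_is_up(url, opener=urllib.request.urlopen, timeout=10):
    """Return True only if a GET request to url answers with HTTP 200."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


def slack_payload(site_url):
    """Encode a failure message as a Slack incoming-webhook JSON body."""
    return json.dumps({"text": f"Ping check failed for {site_url}"}).encode("utf-8")


def send_alert(webhook_url, site_url, opener=urllib.request.urlopen, timeout=10):
    """POST the failure message to the webhook; return True on HTTP 200."""
    request = urllib.request.Request(
        webhook_url,
        data=slack_payload(site_url),
        headers={"Content-Type": "application/json"},
    )
    with opener(request, timeout=timeout) as response:
        return response.status == 200
```

In a scheduled job, the site address would come from `SITE_URL` and the webhook address from `J40_TOOL_MONITORING_SLACK_ALERTS`, with `send_alert` called only when `site_is_up` returns `False`. The `opener` parameter exists so the functions can be exercised without network access.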
+
+### Pull Request Backend
+Builds the backend data pipeline when a pull request is opened with changes within the ./data directory.
+
+### Pull Request Frontend
+Builds the frontend web client when a pull request is opened with changes within the ./client directory.