Update workflow documentation

ericiwamoto 2024-12-18 10:18:59 -08:00 committed by Carlos Felix
parent 5531374ca6
commit fd587c5b99
3 changed files with 134 additions and 0 deletions


@ -0,0 +1,56 @@
# J40 Workflow Environment Variables and Secrets
## Summary
The Github Action workflows used to build and deploy the Justice40 data pipeline and website depend on some environment variables. Non-sensitive values are stored in the Github repo as [environment variables](https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/store-information-in-variables). Sensitive values that should not be exposed publicly are stored in the repo as [secrets](https://docs.github.com/en/actions/security-for-github-actions/security-guides/using-secrets-in-github-actions).
## List of Environment Variables
### DESTINATION_FOLDER
This is a local environment variable in the Deploy Frontend Main workflow. It is derived from the branch name and is used to name the deploy directory.
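As a rough illustration of deriving a folder name from a branch, the sketch below strips the `refs/heads/` prefix from a git ref and replaces characters that are awkward in a directory name. The helper name `destination_folder` and the sanitization rule are assumptions made for this example; the workflow itself may derive the value differently.

```python
import re


def destination_folder(ref: str) -> str:
    """Hypothetical helper: turn a git ref or branch name into a folder name."""
    # Branch refs look like "refs/heads/<branch>"; keep only the branch part.
    branch = ref.removeprefix("refs/heads/")
    # Assumed rule: collapse anything other than letters, digits, ".", "_"
    # and "-" into a single "-" so the result is safe as a directory name.
    return re.sub(r"[^A-Za-z0-9._-]+", "-", branch).strip("-")


print(destination_folder("refs/heads/main"))   # main
print(destination_folder("feature/new-map"))   # feature-new-map
```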
### SCORE_VERSION
The version of the scoring to be deployed. The current version is "2.0".
## List of Secrets
### CENSUS_API_KEY
The key used to access US Census datasets via its [APIs](https://www.census.gov/data/developers/data-sets.html). A new key can be requested for free [here](https://api.census.gov/data/key_signup.html).
### CLIENT_DEV_AWS_ACCESS_KEY_ID
The AWS access key id used to add files to and remove files from the S3_WEBSITE_BUCKET, and to invalidate the Cloudfront distribution belonging to WEB_CDN_ID. This access key requires read/write access to the S3 bucket and full access to the Cloudfront distribution.
### CLIENT_DEV_AWS_SECRET_ACCESS_KEY
The AWS secret access key belonging to CLIENT_DEV_AWS_ACCESS_KEY_ID.
### DATA_CDN_ID
The ID of the AWS Cloudfront distribution for the S3_DATA_BUCKET.
### DATA_DEV_AWS_ACCESS_KEY_ID
The AWS access key id used to add files to and remove files from the S3_DATA_BUCKET, and to invalidate the Cloudfront distribution belonging to DATA_CDN_ID. This access key requires read/write access to the S3 bucket and full access to the Cloudfront distribution.
### DATA_DEV_AWS_SECRET_ACCESS_KEY
The AWS secret access key belonging to DATA_DEV_AWS_ACCESS_KEY_ID.
### DATA_SOURCE
Local variable that determines whether the website reads backend data from a local directory or from the production AWS CDN. The value can be set to `cdn` or `local`.
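A minimal sketch of how a build step might act on this switch is shown below. The function name `data_base_location` and the default local directory `./data` are hypothetical and only illustrate the two allowed values; the real client may resolve the setting differently.

```python
def data_base_location(data_source: str, data_url: str, local_dir: str = "./data") -> str:
    """Hypothetical helper: pick where the website should read backend data from."""
    if data_source == "cdn":
        # Use the CDN address, e.g. the value of DATA_URL.
        return data_url
    if data_source == "local":
        # Use a directory on the local machine (assumed default path).
        return local_dir
    # Any other value is a configuration error.
    raise ValueError(f"DATA_SOURCE must be 'cdn' or 'local', got {data_source!r}")


print(data_base_location("cdn", "https://static-data-screeningtool.geoplatform.gov"))
print(data_base_location("local", "https://static-data-screeningtool.geoplatform.gov"))
```

Rejecting unknown values early keeps a typo in the setting from silently falling back to one of the two sources.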
### DATA_URL
The full URL of the host serving the backend data files, currently [https://static-data-screeningtool.geoplatform.gov](https://static-data-screeningtool.geoplatform.gov). This information is public, so it could be stored as a non-secret environment variable instead.
### J40_TOOL_MONITORING_SLACK_ALERTS
The [Slack webhook](https://api.slack.com/messaging/webhooks) address used by the Ping Check workflow to send failure alerts.
### SITE_URL
The full URL of the Justice40 website, currently [https://screeningtool.geoplatform.gov](https://screeningtool.geoplatform.gov). This information is public, so it could be stored as a non-secret environment variable instead.
### S3_DATA_BUCKET
The name of the AWS S3 bucket hosting the files created by the data pipeline application.
### S3_WEBSITE_BUCKET
The name of the AWS S3 bucket hosting the static website files.
### WEB_CDN_ID
The ID of the AWS Cloudfront distribution for the S3_WEBSITE_BUCKET.
## Future Improvements
To improve security, a few items should be addressed. The use of AWS access keys should be replaced by a more secure solution such as [OpenID Connect (OIDC)](https://aws.amazon.com/blogs/security/use-iam-roles-to-connect-github-actions-to-actions-in-aws/). If AWS access keys continue to be used, key rotation should be implemented using a process such as the one documented [here](https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/automatically-rotate-iam-user-access-keys-at-scale-with-aws-organizations-and-aws-secrets-manager.html). The CENSUS_API_KEY could be rotated, but it would have to be a manual process as there is no programmatic way to generate a new key.

.github/workflows/INFRASTRUCTURE.md

@ -0,0 +1,50 @@
# Justice40 Website Infrastructure
## Summary
The infrastructure used to deploy the Justice40 website consists of two AWS S3 buckets, one for the backend data and one for the frontend web client. Each S3 bucket has an AWS Cloudfront distribution set up to serve the bucket contents. Each Cloudfront distribution has a .gov DNS hostname pointing to it.
### Data Bucket
The Data Bucket contains two main directories: `data-sources` and `data-versions`. The `data-sources` folder contains cached copies of the data sets used by the ETL pipeline. Unprocessed data sets are in individual directories under a sub-directory called `raw-data-sources`. The `data-versions` folder contains the uploaded output of the ETL pipeline. This includes scoring files and map tile files. The files are uploaded to a subfolder named for the version of the scoring used to create them. The current version is `2.0`.
```
.
├── data-sources
│   ├── raw-data-sources
│   │   └── ...
│   └── ...
└── data-versions
    ├── 1.0
    │   └── ...
    ├── 2.0
    │   └── data
    │       ├── csv
    │       │   └── ...
    │       ├── downloadable
    │       │   └── ...
    │       ├── geojson
    │       │   └── ...
    │       ├── search
    │       │   └── ...
    │       ├── shapefile
    │       │   └── ...
    │       └── tiles
    │           └── ...
    └── ...
```
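The layout above can be expressed as a small path-building helper. This is only a sketch: the function name `versioned_prefix` is hypothetical, and the set of output folders is copied from the `2.0` tree shown above rather than from the pipeline code.

```python
from posixpath import join

# Output folders that appear under data-versions/<version>/data in the tree above.
OUTPUT_TYPES = {"csv", "downloadable", "geojson", "search", "shapefile", "tiles"}


def versioned_prefix(score_version: str, output_type: str) -> str:
    """Hypothetical helper: S3 key prefix for one kind of pipeline output."""
    if output_type not in OUTPUT_TYPES:
        raise ValueError(f"unknown output type: {output_type!r}")
    # S3 keys always use "/" separators, so posixpath.join is used on purpose.
    return join("data-versions", score_version, "data", output_type) + "/"


print(versioned_prefix("2.0", "tiles"))  # data-versions/2.0/data/tiles/
```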
### Data Cloudfront Distribution
The Data Cloudfront Distribution is the CDN for the Data Bucket. The Data Bucket should be used as the origin for the Cloudfront distribution.
### Website Bucket
The Website Bucket contains the static website files. Instead of deploying to the top level of the bucket, files are deployed under a folder called `justice40-tool`. In order to support multiple environments, files are deployed to a folder named for the environment or branch in the Github repo that deployed the files. Currently the production website is deployed to a folder called `main`. A staging environment could be deployed in parallel to a folder named `staging`.
```
.
└── justice40-tool
    ├── main
    │   └── ...
    └── ...
```
### Website Cloudfront Distribution
The Website Cloudfront Distribution is the CDN for the Website Bucket. The Website Bucket should be used as the origin for the Cloudfront distribution. Furthermore, the origin path for the distribution should be set to the file path of the uploaded static website files; otherwise the website will not display properly. Currently the production origin path is `/justice40-tool/main`.
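To see why the origin path matters, recall that Cloudfront prepends the origin path to the request path when it fetches an object from the origin. The sketch below mimics that mapping; the function name `origin_object_key` is hypothetical and the code only models the path arithmetic, not Cloudfront itself.

```python
def origin_object_key(origin_path: str, request_path: str) -> str:
    """Hypothetical helper: S3 key fetched for a request, given an origin path."""
    # Join origin path and request path with exactly one "/" between them.
    combined = origin_path.rstrip("/") + "/" + request_path.lstrip("/")
    # S3 object keys do not start with a leading slash.
    return combined.lstrip("/")


# With the production origin path, "/index.html" resolves inside the deploy folder.
print(origin_object_key("/justice40-tool/main", "/index.html"))  # justice40-tool/main/index.html
# Without an origin path, the same request looks at the top level of the bucket.
print(origin_object_key("", "/index.html"))  # index.html
```

Because nothing is deployed at the top level of the bucket, the second lookup would find no file, which matches the warning above about the site not displaying.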


@ -7,3 +7,31 @@ This project is deployed on an AWS account managed by [GeoPlatform.gov](https://
The names of the Github Actions stages in the yaml files should describe what each step does, so it is best to refer there for the most current picture of what each workflow is doing.
To mitigate the risk of having this README quickly become outdated as the project evolves, avoid documenting anything application or data pipeline specific here. Instead, go back up to the [top level project README](/README.md) and refer to the documentation on those components directly.
## List of Current Workflows
### Check Markdown Links
Runs Linkspector with Reviewdog on pull requests to identify and report on dead hyperlinks within the code.
### CodeQL
Runs Github's CodeQL engine against the code to check for security vulnerabilities.
### Compile Mermaid to MD
Compiles mermaid markdown into images. This workflow should be deprecated because the action it uses is no longer supported.
### Deploy Backend Main
Builds and deploys the backend data pipeline to AWS. This workflow is set to be triggered manually.
### Deploy Frontend Main
Builds and deploys the frontend web client to AWS when changes to the ./client directory are merged into main.
### pages-build-deployment
### Ping Check
Requests the J40 website and verifies that it returns HTTP status 200.
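For readers unfamiliar with this kind of check, here is a minimal sketch of the idea using only the Python standard library. It is not the workflow's actual implementation: the function names and the alert wording are assumptions, and the only external fact relied on is that Slack incoming webhooks accept a JSON body with a `text` field.

```python
import json
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for url, or 0 if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 503.
        return err.code
    except OSError:
        # DNS failure, refused connection, timeout, and similar problems.
        return 0


def slack_alert_payload(url: str, status: int) -> bytes:
    """Build the JSON body for a Slack incoming-webhook message."""
    text = f"Ping check failed for {url}: status {status}"
    return json.dumps({"text": text}).encode("utf-8")


def run_ping_check(site_url: str, webhook_url: str) -> bool:
    """Check the site and post a Slack alert when the status is not 200."""
    status = fetch_status(site_url)
    if status == 200:
        return True
    request = urllib.request.Request(
        webhook_url,
        data=slack_alert_payload(site_url, status),  # data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request, timeout=10.0).close()
    return False
```

In a workflow, `run_ping_check` would be called with the values of SITE_URL and J40_TOOL_MONITORING_SLACK_ALERTS; it is left uncalled here so the example makes no network requests on import.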
### Pull Request Backend
Builds the backend data pipeline when a pull request is opened with changes within the ./data directory.
### Pull Request Frontend
Builds the frontend web client when a pull request is opened with changes within the ./client directory.