j40-cejst-2/infrastructure/functions.yml
Lucas Scharenbroich 38fff9cea8
Fargate Serverless Workers for Census Data Enrichment and Tile Generation (#230)
* add basic infrastructure

* add cloudfront distribution

* WIP checkpoint

* add ecs cluster

* add conditions and route53 dns entry to cloudfront

* WIP checkin

* Added a raw execution mode for demo/testing

* Add pre-defined Task for ogr2ogr

* Tweak Task Definition name

* Mostly working except for logging error

* Add additional logging permissions

* Successfully executed ogr2ogr in Fargate. S3 permissions need to be addressed

* Add multipart permissions

* Add a few more actions

* Put IAM Policy on the correct resource

* Deploy lambda and update events

* fix iam permissions 🤦🏻‍♂️

* Add reference to Tippecanoe container

* Clean up to only use named actions

* Refactor resources to include support for tippecanoe

* Make a more interesting GDAL command

* Pull all ECS variables into environment file; successful test of running tippecanoe container

* Support pre/post commands

* Refactor codebase and enable linting

* Implement many-to-many enrichment between USDS CSV files and Census zipped shapefiles

* Change the GDAL image to one with the built-in drivers

* Add some additional fixes to support the enrichment use case

* Clean up old hello-world example

* Expand the README to include ways to execute the lambdas

* Validate scheduled lambda execution and then comment out

Co-authored-by: Tim Zwolak <timothypage@gmail.com>
2021-06-30 09:29:01 -04:00


DetectChangesForWorker:
  handler: functions/detect-changes-for-worker/index.handler
  name: ${self:provider.stage}-DetectChangesForWorker
  description: Scans an S3 bucket (with prefix) for items that have changed recently and sends them to ECS Tasks for processing
  runtime: nodejs12.x
  memorySize: 512
  timeout: 900
  environment:
    REGION: ${self:provider.region}
    STAGE: ${self:provider.stage}
    ECS_CLUSTER: !Ref ECSCluster
    VPC_SUBNET_ID:
      Fn::ImportValue: ${self:provider.stage}-PrivateSubnetOne
    GDAL_TASK_DEFINITION: ${self:custom.environment.GDAL_TASK_DEFINITION_NAME}
    GDAL_CONTAINER_DEFINITION: ${self:custom.environment.GDAL_CONTAINER_DEFINITION_NAME}
    TIPPECANOE_TASK_DEFINITION: ${self:custom.environment.TIPPECANOE_TASK_DEFINITION_NAME}
    TIPPECANOE_CONTAINER_DEFINITION: ${self:custom.environment.TIPPECANOE_CONTAINER_DEFINITION_NAME}
  # The ECS Tasks can be kicked off by invoking the Lambda on a schedule. This can provide the
  # ability to do nightly refreshes of the data.
  # events:
  #   - schedule:
  #       rate: cron(*/2 * * * ? *) # Fire every 2 minutes
  #       input:
  #         action: "gdal"
  #         command:
  #           - "ogrinfo"
  #           - "-al"
  #           - "-so"
  #           - "-ro"
  #           - "/vsizip//vsicurl/https://j40-sit-justice40-data-harvester-data.s3.amazonaws.com/census/tabblock2010_01_pophu.zip"
  #   - schedule:
  #       rate: cron(0 5 * * ? *) # Scan for updated data at 05:00 UTC (midnight Eastern Standard Time)
  #       input:
  #         action: enrichment
  #         sourceBucketName: !Ref DataBucket
  #         sourceBucketPrefix: usds/custom.csv
  #         age: 86400 # Seconds
  #         censusBucketName: j40-sit-justice40-data-harvester-data
  #         censusBucketPrefix: census/tabblock2010_01_pophu.zip
  #         pre:
  #           - Fn::Join: ['', ["wget https://j40-sit-justice40-data-harvester-data.s3.amazonaws.com/usds/$", "{source.Key} -O /tmp/custom.csv"]]
  #         command:
  #           - "-f"
  #           - "GeoJSON"
  #           - "-sql"
  #           - Fn::Join: ['', ["SELECT * FROM $", "{census.Key:base} LEFT JOIN '/tmp/custom.csv'.custom ON $", "{census.Key:base}.BLOCKID10 = custom.BLOCKID10"]]
  #           - Fn::Join: ['', ["/vsis3/j40-sit-justice40-data-harvester-data/joined/$", "{source.Key:base}-$", "{census.Key:base}.json"]]
  #           - Fn::Join: ['', ["/vsizip//vsicurl/https://j40-sit-justice40-data-harvester-data.s3.amazonaws.com/census/$", "{census.Key}"]]
  #   - schedule:
  #       rate: cron(0 7 * * ? *) # Run two hours after generating any GeoJSON
  #       input:
  #         action: tippecanoe
  #         pre:
  #           - "curl https://gp-sit-tileservice-tile-cache.s3.amazonaws.com/usds/usa.csv -o /tmp/usa.csv"
  #           - "curl https://gp-sit-tileservice-tile-cache.s3.amazonaws.com/usds/tristate.mbtiles -o /tmp/tristate.mbtiles"
  #         post:
  #           - "aws s3 cp /tmp/tl_2010_bg_with_data.mbtiles s3://j40-sit-justice40-data-harvester-data/output/tl_2010_bg_with_data.mbtiles"
  #           - "tile-join --force -pk -pC -n tl_2010_bg -e /tmp/tiles /tmp/tl_2010_bg_with_data.mbtiles"
  #           - "aws s3 sync /tmp/tiles s3://j40-sit-justice40-data-harvester-data/output/tiles"
  #         command:
  #           - "tile-join"
  #           - "--force"
  #           - "-pk"
  #           - "-n"
  #           - "tl_2010_bg"
  #           - "-o"
  #           - "/tmp/tl_2010_bg_with_data.mbtiles"
  #           - "-c"
  #           - "/tmp/usa.csv"
  #           - "/tmp/tristate.mbtiles"
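  #
  # With the schedules commented out, the function has no trigger and must be invoked by hand.
  # A minimal sketch, assuming the stack is deployed with the Serverless Framework CLI and that
  # the Lambda accepts the same payload shape as the `input` blocks in the commented schedules.
  # The stage name `sit` is a placeholder guessed from the bucket names and is not confirmed by
  # this file; `ogrinfo --version` is used only as a harmless smoke-test command.
  #
  #   serverless invoke --function DetectChangesForWorker --stage sit \
  #     --data '{"action": "gdal", "command": ["ogrinfo", "--version"]}'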