Batch Rendering Farm

A product catalog image renderer that generates SVG variants from JSON specifications using AWS Batch array jobs. Step Functions orchestrates the workflow: a Lambda function validates the input spec, a Batch array job runs one child job per product variant with EFS as shared scratch storage, a second Lambda function publishes the results manifest, and SNS/SQS deliver the completion notification. The scenario exercises Simfra's batch compute, shared filesystem, and workflow orchestration capabilities.

Services

Service          Role
AWS Batch        Managed compute environment, job queue, and job definition for array jobs
ECS              Task execution backing Batch jobs with Go worker containers
ECR              Container image repository for the rendering worker
EFS              Shared scratch filesystem mounted by all Batch tasks
Lambda           Python 3.12 functions for input validation and result publishing
Step Functions   Orchestrates validate, render (Batch), publish, and notify steps
S3               Input specs, rendered output, and pipeline artifacts (all SSE-KMS)
SNS              Completion and failure notification topics
SQS              Notification queue with DLQ for failed messages
EventBridge      Captures Batch job state-change events
CloudWatch Logs  Batch job and Lambda execution logs
KMS              Six customer-managed keys for per-service encryption
CodeCommit       Source repository
CodeBuild        Packages Lambda functions and builds worker image
CodeDeploy       Lambda deployment with AllAtOnce traffic shifting
CodePipeline     Orchestrates the CI/CD flow

Architecture

S3 (product JSON specs)
  |
  v
Step Functions Workflow
  |
  ├── Lambda: validate-input (check spec exists and is well-formed)
  |
  ├── Batch: array job (one child per variant)
  |     ├── Go worker reads spec from S3, generates SVG, writes to S3
  |     └── All children share EFS scratch mount
  |
  ├── Lambda: publish-results (creates manifest.json)
  |
  └── SNS: completion notification --> SQS queue
                                         |
                                   On failure: DLQ

Each Batch array job spawns one child task per product variant, and all child tasks mount the same EFS filesystem for intermediate scratch data. The Step Functions workflow handles both success and failure paths: when the Batch job fails, the workflow publishes to a separate failure SNS topic. Independently of the workflow, EventBridge captures Batch job state transitions for observability.
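
The workflow described above can be sketched as an Amazon States Language definition built in Python. This is an illustration under assumptions, not the scenario's actual state machine: the state names, function names, queue and job definition names, topic ARNs, and the `variantCount` field are all hypothetical. Only the three service integration resource strings (`lambda:invoke`, `batch:submitJob.sync`, `sns:publish`) are standard Step Functions identifiers.

```python
import json

# Hypothetical names and ARNs; substitute the values from the deployed stack.
VALIDATE_FN = "validate-input"
PUBLISH_FN = "publish-results"
JOB_QUEUE = "render-queue"
JOB_DEFINITION = "render-worker"
SUCCESS_TOPIC = "arn:aws:sns:us-east-1:000000000000:render-complete"
FAILURE_TOPIC = "arn:aws:sns:us-east-1:000000000000:render-failed"


def lambda_task(function_name, next_state):
    """Task state that invokes a Lambda and forwards its return value."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {"FunctionName": function_name, "Payload.$": "$"},
        "OutputPath": "$.Payload",
        "Next": next_state,
    }


def sns_task(topic_arn, **transition):
    """Task state that publishes the current state input to an SNS topic."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::sns:publish",
        "Parameters": {
            "TopicArn": topic_arn,
            "Message.$": "States.JsonToString($)",
        },
        **transition,
    }


def build_definition():
    """Validate -> render (Batch array job) -> publish -> notify."""
    return {
        "StartAt": "ValidateInput",
        "States": {
            "ValidateInput": lambda_task(VALIDATE_FN, "RenderVariants"),
            "RenderVariants": {
                "Type": "Task",
                # .sync makes the workflow wait for the Batch job to finish.
                "Resource": "arn:aws:states:::batch:submitJob.sync",
                "Parameters": {
                    "JobName": "render-variants",
                    "JobQueue": JOB_QUEUE,
                    "JobDefinition": JOB_DEFINITION,
                    # Assumes validate-input returns a variantCount field.
                    "ArrayProperties": {"Size.$": "$.variantCount"},
                },
                "ResultPath": "$.batch",
                # Any Batch failure is routed to the failure topic.
                "Catch": [
                    {
                        "ErrorEquals": ["States.ALL"],
                        "ResultPath": "$.error",
                        "Next": "NotifyFailure",
                    }
                ],
                "Next": "PublishResults",
            },
            "PublishResults": lambda_task(PUBLISH_FN, "NotifySuccess"),
            "NotifySuccess": sns_task(SUCCESS_TOPIC, End=True),
            "NotifyFailure": sns_task(FAILURE_TOPIC, Next="RenderFailed"),
            "RenderFailed": {"Type": "Fail", "Error": "BatchRenderFailed"},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_definition(), indent=2))
```

The `Catch` on the render state is what separates the two paths: a successful job continues to the publish step, while a failed job skips it and ends the execution in a `Fail` state after the failure notification. AWS Batch array sizes range from 2 to 10,000, so a spec with a single variant would need a non-array submission in a real deployment.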

What This Validates

  • AWS Batch compute environment, job queue, and job definition lifecycle
  • Batch array jobs with per-child task distribution and parallel execution
  • EFS filesystem mounted in ECS-backed Batch tasks for shared scratch storage
  • Step Functions orchestrating Lambda, Batch, and SNS service integrations
  • ECR container image storage for Batch worker images
  • SNS/SQS notification delivery with KMS encryption
  • Dead-letter queue for failed notification processing
  • EventBridge capturing Batch job state-change events
  • S3 as input/output store with KMS encryption
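
The per-child task distribution listed above relies on each array child knowing which variant it owns. AWS Batch exposes the child's position through the `AWS_BATCH_JOB_ARRAY_INDEX` environment variable. The sketch below shows one way a worker could map that index to a variant, an output key, and a scratch directory on the shared EFS mount. The scenario's worker is written in Go; Python is used here only to keep the examples in one language. The spec schema (`productId`, `variants`, `name`), the `rendered/` key prefix, and the `/mnt/efs/scratch` mount path are assumptions made for illustration.

```python
import os
from pathlib import PurePosixPath


def child_assignment(spec, env=os.environ, scratch_root="/mnt/efs/scratch"):
    """Pick the variant for this array child and derive its paths.

    spec is assumed to look like:
      {"productId": "p-1", "variants": [{"name": "red"}, {"name": "blue"}]}
    """
    # Batch sets this for array children; default to 0 for local runs.
    index = int(env.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))
    variants = spec["variants"]
    if not 0 <= index < len(variants):
        raise IndexError(f"array index {index} has no matching variant")

    variant = variants[index]
    # Child job IDs contain a colon ("<parent>:<index>"); make them path safe.
    job_id = env.get("AWS_BATCH_JOB_ID", "local").replace(":", "-")

    return {
        "index": index,
        "variant": variant,
        # Hypothetical key layout for the rendered SVG in S3.
        "output_key": f"rendered/{spec['productId']}/{variant['name']}.svg",
        # One scratch directory per child avoids collisions on the shared mount.
        "scratch_dir": str(PurePosixPath(scratch_root) / job_id),
    }


if __name__ == "__main__":
    example = {"productId": "p-1", "variants": [{"name": "red"}, {"name": "blue"}]}
    print(child_assignment(example, env={"AWS_BATCH_JOB_ARRAY_INDEX": "1"}))
```

Giving every child its own directory under the shared mount is a design choice of this sketch: the filesystem is shared by all tasks, so unscoped file names could overwrite each other during parallel execution.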

Test Coverage

Tests cover:

  • CI/CD pipeline execution
  • Smoke checks for compute environment and job queue state
  • Integration tests for Step Functions workflows: submit a job, verify Batch execution, check S3 output and manifest, and validate SNS/SQS notification
  • Failure-path testing with DLQ routing
  • Performance tests with 5 concurrent Step Functions executions and 10 concurrent standalone Batch jobs
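
A concurrency test like the 5 parallel Step Functions executions mentioned above could be driven by a small harness such as the following. This is a sketch, not the scenario's test code: `run_concurrent`, the `start` callable, and the `specKey` payload field are hypothetical. The start function is injected so the harness can run without AWS access.

```python
import json
import uuid
from concurrent.futures import ThreadPoolExecutor


def run_concurrent(start, payloads, max_workers=5):
    """Call start(name, input_json) once per payload, in parallel.

    Results are returned in the same order as payloads. Any exception
    raised by start propagates to the caller.
    """
    def one(payload):
        # Execution names must be unique, so add a random suffix.
        name = f"perf-{uuid.uuid4().hex[:12]}"
        return start(name, json.dumps(payload))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(one, payloads))


if __name__ == "__main__":
    # Stand-in for a real client call; it just echoes the execution name.
    def fake_start(name, input_json):
        return {"executionArn": f"arn:example:{name}", "input": input_json}

    specs = [{"specKey": f"specs/product-{i}.json"} for i in range(5)]
    for result in run_concurrent(fake_start, specs):
        print(result["executionArn"])
```

Against a real endpoint, `start` could wrap boto3's `start_execution(stateMachineArn=..., name=..., input=...)` on a `stepfunctions` client. The same harness could cover the 10 standalone Batch jobs by raising `max_workers` and passing a function that wraps `submit_job`.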