
Operations

This page covers production deployment, backup and disaster recovery, and change management for Mantle.

Production Deployment

Using the Helm Chart

The recommended way to run Mantle in production is with the included Helm chart. It configures health probes, resource limits, and environment variables:

helm install mantle charts/mantle \
  --set database.url="postgres://mantle:secret@db.internal:5432/mantle?sslmode=require" \
  --set encryption.key="your-64-char-hex-key"

The Helm chart configures:

  • Liveness probe pointing to /healthz
  • Readiness probe pointing to /readyz
  • SIGTERM as the termination signal (aligns with Mantle’s graceful shutdown)

Health Probes

Configure your load balancer or orchestrator to use the health endpoints:

Probe | Endpoint | Recommended interval
Liveness | GET /healthz | 10s
Readiness | GET /readyz | 5s

The readiness probe returns a non-200 status when the database connection is lost, which causes the load balancer to stop routing traffic to the unhealthy instance.

Environment Variables

In production, pass configuration through environment variables rather than config files:

export MANTLE_DATABASE_URL="postgres://mantle:secret@db.internal:5432/mantle?sslmode=require"
export MANTLE_API_ADDRESS=":8080"
export MANTLE_ENCRYPTION_KEY="your-64-char-hex-key"
export MANTLE_LOG_LEVEL="warn"
mantle serve

See Configuration for the full list of environment variables.

Migrations

The server runs migrations automatically on startup, so you do not need a separate mantle init step in your deployment pipeline. This is safe with multiple replicas: migrations use database-level locking, so only one replica applies them at a time.

Backup and Disaster Recovery

Postgres is Mantle’s single point of state. All workflow definitions, execution history, step checkpoints, encrypted credentials, and audit events live in the database. Recovery from any failure depends on having a good database backup and access to your encryption key.

What to Back Up

Asset | Location | Notes
Postgres database | Your database host | Contains all Mantle state: definitions, executions, credentials, audit events
Encryption key | MANTLE_ENCRYPTION_KEY env var or mantle.yaml | Required to decrypt credentials; store separately from database backups
mantle.yaml | Configuration file on disk | Can be reconstructed, but backing it up is simpler than rebuilding it
Workflow YAML files | Version control (Git) | Authoritative source; can be re-applied with mantle apply

Critical: Store the encryption key separately from database backups. If an attacker obtains both the database dump and the encryption key, they can decrypt all stored credentials.

Managed Postgres (RDS, Cloud SQL, Azure Database):

  • Enable automated daily snapshots with your cloud provider
  • Enable WAL archiving (point-in-time recovery) for near-zero RPO
  • Retain snapshots according to your compliance requirements

Self-hosted Postgres:

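  • Schedule pg_dump with cron (for example, daily or hourly, depending on your RPO requirements). In a crontab entry, escape each % as \%, because cron treats an unescaped % as a newline. Supply the password through ~/.pgpass or PGPASSWORD so the job can run unattended: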
pg_dump -Fc -h localhost -U mantle mantle > /backups/mantle-$(date +%Y%m%d-%H%M%S).dump
  • Configure WAL archiving for continuous point-in-time recovery
  • Ship backups to off-site storage (S3, GCS, or another region)

Recovery Procedure

  1. Restore Postgres from backup while Mantle is stopped. Use your cloud provider’s restore flow or pg_restore. Restore into an empty database, or add --clean --if-exists so pg_restore drops existing objects first:
pg_restore -h localhost -U mantle -d mantle /backups/mantle-20260322-120000.dump
  2. Verify migration state. Run mantle init to confirm all migrations are applied; it applies any that are missing from the restored schema:
mantle init
  3. Verify credentials. Confirm that the encryption key matches the one in use when the backup was taken; without it, Mantle cannot decrypt stored credentials:
mantle secrets list
  4. Start the server:
mantle serve
  5. Re-apply workflow YAML files if needed. Workflow definitions are stored in Postgres, so a database restore brings them back. Changes made after the backup was taken are not in the restored database, so re-apply those from version control:
mantle apply workflows/*.yaml

RPO and RTO Guidance

Backup strategy | Recovery Point Objective (RPO) | Recovery Time Objective (RTO)
Daily pg_dump | Up to 24 hours of data loss | Minutes (restore + restart); longer for large databases
Hourly pg_dump | Up to 1 hour of data loss | Minutes; longer for large databases
WAL archiving (PITR) | Near-zero (seconds of data loss) | Minutes to restore to a point in time
Managed snapshots + WAL | Near-zero | Depends on cloud provider restore time

For production deployments, WAL archiving combined with periodic base backups gives the best balance of RPO and operational simplicity.

Change Management

All changes to Mantle’s codebase and deployment follow a controlled process.

  1. Pull request review. All changes go through PR review on GitHub. At least one approval is required before merging to main.
  2. CI must pass. Every PR runs the full CI pipeline: go test, go vet, golangci-lint, govulncheck, and gosec. PRs with failing checks are not merged.
  3. Production deployments. Deploy via Helm with migration hooks. The Helm chart runs mantle init as a pre-install/pre-upgrade hook to apply database migrations before the new binary starts serving traffic.
  4. Rollback. If a deployment introduces a problem, roll back with helm rollback and verify the migration state:
helm rollback mantle <revision>
mantle migrate status

Mantle migrations are forward-only by design, so helm rollback reverts the binary but not the database schema. Rolling back to an older binary is safe only if that version is compatible with the current schema. Run mantle migrate status to confirm.