
Running Cluster Upgrades

Use frameworks cluster upgrade for service binaries and images, and frameworks cluster migrate for PostgreSQL/YugabyteDB schema migrations. The upgrade command downloads or pulls the target release artifact, lets the role restart the service when the artifact or config changes, validates health, and rolls back on health-check failure unless --no-rollback is set.

  1. Check the current state:

    frameworks cluster status
  2. Preview the upgrade plan and the pending database migrations:

    frameworks cluster upgrade plan --version stable
    frameworks cluster migrate --phase expand --dry-run
  3. Preview the service or cluster upgrade:

    frameworks cluster upgrade quartermaster --version stable --dry-run
    frameworks cluster upgrade --all --dry-run
  4. Apply PostgreSQL/YugabyteDB migrations before upgrading services that depend on new schema. This normal path assumes the pending migrations are compatible with both the currently running service version and the target version:

    frameworks cluster migrate --phase expand
  5. Upgrade one service, or all enabled services in dependency order:

    frameworks cluster upgrade quartermaster --version stable --yes
    frameworks cluster upgrade --all --yes
  6. Run any catalog-declared data migration after compatible binaries are deployed. For example, a billing model migration may need to populate new normalized tables from old JSONB columns before reads can be flipped:

    frameworks cluster data-migrate list --to-version v0.0.0
    frameworks cluster data-migrate run <service>.<id> --dry-run
    frameworks cluster data-migrate run <service>.<id>
  7. Verify health and migration-specific checks after the rollout:

    frameworks cluster status
    frameworks cluster doctor
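
The steps above can be chained into one script for a single service. This is a minimal sketch, not a supported tool: rollout_service is a hypothetical helper name, and it assumes every frameworks subcommand exits non-zero on failure, which the steps above do not state. Step 6 is left out because data migrations are release-specific.

```shell
#!/bin/sh
# Sketch only: wraps the documented commands for one service.
# Assumption: each `frameworks` subcommand exits non-zero on failure.

rollout_service() {
    service="$1"              # e.g. quartermaster
    version="${2:-stable}"    # stable, rc, or a specific release tag

    # Steps 1-3: inspect and preview. Nothing is changed yet.
    frameworks cluster status || return 1
    frameworks cluster upgrade plan --version "$version" || return 1
    frameworks cluster migrate --phase expand --dry-run || return 1
    frameworks cluster upgrade "$service" --version "$version" --dry-run || return 1

    # Step 4: expand the schema before the new binary depends on it.
    frameworks cluster migrate --phase expand || return 1

    # Step 5: upgrade the service; rollback stays enabled.
    frameworks cluster upgrade "$service" --version "$version" --yes || return 1

    # Step 7: verify health and the migration ledger.
    frameworks cluster status || return 1
    frameworks cluster doctor
}
```

Calling rollout_service quartermaster rc stops at the first failing command, so a failed expand migration never reaches the upgrade step.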

--version accepts:

| Value | Meaning |
| --- | --- |
| stable | Latest stable release manifest |
| rc | Latest release-candidate manifest |
| v0.0.0-rc1 | Specific release manifest |
| omitted | The cluster’s configured channel, or stable |

Set the default channel with:

frameworks cluster set-channel stable
frameworks cluster set-channel rc
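
The fallback for an omitted --version can be read as a three-way choice. The sketch below only illustrates that rule; resolve_target is a hypothetical helper, and how the CLI actually stores or reads the configured channel is not described here.

```shell
# Sketch of the --version fallback; not the CLI's own code.
# $1 = value passed to --version (may be empty)
# $2 = channel configured for the cluster (may be empty)
resolve_target() {
    requested="$1"
    configured="$2"
    if [ -n "$requested" ]; then
        printf '%s\n' "$requested"     # stable, rc, or a tag such as v0.0.0-rc1
    elif [ -n "$configured" ]; then
        printf '%s\n' "$configured"    # channel set with `cluster set-channel`
    else
        printf '%s\n' stable           # default when no channel is configured
    fi
}
```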

PostgreSQL and YugabyteDB migrations are managed by frameworks cluster migrate. Run them before deploying service versions that require new tables, columns, or indexes.

Normal rolling upgrades must follow an expand/data/postdeploy/contract model, with the binary deploy falling between the expand and data phases:

| Phase | What changes | Rollback expectation |
| --- | --- | --- |
| Expand | Add nullable/defaulted columns, tables, indexes, or broader constraints | Old and new binaries both run against the expanded schema |
| Deploy | Roll out binaries that can read/write both shapes | Roll back by redeploying the old binary |
| Data | Populate new rows/columns from old data in batches | Background data migration is idempotent and safe to pause/resume |
| Postdeploy | Prefer or require the new shape once verified | Roll back only while fallback/dual-write remains |
| Contract | Drop old columns/tables/values and fallback code | Not a normal rollback point; requires a new forward fix or restore |

Do not put destructive contract work in the same normal upgrade step as the expand work. Column drops, table drops, enum/check narrowing, required fields without a complete background data migration, and semantic rewrites that old binaries cannot handle require a later contract release or an explicit downtime runbook.

For billing and other audit-sensitive domains:

  • finalized invoices should remain immutable unless the release explicitly documents a correction,
  • pricing and rating changes should be effective-dated,
  • draft/open records may be recalculated only by a documented background data migration or rating command,
  • release verification should include counts and totals, not just service health.

ClickHouse migrations are not applied by cluster migrate. For ClickHouse schema changes, follow the release notes and the ClickHouse Migrations runbook before upgrading analytics services.

If health validation fails, cluster upgrade attempts to roll the service back to the previously detected version. Keep rollback enabled for normal production rollouts.

This rollback only swaps the service artifact back. It does not undo schema or data migrations. That is intentional for expand-compatible changes: the old binary should keep working while extra tables/columns remain in place. If a release includes a migration that makes the old binary invalid, treat it as a downtime upgrade with a tested database restore or forward-fix plan.

Use --no-rollback only when you want to inspect a failed upgraded service in place:

frameworks cluster upgrade bridge --version rc --no-rollback

Today, frameworks cluster migrate --phase expand applies pending embedded PostgreSQL/YugabyteDB migrations from the database/version/phase directory layout. The command records phase and checksum in _migrations, rejects edited applied migrations, and does not run service-specific data migrations.

frameworks cluster doctor also checks the migration ledger: embedded PostgreSQL/YugabyteDB migrations are compared with _migrations in each configured database so pending required migrations and checksum drift show up during normal diagnostics.
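
The ledger comparison can be pictured as a per-file classification. This is a sketch under loud assumptions: it treats the checksum as SHA-256 over the file bytes and takes the recorded value as an argument. The real CLI may hash differently, and ledger_state is a hypothetical helper, not part of frameworks.

```shell
# Sketch only. Assumes a SHA-256 checksum of the migration file bytes;
# the checksum scheme actually used in _migrations is not documented here.
# $1 = path to a migration file
# $2 = checksum recorded in the ledger (empty if there is no row yet)
ledger_state() {
    file="$1"
    recorded="$2"
    if [ -z "$recorded" ]; then
        echo pending                   # never applied to this database
        return 0
    fi
    actual=$(sha256sum "$file" | cut -d ' ' -f 1)
    if [ "$actual" = "$recorded" ]; then
        echo applied                   # file matches what was applied
    else
        echo drift                     # file was edited after being applied
    fi
}
```

A migration in the drift state is the case the migrate command rejects: the file no longer matches what was recorded when it ran.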

frameworks cluster upgrade plan shows the target rollout order and embedded SQL migrations by phase. The CLI embeds a release catalog for service database ownership, data-migration requirements, compatibility floors, and hard intermediate release requirements. cluster upgrade uses that catalog together with live _migrations and service-owned _data_migrations state to refuse unsafe DB-backed rollouts before changing binaries.

frameworks cluster data-migrate is the first-class surface for resumable service-owned data migrations. list shows catalog-declared work and adoption state; run --dry-run is read-only; run, status, verify, pause, and resume operate through the service binary on the target host. If a release declares a required data migration but the service binary cannot report state, upgrade and postdeploy/contract gates fail closed instead of treating the unknown as safe.

Contract migrations are intentionally separate. cluster migrate --phase contract runs only after the matching data-migration gate succeeds, so destructive cleanup cannot be mixed into the routine expand/deploy step.
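
An operator-side wrapper can make the same ordering explicit in a runbook. This is a hedged sketch: contract_after_verify is a hypothetical helper, and it assumes that data-migrate verify takes the same <service>.<id> argument as data-migrate run and exits non-zero when verification fails.

```shell
# Sketch only. Assumptions: `data-migrate verify` accepts the same
# <service>.<id> argument as `data-migrate run`, and a failed
# verification produces a non-zero exit status.
contract_after_verify() {
    migration="$1"    # the <service>.<id> shown by `data-migrate list`
    if frameworks cluster data-migrate verify "$migration"; then
        # Destructive cleanup runs only after verification passed.
        frameworks cluster migrate --phase contract
    else
        echo "data migration $migration not verified; skipping contract phase" >&2
        return 1
    fi
}
```

The wrapper does not replace the CLI's own gate; it only keeps the verify step visible next to the destructive one.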

The root docker-compose.yml is a seeded development stack, not an operator-managed cluster. For local development, incompatible schema changes can be handled by recreating volumes and reseeding:

docker compose down -v
docker compose up -d

That shortcut is acceptable for local compose because demo data is disposable. It is not an operator upgrade path.

Use frameworks cluster provision for:

  • first deployment of a cluster,
  • adding a new service or host to the manifest,
  • repairing drift where you intentionally want a full converge,
  • re-rendering infrastructure after changing role inputs.

For routine release rollouts, use cluster upgrade plus cluster migrate.

Why some services keep their older version label

A FrameWorks release manifest pins every service by content identity (Docker image digest, native tarball SHA-256), not by version label. When a service’s source code did not change since an earlier release, the new manifest carries that service’s previous service_version label forward verbatim — for example, a v0.2.40 release can legitimately list helmsman with service_version: v0.2.37 because that is the release that actually produced the bytes you are running.

This is the artefact provenance model: platform_version says “what release this manifest belongs to,” and service_version per component says “which release actually produced this component’s bytes.”

Practical consequences:

  • cluster upgrade plan shows each service’s target image digest and native tarball checksum, not just version labels. If your installed identity equals the target identity, nothing is re-pulled regardless of the umbrella version.
  • For edge nodes, Foghorn’s release reconciler compares per-component versions against foghorn.node_components. Carried-forward components keep their old label, so a release that doesn’t touch helmsman/mist/caddy does not roll any edge node.
  • A “carried” service is not a downgrade. It is the same artefact that shipped earlier, content-pinned forward into the new release.
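
The decision in the first bullet reduces to comparing content identities and ignoring labels. The sketch below illustrates that comparison only; needs_pull is a hypothetical helper, and the digest values in the usage lines are made up.

```shell
# Sketch of the content-identity comparison; not the CLI's own code.
# Version labels are deliberately not consulted.
# $1 = identity of the installed artifact (image digest or tarball SHA-256)
# $2 = identity pinned by the target manifest
needs_pull() {
    installed_identity="$1"
    target_identity="$2"
    [ "$installed_identity" != "$target_identity" ]
}

# Usage with illustrative digests: a carried-forward service has the
# same identity in both manifests, so nothing is re-pulled.
if needs_pull "sha256:aaa111" "sha256:aaa111"; then
    echo "re-pull"
else
    echo "unchanged"
fi
```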

Release notes call out carried components when relevant.