Running Cluster Upgrades
Use frameworks cluster upgrade for service binaries and images, and
frameworks cluster migrate for PostgreSQL/YugabyteDB schema migrations. The
upgrade command downloads or pulls the target release artifact, lets the role
restart the service when the artifact or config changes, validates health, and
rolls back on health-check failure unless --no-rollback is set.
Recommended flow
1. Check the current state:

       frameworks cluster status

2. Preview database migrations:

       frameworks cluster upgrade plan --version stable
       frameworks cluster migrate --phase expand --dry-run

3. Preview the service or cluster upgrade:

       frameworks cluster upgrade quartermaster --version stable --dry-run
       frameworks cluster upgrade --all --dry-run

4. Apply PostgreSQL/YugabyteDB migrations before upgrading services that depend on new schema. This normal path assumes the pending migrations are compatible with both the currently running service version and the target version:

       frameworks cluster migrate --phase expand

5. Upgrade one service, or all enabled services in dependency order:

       frameworks cluster upgrade quartermaster --version stable --yes
       frameworks cluster upgrade --all --yes

6. Run any catalog-declared data migration after compatible binaries are deployed. For example, a billing model migration may need to populate new normalized tables from old JSONB columns before reads can be flipped:

       frameworks cluster data-migrate list --to-version v0.0.0
       frameworks cluster data-migrate run <service>.<id> --dry-run
       frameworks cluster data-migrate run <service>.<id>

7. Verify health and migration-specific checks after the rollout:

       frameworks cluster status
       frameworks cluster doctor
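The steps above can be scripted so that a failed step stops the rollout before any later step runs. This is a minimal bash sketch, not a shipped tool: run_upgrade_flow is a hypothetical helper name, it assumes --version can be combined with --all (this page only shows the two flags separately), and it leaves out the data-migrate step because that needs a specific <service>.<id>.

```bash
#!/usr/bin/env bash
# Minimal sketch of the recommended flow. run_upgrade_flow is a hypothetical
# helper, not part of the frameworks CLI. Steps are chained with && so the
# first failing command stops the flow and becomes the function's exit status.
# Assumption: --version can be combined with --all.
run_upgrade_flow() {
  local version="${1:-stable}"

  frameworks cluster status &&                                       # current state
  frameworks cluster upgrade plan --version "$version" &&            # rollout order, SQL by phase
  frameworks cluster migrate --phase expand --dry-run &&             # preview schema changes
  frameworks cluster upgrade --all --version "$version" --dry-run && # preview rollout
  frameworks cluster migrate --phase expand &&                       # expand-compatible schema first
  frameworks cluster upgrade --all --version "$version" --yes &&     # then binaries
  frameworks cluster status &&                                       # verify
  frameworks cluster doctor
  # Catalog-declared data migrations (data-migrate run <service>.<id>) are
  # run separately once the new binaries are healthy.
}
```

Chaining with && rather than relying on set -e keeps the stop-on-failure behavior even when the caller checks the result, for example run_upgrade_flow rc || echo "upgrade halted", because bash suspends set -e inside a function invoked as part of an || list.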
Version selection
The --version flag accepts:

| Value | Meaning |
|---|---|
| stable | Latest stable release manifest |
| rc | Latest release-candidate manifest |
| v0.0.0-rc1 | A specific release manifest |
| omitted | The cluster’s configured channel, or stable if no channel is set |
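The fallback order in the table can be written as a small function. This is an illustrative sketch of the documented precedence (explicit --version, then the configured channel, then stable), not the CLI's own resolution code; resolve_version is a hypothetical name, and the v[0-9]* pattern for specific releases is an assumption.

```bash
#!/usr/bin/env bash
# Illustrative sketch of the --version precedence described above.
# resolve_version is hypothetical; the real CLI may validate differently.
#   $1: value given to --version ("" when the flag is omitted)
#   $2: channel configured with `frameworks cluster set-channel` ("" if none)
resolve_version() {
  local requested="$1" channel="$2" value

  if [ -n "$requested" ]; then
    value="$requested"       # an explicit --version always wins
  elif [ -n "$channel" ]; then
    value="$channel"         # otherwise use the cluster's channel
  else
    value="stable"           # and fall back to stable
  fi

  case "$value" in
    stable)  echo "latest stable release manifest" ;;
    rc)      echo "latest release-candidate manifest" ;;
    v[0-9]*) echo "release manifest $value" ;;   # e.g. v0.0.0-rc1
    *)       echo "unrecognized version: $value" >&2; return 1 ;;
  esac
}
```

For example, resolve_version "" rc prints latest release-candidate manifest: with no --version, a cluster on the rc channel tracks release candidates.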
Set the default channel with either command:

    frameworks cluster set-channel stable
    frameworks cluster set-channel rc

Schema and data migration phases
PostgreSQL and YugabyteDB migrations are managed by frameworks cluster migrate. Run them before deploying service versions that require new tables,
columns, or indexes.
Normal rolling upgrades must follow an expand/deploy/data/postdeploy/contract model:
| Phase | What changes | Rollback expectation |
|---|---|---|
| Expand | Add nullable/defaulted columns, tables, indexes, or broader constraints | Old and new binaries both run against the expanded schema |
| Deploy | Roll out binaries that can read/write both shapes | Roll back by redeploying the old binary |
| Data | Populate new rows/columns from old data in batches | Background data migration is idempotent and safe to pause/resume |
| Postdeploy | Prefer or require the new shape once verified | Roll back only while fallback/dual-write remains |
| Contract | Drop old columns/tables/values and fallback code | Not a normal rollback point; requires a new forward fix or restore |
Do not put destructive contract work in the same normal upgrade step as the expand work. Column drops, table drops, enum/check narrowing, required fields without a complete background data migration, and semantic rewrites that old binaries cannot handle require a later contract release or an explicit downtime runbook.
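The phase table implies an ordering: each phase may start only after every earlier phase has completed. A minimal sketch of that ordering check, assuming phases are tracked as a plain list of completed names; phase_allowed is a hypothetical helper, and the real CLI gates the contract phase on recorded data-migration state rather than on a list like this.

```bash
#!/usr/bin/env bash
# Sketch of the expand -> deploy -> data -> postdeploy -> contract ordering.
# phase_allowed is a hypothetical helper, not a frameworks CLI command.
PHASES="expand deploy data postdeploy contract"

# phase_allowed NEXT [COMPLETED...]
# Succeeds only if every phase before NEXT appears in COMPLETED.
phase_allowed() {
  local next="$1" p completed
  shift
  completed=" $* "

  case " $PHASES " in
    *" $next "*) ;;
    *) echo "unknown phase: $next" >&2; return 1 ;;
  esac

  for p in $PHASES; do
    [ "$p" = "$next" ] && return 0
    case "$completed" in
      *" $p "*) ;;  # earlier phase finished; keep checking
      *) echo "cannot start $next: $p has not completed" >&2; return 1 ;;
    esac
  done
}
```

For example, phase_allowed contract expand deploy fails with cannot start contract: data has not completed, which is the same rule that keeps destructive contract work out of the expand step.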
For billing and other audit-sensitive domains:
- finalized invoices should remain immutable unless the release explicitly documents a correction,
- pricing and rating changes should be effective-dated,
- draft/open records may be recalculated only by a documented background data migration or rating command,
- release verification should include counts and totals, not just service health.
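For the last point, one lightweight approach is to capture the same counts and totals before and after the rollout and diff them. A minimal sketch, assuming each snapshot is a text file of metric value lines produced by your own queries (the queries, metric names, and compare_snapshots helper are hypothetical). For finalized invoices any difference is a finding; draft or open records may legitimately change.

```bash
#!/usr/bin/env bash
# compare_snapshots BEFORE AFTER
# Each file holds "metric value" lines, e.g. "finalized_invoice_count 120".
# Prints every changed, missing, or unexpected metric and exits non-zero if
# any were found. compare_snapshots is a hypothetical helper.
compare_snapshots() {
  awk '
    NF == 0             { next }                       # skip blank lines
    FILENAME == ARGV[1] { before[$1] = $2; next }      # baseline snapshot
    { seen[$1] = 1 }
    !($1 in before)     { print "unexpected metric: " $1; bad = 1; next }
    before[$1] != $2    { print $1 ": " before[$1] " -> " $2; bad = 1 }
    END {
      for (m in before)
        if (!(m in seen)) { print "missing metric: " m; bad = 1 }
      exit bad
    }' "$1" "$2"
}
```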
ClickHouse migrations are not applied by cluster migrate. For ClickHouse
schema changes, follow the release notes and the ClickHouse Migrations
runbook before upgrading analytics services.
Rollback behavior
If health validation fails, cluster upgrade attempts to roll the service back
to the previously detected version. Keep rollback enabled for normal production
rollouts.
This rollback only swaps the service artifact back. It does not undo schema or data migrations. That is intentional for expand-compatible changes: the old binary should keep working while the extra tables and columns remain in place. If a release includes a migration that the old binary cannot run against, treat it as a downtime upgrade with a tested database restore or forward-fix plan.
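The rollback sequence can be modeled in a few lines to make its scope explicit: on a failed health check, only the artifact is redeployed. This is a hypothetical sketch, not the CLI's implementation; deploy_artifact and health_check are stand-in functions you would supply, and nothing in the sketch touches the database, which mirrors why schema and data migrations stay applied after a rollback.

```bash
#!/usr/bin/env bash
# Model of upgrade-with-rollback. deploy_artifact and health_check are
# hypothetical stand-ins; upgrade_with_rollback is not a CLI command.
# upgrade_with_rollback SERVICE PREVIOUS TARGET [yes|no]
upgrade_with_rollback() {
  local service="$1" previous="$2" target="$3" rollback="${4:-yes}"

  deploy_artifact "$service" "$target" || return 1
  health_check "$service" && return 0

  if [ "$rollback" = "yes" ]; then
    # Swap the artifact back. Schema and data migrations are not reverted,
    # so PREVIOUS must still work against the expanded schema.
    deploy_artifact "$service" "$previous"
  fi
  # With rollback "no", the failed TARGET stays in place for inspection,
  # as with --no-rollback.
  return 1
}
```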
Use --no-rollback only when you want to inspect a failed upgraded service in
place:

    frameworks cluster upgrade bridge --version rc --no-rollback

CLI support status
Today, frameworks cluster migrate --phase expand applies pending embedded
PostgreSQL/YugabyteDB migrations from the database/version/phase directory
layout. The command records each migration's phase and checksum in _migrations,
rejects an already-applied migration whose file has since been edited (its
checksum no longer matches), and does not run service-specific data migrations.
frameworks cluster doctor also checks the migration ledger: embedded
PostgreSQL/YugabyteDB migrations are compared with _migrations in each
configured database so pending required migrations and checksum drift show up
during normal diagnostics.
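The ledger comparison described above can be approximated with sha256sum: hash each migration file and compare it with the recorded checksum. A simplified sketch, assuming a flat directory of .sql files and a plain-text ledger of filename checksum lines standing in for the _migrations table; the real layout is database/version/phase, the real ledger is a database table, and check_ledger is hypothetical.

```bash
#!/usr/bin/env bash
# check_ledger DIR LEDGER
# LEDGER holds "filename sha256" lines for applied migrations, a stand-in
# for _migrations. Reports pending files and checksum drift; returns
# non-zero if either is found. check_ledger is a hypothetical helper.
check_ledger() {
  local dir="$1" ledger="$2" status=0 f name actual recorded

  for f in "$dir"/*.sql; do
    [ -e "$f" ] || continue                       # no .sql files at all
    name=$(basename "$f")
    actual=$(sha256sum "$f" | cut -d' ' -f1)
    recorded=$(awk -v n="$name" '$1 == n { print $2 }' "$ledger")

    if [ -z "$recorded" ]; then
      echo "pending: $name"                       # not yet applied
      status=1
    elif [ "$recorded" != "$actual" ]; then
      echo "checksum drift: $name"                # applied file was edited
      status=1
    fi
  done
  return "$status"
}
```

A pending file is expected before migrate --phase expand runs; drift on an already-applied file is the case migrate rejects.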
frameworks cluster upgrade plan shows the target rollout order and the embedded
SQL migrations grouped by phase. The CLI also embeds a release catalog that
records which databases each service owns, required data migrations,
compatibility floors, and hard intermediate-release requirements (releases you
must upgrade through). cluster upgrade combines that catalog with the live
_migrations and service-owned _data_migrations state to refuse unsafe DB-backed
rollouts before any binary is changed.
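Compatibility floors and hard intermediate releases both come down to version comparison. A sketch using GNU sort -V, assuming a floor is the oldest installed release that may upgrade directly to the target and an intermediate release is one you must install on the way. version_at_least and check_upgrade_path are hypothetical, and sort -V does not order pre-release suffixes such as -rc1 the way semantic versioning does, so the sketch is meant for plain release tags.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; the CLI reads floors and intermediate releases from
# its embedded release catalog instead.

# version_at_least CURRENT FLOOR: true if CURRENT sorts at or after FLOOR.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_upgrade_path CURRENT TARGET FLOOR [INTERMEDIATE]
check_upgrade_path() {
  local current="$1" target="$2" floor="$3" via="${4:-}"

  if ! version_at_least "$current" "$floor"; then
    echo "refuse: $current is below the compatibility floor $floor" >&2
    return 1
  fi
  # A required intermediate release blocks jumping past it in one step.
  if [ -n "$via" ] && ! version_at_least "$current" "$via" &&
     version_at_least "$target" "$via" && [ "$target" != "$via" ]; then
    echo "refuse: upgrade to $via first" >&2
    return 1
  fi
  echo "ok: $current -> $target"
}
```

For example, check_upgrade_path v0.2.35 v0.2.40 v0.2.30 v0.2.38 refuses with upgrade to v0.2.38 first, while the same call with target v0.2.38 succeeds.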
frameworks cluster data-migrate is the dedicated command for resumable,
service-owned data migrations:
- list shows catalog-declared work and adoption state,
- run --dry-run is read-only,
- run, status, verify, pause, and resume operate through the service binary on the target host.
If a release declares a required data migration but the service binary cannot
report its state, the upgrade and postdeploy/contract gates fail closed: an
unknown state blocks the rollout instead of being treated as safe.
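Failing closed means only an explicit, recognized success state lets a gate pass; an empty or unrecognized answer blocks just like a known failure. A minimal sketch, assuming the service reports a single state string (gate_data_migration and the state names, including completed, are hypothetical and not taken from the CLI).

```bash
#!/usr/bin/env bash
# gate_data_migration STATE
# Fail-closed gate: only an explicit "completed" passes. The state names
# are hypothetical; an empty STATE models a service that cannot report.
gate_data_migration() {
  case "$1" in
    completed) return 0 ;;
    "")        echo "blocked: data migration state unknown" >&2 ;;
    *)         echo "blocked: data migration state is '$1'" >&2 ;;
  esac
  return 1
}
```

The design choice is the default branch: any value the gate does not recognize, including a misspelled or newly introduced state, is treated as not safe.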
Contract migrations are intentionally separate. cluster migrate --phase contract runs only after the matching data-migration gate succeeds, so
destructive cleanup cannot be mixed into the routine expand/deploy step.
Local compose exception
The root docker-compose.yml is a seeded development stack, not an
operator-managed cluster. For local development, incompatible schema changes can
be handled by recreating volumes and reseeding:

    docker compose down -v
    docker compose up -d

Because down -v also deletes the stack's volumes, this shortcut is acceptable only for local compose, where demo data is disposable. It is not an operator upgrade path.
When to use provision
Use frameworks cluster provision for:
- first deployment of a cluster,
- adding a new service or host to the manifest,
- repairing drift where you intentionally want a full converge,
- re-rendering infrastructure after changing role inputs.
For routine release rollouts, use cluster upgrade plus cluster migrate.
Why some services keep their older version label
A FrameWorks release manifest pins every service by content identity (Docker
image digest, native tarball SHA-256), not by version label. When a service’s
source code did not change since an earlier release, the new manifest carries
that service’s previous service_version label forward verbatim — for
example, a v0.2.40 release can legitimately list helmsman with
service_version: v0.2.37 because that is the release that actually produced
the bytes you are running.
This is the artefact provenance model: platform_version says “what
release this manifest belongs to,” and service_version per component says
“which release actually produced this component’s bytes.”
Practical consequences:
- cluster upgrade plan shows each service’s target image digest and native tarball checksum, not just version labels. If your installed identity equals the target identity, nothing is re-pulled, regardless of the umbrella version.
- For edge nodes, Foghorn’s release reconciler compares per-component versions against foghorn.node_components. Carried-forward components keep their old label, so a release that doesn’t touch helmsman/mist/caddy does not roll any edge node.
- A “carried” service is not a downgrade. It is the same artefact that shipped earlier, content-pinned forward into the new release.

Release notes call out carried components when relevant.
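Because identity, not the umbrella version, decides whether anything rolls, the comparison amounts to a per-component diff. A minimal sketch, assuming installed and target values are available as component identity lines; roll_plan, tarball_identity, and the file format are hypothetical, not the CLI's or Foghorn's actual data model. The same diff works whether the value column holds a content identity or a carried-forward per-component version label.

```bash
#!/usr/bin/env bash
# roll_plan INSTALLED TARGET
# Each file holds "component identity" lines, e.g. "helmsman sha256:<digest>".
# Prints the components whose identity differs or that are new in TARGET;
# empty output means nothing needs to be re-pulled or rolled.
# roll_plan and tarball_identity are hypothetical helpers.
roll_plan() {
  awk 'FILENAME == ARGV[1] { have[$1] = $2; next }
       have[$1] != $2       { print $1 }' "$1" "$2"
}

# Identity of a native tarball: its SHA-256 digest, prefixed "sha256:"
# (the prefix is an assumed convention for this sketch).
tarball_identity() {
  printf 'sha256:%s\n' "$(sha256sum "$1" | cut -d' ' -f1)"
}
```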