ClickHouse Schema Migrations

Overview

ClickHouse schema changes are managed through embedded, versioned migrations that follow the same model as PostgreSQL. The runner is wired into the same CLI commands.

Scenario	What to use
Fresh install	`frameworks cluster init clickhouse` creates databases, applies the baseline schema, and runs migrations
Upgrading	`frameworks cluster migrate` applies pending migrations from `pkg/database/sql/clickhouse/migrations/`

cluster init clickhouse and cluster migrate are idempotent. Each migration records itself in a per-database _migrations ledger; re-running is a no-op.

Migration structure

Migrations are organized by target database, release version, and phase:

pkg/database/sql/clickhouse/migrations/
└── periscope/
    └── v0.2.31/
        ├── expand/
        │   ├── 001_add_filename_to_artifacts.sql
        │   └── 002_add_viewer_node_tracking.sql
        ├── postdeploy/
        │   └── 001_rebuild_artifact_projection.sql
        └── contract/
            └── 001_drop_legacy_artifact_view.sql

The first path segment after clickhouse/migrations/ must match a configured ClickHouse database name (today: periscope). The CLI embeds these files, applies only migrations whose database exists in the manifest, and records applied files in that database’s _migrations table.

Reconciliation model

The baseline schema in pkg/database/sql/clickhouse/<db>.sql is the cumulative current state at the latest release tag. Migrations advance the schema forward in subsequent releases.

On a fresh install, frameworks cluster init clickhouse:

Creates the configured databases.
Applies the baseline schema (DDL is IF NOT EXISTS, safe to re-run).
Runs every embedded migration up to the cluster’s resolved channel version. The migration SQL is idempotent (also IF NOT EXISTS), so DDL re-application no-ops on objects already created by the baseline; the _migrations ledger row is recorded by the same step.

On upgrades, frameworks cluster migrate runs only the migrations whose ledger row is missing. There is no blind ledger pre-seed: every recorded migration was produced by an actual SQL apply.

Upgrade procedure

Check release notes for required ClickHouse DDL.
Confirm the phase. Routine rolling upgrades apply expand migrations before deploying new binaries. postdeploy migrations cover heavy MODIFY COLUMN rewrites or ALTER TABLE … UPDATE/DELETE mutations and run after binaries are live. contract migrations are explicit cleanup with --yes required.
Apply migrations through the CLI:

frameworks cluster migrate validate
frameworks cluster migrate --phase expand

validate is fast and safe to run anywhere — it checks the embedded migration tree against the layout and per-phase safety rules.

Dry-run first to see what will change:

frameworks cluster migrate --phase expand --dry-run

Dry-run runs the role under ansible-playbook --check --diff. The role explicitly skips every mutating step (ledger table create, SQL apply, ledger insert) and only reports pending migrations.

Apply for real:

frameworks cluster migrate --phase expand --to-version v0.5.0

--to-version is optional; when omitted the cluster’s resolved channel version is used.

Deploy services after expand migrations are in place:

frameworks cluster upgrade --all

Run any required postdeploy documented in the release notes:

frameworks cluster migrate --phase postdeploy --to-version v0.5.0

Ledger details

The role creates <db>._migrations per ClickHouse database with this shape:

CREATE TABLE IF NOT EXISTS _migrations (
  version       LowCardinality(String),
  phase         LowCardinality(String),
  seq           UInt32,
  filename      String,
  checksum      FixedString(64),
  applied_at    DateTime64(3) DEFAULT now64()
) ENGINE = ReplacingMergeTree(applied_at)
ORDER BY (version, phase, seq);

ReplacingMergeTree collapses retried-insert duplicates on merge. Reads use argMax(checksum, applied_at) and surface uniqExact(checksum) so a crashed mid-write that left two distinct checksums under the same key is flagged as a hard failure, not silently resolved.

If a migration file’s content changes after it has been applied, the role fails on checksum mismatch. Treat applied migrations as immutable.

Writing migrations

Guidelines for contributors:

IF NOT EXISTS is mandatory on every CREATE TABLE / VIEW / DICTIONARY. This is what makes idempotent re-apply safe across both fresh installs (where the baseline already created the object) and existing clusters. The validator rejects migrations missing this guard.
Prefer expand-only changes in routine upgrades — add columns/tables, avoid drops, avoid type rewrites.
Use Nullable for new columns — old binaries ignore them, rollback is safe.
Update the baseline schema in the same release: edit pkg/database/sql/clickhouse/<db>.sql so a fresh install ends in the same state as an upgraded one. The migration runs both on fresh installs (no-op against the freshly applied baseline) and on upgrades (the actual schema change).
One migration per logical change — easier to track and debug.
Pick the correct phase:
- expand: additive, safe alongside old + new binaries.
- postdeploy: heavy MODIFY COLUMN rewrites, ALTER TABLE … UPDATE/DELETE mutations, materialized-view rebuilds.
- contract: drops, renames, post-cleanup.
Large historical rewrites (long ALTER … UPDATE, MV rebuilds, replay from Kafka) need release runbooks with progress and verification, not incidental DDL inside a migration.

Example expand migration:

ALTER TABLE periscope.artifact_events
  ADD COLUMN IF NOT EXISTS filename Nullable(String) AFTER internal_name;

ALTER TABLE periscope.artifact_state_current
  ADD COLUMN IF NOT EXISTS filename Nullable(String) AFTER internal_name;

Single-node vs replicated

The default schema is single-node. If you run a replicated topology, every migration that creates or alters a Replicated* table must use ON CLUSTER '{cluster}':

ALTER TABLE periscope.artifact_events ON CLUSTER '{cluster}'
  ADD COLUMN IF NOT EXISTS filename Nullable(String) AFTER internal_name;

The role is single-node aware today; replicated support is a future role var. Until then, replicated topologies need to apply migrations on each shard themselves.

Rollback

Nullable columns are safe to leave — older service versions simply won’t populate them.
Don’t drop columns during rollback unless explicitly required.
Re-deploying an older service version is usually sufficient.
Do not rely on DDL rollback for normal recovery — keep expand changes compatible, redeploy the previous binary, and leave extra columns in place until the contract release.

Troubleshooting

“Unknown column” errors:

Code: 16. DB::Exception: Unknown column 'filename' in column list

Cause: Service deployed before migration was applied. Fix: Apply the migration, then restart the service.

Check ledger contents:

SELECT version, phase, seq, filename, checksum, applied_at
FROM periscope._migrations
ORDER BY version, phase, seq
FORMAT Vertical;

Check if a column exists:

SELECT name, type FROM system.columns
WHERE database = 'periscope' AND table = 'artifact_events' AND name = 'filename';