Multi-Cluster Operations

What Is a Cluster?

A cluster is a Foghorn instance (or HA pair) plus its edge nodes — strictly the media plane. All clusters share the same central control plane (Commodore, Quartermaster, Purser) and data plane (Decklog, Periscope, Signalman). These services are not duplicated or federated per cluster.

While control and data plane services share infrastructure, they are registered under separate cluster IDs for operational visibility. The control plane cluster appears in the inventory and dashboard but does not participate in Foghorn federation.

Cross-cluster coordination is handled exclusively by the FoghornFederation gRPC protocol between Foghorn instances.

Deployment Models

Model	Description
Shared	Free shared capacity is currently unmetered; premium shared tiers are seeded for later enforcement. All tenants share the same cluster, edges, and services.
Dedicated	Enterprise tier. Isolated cluster with dedicated Foghorn, edges, and capacity, either operated by FrameWorks or self-hosted on-premises via the CLI. Manual Deployment serves as an advanced reference.
Hybrid	Tenant runs self-hosted edges that fall back to the primary shared cluster. Edges federate via Foghorn; control/data plane remains shared.
Open Marketplace	Operators publish listed clusters with access controls, invites, subscription requests, and pricing metadata. A vetted public marketplace program for third-party capacity is on the roadmap.

Cluster Lifecycle

Creating a Cluster

Via CLI provisioning (recommended for full deployments):

# Generate and validate GitOps-owned WireGuard identity from a local checkout.
frameworks mesh wg generate --manifest gitops/clusters/production/cluster.yaml
frameworks mesh wg check --gitops-dir gitops --cluster production

# Provisioning is read-only against GitOps.
frameworks cluster provision --gitops-dir gitops --cluster production

When the manifest has a clusters section, the CLI creates all clusters, registers each node to the correct cluster using role-based resolution, and generates per-cluster enrollment tokens. The manifest must already contain a complete WireGuard identity for every managed Privateer host; cluster provision validates that state but does not write cluster.yaml or hosts.enc.yaml. See Cluster Manifest: clusters for the manifest format.

Via admin command (ad-hoc cluster creation):

frameworks admin clusters create \
  --cluster-id eu-prod \
  --cluster-name "EU Production" \
  --cluster-type edge \
  --base-url eu.frameworks.network \
  --foghorn-count 2 \
  --deployment-model shared

This:

Creates the cluster record in Quartermaster
Claims N Foghorn instances from the pool (--foghorn-count)

Navigator-managed DNS and certificates are driven separately from node/service inventory and certificate status, not directly by admin clusters create.

Service Pool

Cluster-scoped media services such as Foghorn, Chandler, and Livepeer gateway are assigned from shared pools to logical media clusters.

# View pool status (shows per-cluster grouping)
frameworks admin service-pool status --service-type foghorn

# Assign idle instances to a cluster
frameworks admin service-pool assign --service-type foghorn --cluster-id <CLUSTER_ID> --count 2

# Release an assigned instance back to the pool
frameworks admin service-pool add --service-type livepeer-gateway --instance-id <UUID>

# Drain an instance from its cluster (graceful)
frameworks admin service-pool drain --service-type chandler --instance-id <UUID>

Cluster Health

frameworks admin clusters cert-status --cluster-id <ID>

Shows cluster health including Foghorn HA status, edge count, and wildcard certificate readiness.

Federation

Foghorn instances discover peers via Quartermaster’s ListPeers RPC. The peer manager maintains long-lived PeerChannel connections for known peers, refreshes them periodically, and also accepts demand-driven peer hints from stream validation so new routing relationships do not wait for the next refresh.

How It Works

PeerChannel: Bidirectional gRPC streams between Foghorn peers exchange edge telemetry, stream advertisements, cluster summaries, heartbeats, and replication/artifact events on their own intervals
QueryStream: When a viewer’s cluster doesn’t have the stream, Foghorn asks the origin cluster for edge candidates
Origin-Pull: If a local edge has capacity, Foghorn arranges a DTSC pull from the remote origin. Subsequent viewers are served locally
Redirect: If there is no local capacity, the viewer is redirected (HTTP 307) to the remote cluster’s edge
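
The steps above reduce to a short decision procedure. The following is a minimal sketch, not Foghorn's actual code: `RoutingDecision`, `resolve_viewer`, and the three list parameters are hypothetical names, and picking the first candidate stands in for whatever scoring the real router applies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RoutingDecision:
    action: str                        # "serve_local", "origin_pull", or "redirect"
    target_edge: str                   # edge the viewer is sent to
    pull_from: Optional[str] = None    # remote origin edge for a DTSC pull
    http_status: Optional[int] = None  # 307 when the viewer is redirected


def resolve_viewer(
    local_edges_with_stream: List[str],
    local_edges_with_capacity: List[str],
    remote_candidates: List[str],
) -> RoutingDecision:
    """Choose how to serve one viewer request.

    remote_candidates stands in for the QueryStream response from the
    origin cluster; it is only consulted when the stream is not local.
    """
    if local_edges_with_stream:
        return RoutingDecision("serve_local", local_edges_with_stream[0])
    if not remote_candidates:
        # Assumption: an unknown stream is an error in this sketch.
        raise LookupError("no cluster advertises this stream")
    origin_edge = remote_candidates[0]
    if local_edges_with_capacity:
        # Origin-pull: a local edge pulls the stream, then serves viewers locally.
        return RoutingDecision(
            "origin_pull", local_edges_with_capacity[0], pull_from=origin_edge
        )
    # No local capacity: send the viewer to the remote cluster's edge.
    return RoutingDecision("redirect", origin_edge, http_status=307)
```

Calling `resolve_viewer([], [], ["remote-1"])` yields a redirect decision with status 307, while adding a local edge with capacity turns the same request into an origin-pull.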

Leader-Only Peering

Each cluster elects one Foghorn instance (via Redis SET NX, 15s TTL) to run PeerChannel connections. This prevents duplicate peer traffic in HA deployments.
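
The election pattern can be illustrated without a live Redis. This sketch uses an in-memory stand-in that mimics `SET key value NX EX 15`; the `FakeRedis` class, the `foghorn:leader:{cluster_id}` key layout, and the function names are assumptions made for illustration, and how the real leader renews its lease is not covered here.

```python
import time
from typing import Callable, Dict, Optional, Tuple

LEADER_TTL_SECONDS = 15  # TTL stated in the text


class FakeRedis:
    """In-memory stand-in that mimics Redis `SET key value NX EX ttl`."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}

    def set_nx(self, key: str, value: str, ttl_seconds: float) -> bool:
        now = self._clock()
        entry = self._data.get(key)
        if entry is not None and entry[1] > now:
            return False  # key exists and has not expired
        self._data[key] = (value, now + ttl_seconds)
        return True

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None or entry[1] <= self._clock():
            return None
        return entry[0]


def try_become_leader(store: FakeRedis, cluster_id: str, instance_id: str) -> bool:
    """Return True if this instance won the lease and should run PeerChannel."""
    key = f"foghorn:leader:{cluster_id}"  # hypothetical key layout
    return store.set_nx(key, instance_id, LEADER_TTL_SECONDS)
```

Only the instance that wins the lease opens peer connections. If it stops refreshing, the key expires after 15 seconds and another instance in the HA pair can take over.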

For architecture details, see docs/architecture/federation.md and docs/architecture/stream-replication-topology.md.

Artifact Command Routing

When a tenant deletes a clip, stops a DVR, or removes a VOD asset, Commodore routes the command to the cluster that owns the artifact — not necessarily the tenant’s primary cluster. This uses a push+forward model:

Push: Commodore tracks which cluster ingested each stream (active_ingest_cluster_id). Artifact operations read origin_cluster_id from the business registry and route directly to that cluster’s Foghorn.
Forward (safety net): If the command arrives at a Foghorn that doesn’t own the artifact (stale routing data, race condition), Foghorn forwards it to federation peers via ForwardArtifactCommand. The first peer that owns the artifact handles it.
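
A minimal sketch of the push+forward model follows. The `Foghorn` class, `route_artifact_command`, and the `allow_forward` flag (used here to stop forwarding loops) are hypothetical; only the two-step behaviour is taken from the description above.

```python
from typing import Dict, List, Optional, Set


class Foghorn:
    def __init__(self, cluster_id: str, artifacts: Set[str]) -> None:
        self.cluster_id = cluster_id
        self.artifacts = artifacts
        self.peers: List["Foghorn"] = []

    def handle(self, artifact_id: str, allow_forward: bool = True) -> Optional[str]:
        """Return the ID of the cluster that handled the command, or None."""
        if artifact_id in self.artifacts:
            return self.cluster_id
        if not allow_forward:
            return None  # assumption: a forwarded command is not forwarded again
        # Forward (safety net): the first peer that owns the artifact handles it.
        for peer in self.peers:
            handled_by = peer.handle(artifact_id, allow_forward=False)
            if handled_by is not None:
                return handled_by
        return None


def route_artifact_command(
    registry: Dict[str, str], clusters: Dict[str, Foghorn], artifact_id: str
) -> Optional[str]:
    # Push: go straight to the cluster recorded as origin_cluster_id.
    origin_cluster_id = registry[artifact_id]
    return clusters[origin_cluster_id].handle(artifact_id)
```

With a stale registry entry that points at the wrong cluster, the command still reaches the owning cluster through the forward step.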

Tenant suspension (TerminateTenantStreams, InvalidateTenantCache) fans out to all clusters the tenant has access to, ensuring streams are terminated and caches invalidated everywhere.

Marketplace

The marketplace lets operators publish clusters that other tenants can subscribe to.

Cluster Visibility

Clusters can be configured with different visibility levels:

Visibility	Who Can See	Access
`private`	Only the owner tenant	Direct access or invite token for another tenant
`unlisted`	Direct-link/invited users	Via cluster invite token
`public`	All tenants	Via subscription request

Subscription Lifecycle

Tenant discovers cluster (marketplace UI or invite link)
  → RequestClusterSubscription (GraphQL mutation)
  → Cluster owner approves/rejects (ApproveClusterSubscription / RejectClusterSubscription)
  → On approval: Quartermaster activates the tenant_cluster_access record
  → Paid cluster checkout/subscription records are created by Purser's cluster-subscription flow
  → Tenant can now route streams through the cluster
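
The flow above can be read as a small state machine. In this sketch the mutation names come from the flow, while the `SubscriptionState` values and helper functions are illustrative assumptions rather than the stored representation in Quartermaster.

```python
from enum import Enum
from typing import Dict, Tuple


class SubscriptionState(Enum):
    PENDING = "pending"    # after RequestClusterSubscription
    ACTIVE = "active"      # tenant_cluster_access activated
    REJECTED = "rejected"


# Allowed transitions keyed by (current state, mutation name).
TRANSITIONS: Dict[Tuple[SubscriptionState, str], SubscriptionState] = {
    (SubscriptionState.PENDING, "ApproveClusterSubscription"): SubscriptionState.ACTIVE,
    (SubscriptionState.PENDING, "RejectClusterSubscription"): SubscriptionState.REJECTED,
}


def request_subscription() -> SubscriptionState:
    """A new request always starts out waiting for the cluster owner."""
    return SubscriptionState.PENDING


def apply_mutation(state: SubscriptionState, mutation: str) -> SubscriptionState:
    try:
        return TRANSITIONS[(state, mutation)]
    except KeyError:
        raise ValueError(f"{mutation} is not valid in state {state.value}") from None


def can_route_streams(state: SubscriptionState) -> bool:
    return state is SubscriptionState.ACTIVE
```

Only an approved request reaches `ACTIVE`; approving an already rejected request raises `ValueError` in this sketch.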

Invitation Flow

Cluster operators can invite specific tenants:

# Creates an invite token with optional expiry
frameworks admin clusters invites create \
  --cluster-id <ID> \
  --owner-tenant-id <OWNER_TENANT_UUID> \
  --invited-tenant-id <INVITED_TENANT_UUID> \
  --expires-in-days 7

The invited tenant sees the cluster in their dashboard and can accept with one click.

Dashboard Pages

The webapp provides dedicated infrastructure pages for multi-cluster operations:

Page	Route	Purpose
Infrastructure Overview	`/infrastructure`	Tenant info, platform performance (CPU/memory/nodes), service health summary, clickable cluster cards
Cluster Detail	`/infrastructure/[clusterId]`	Per-cluster metrics, node cards with live health, service instances, health checks
Node Detail	`/nodes/[id]`	Per-node CPU/memory/disk/streams, 5-minute performance history, service instances
Clusters	`/infrastructure/clusters`	Merged view — “My Clusters” tab (subscriptions, invitations, approvals, private cluster creation) and “Marketplace” tab (browse and connect to clusters)
Federation	`/infrastructure/federation`	Federation overview — topology map with peering/traffic relationships, traffic matrix, event type breakdown, recent federation events
Audience Analytics	`/analytics/audience`	Routing map with cross-cluster flow visualization (amber lines for cross-cluster routes), routing events tagged with local/cross-cluster badges

Edge Enrollment

Edge nodes enroll into a specific cluster using bootstrap tokens.

Token Creation

Via Dashboard (tenant owner):

Go to Infrastructure -> Clusters
Click Create Private Cluster or select an existing cluster
Copy the bootstrap token

Via CLI (admin):

frameworks admin bootstrap-tokens create \
  --kind edge_node \
  --tenant-id <TENANT_UUID> \
  --cluster-id <CLUSTER_ID> \
  --name "edge-node-1" \
  --ttl 24h

Provisioning with Token

frameworks edge provision \
  --ssh <USER>@<EDGE_HOST> \
  --enrollment-token enroll_xxx \
  --email <EMAIL>

The CLI sends the token to Bridge via the public bootstrapEdge mutation. Bridge validates the token via Quartermaster, finds the cluster’s assigned Foghorn, and proxies PreRegisterEdge. The response carries:

Assigned node ID and edge domain ({node_label}.{cluster_slug}.{base}, where node_label is the node ID with a single edge- prefix)
Pool domain for the cluster
Foghorn gRPC address the edge will use at runtime for Helmsman’s control stream
Internal CA bundle for initial gRPC trust bootstrap

The operator never needs to know cluster topology — the bootstrap token IS the cluster selector. After bootstrap, the edge talks directly to its assigned Foghorn over the public internet; it never joins the WireGuard mesh.

If you need to dial a specific Foghorn directly (for debugging), pass --foghorn-addr and the CLI bypasses Bridge for that one call.

DNS

Each cluster gets its own set of DNS records under {cluster_slug}.{base_domain}:

Record	Purpose
`edge-ingest.{slug}.{base}`	RTMP/E-RTMP/SRT/WHIP ingest
`edge.{slug}.{base}`	Edge pool (any edge in cluster)
`foghorn.{slug}.{base}`	Viewer routing / playback resolution
`livepeer.{slug}.{base}`	Livepeer gateway / transcoding endpoint
`{node_label}.{slug}.{base}`	Per-edge A records for direct addressing (`node_label` is the node ID with a single `edge-` prefix)

Navigator manages these records automatically. Wildcard TLS certificates (*.{slug}.{base}) are issued via ACME DNS-01 and distributed to edges via ConfigSeed.
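
For reference, the naming scheme in the table can be expressed as a small helper. This is a sketch only: `node_label` and `cluster_records` are hypothetical names, and treating "a single edge- prefix" as "prepend `edge-` unless it is already there" is an interpretation of the rule above.

```python
from typing import Dict, Iterable


def node_label(node_id: str) -> str:
    """Return the node ID with one leading "edge-" prefix (assumed reading)."""
    prefix = "edge-"
    return node_id if node_id.startswith(prefix) else prefix + node_id


def cluster_records(slug: str, base: str, node_ids: Iterable[str]) -> Dict[str, str]:
    """Build the DNS names for one cluster under {slug}.{base}."""
    zone = f"{slug}.{base}"
    records = {
        "ingest": f"edge-ingest.{zone}",
        "edge_pool": f"edge.{zone}",
        "foghorn": f"foghorn.{zone}",
        "livepeer": f"livepeer.{zone}",
        "wildcard_cert": f"*.{zone}",
    }
    for node_id in node_ids:
        # One per-edge name for direct addressing.
        records[f"node:{node_id}"] = f"{node_label(node_id)}.{zone}"
    return records
```

For example, `cluster_records("eu-prod", "example.com", ["abc123"])` maps the ingest record to `edge-ingest.eu-prod.example.com` and the node to `edge-abc123.eu-prod.example.com`.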

See DNS and Cluster Routing for details.

Billing Attribution

Stream/viewer/artifact analytics carry cluster_id (serving cluster) and origin_cluster_id (where the stream was ingested). The billing rollups use those fields when they are present and attribute non-cluster-scoped usage, such as storage, processing, and API usage, to the tenant’s primary cluster. This enables per-cluster billing:

Periscope Query generates per-cluster UsageSummary records
Purser stores these as per-cluster usage records
Settlement queries can identify cross-cluster traffic for infrastructure cost attribution

Inter-cluster DTSC bandwidth is infrastructure cost, not a tenant-facing billing item.
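
The attribution rule can be sketched as a simple rollup. The event shape (`metric`, `amount`, optional `cluster_id` and `origin_cluster_id`) and both function names are assumptions for illustration; the actual Periscope and Purser schemas may differ.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Dict[str, object]


def rollup_usage(events: List[Event], primary_cluster_id: str) -> Dict[Tuple[str, str], float]:
    """Sum usage per (cluster, metric).

    Events without a cluster_id (for example storage or API usage) are
    attributed to the tenant's primary cluster.
    """
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for event in events:
        cluster = event.get("cluster_id") or primary_cluster_id
        totals[(str(cluster), str(event["metric"]))] += float(event["amount"])
    return dict(totals)


def cross_cluster_events(events: List[Event]) -> List[Event]:
    """Events served by a different cluster than the one that ingested them."""
    return [
        e for e in events
        if e.get("cluster_id")
        and e.get("origin_cluster_id")
        and e["cluster_id"] != e["origin_cluster_id"]
    ]
```

The second helper mirrors the settlement use case: it isolates traffic where the serving and origin clusters differ so that infrastructure cost can be attributed separately from tenant billing.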

Cluster Pricing

Each cluster has a pricing model configured in purser.cluster_pricing:

Pricing Model	Description
`free_unmetered`	No metering (community tier)
`metered`	Pay-as-you-go resource billing
`monthly`	Fixed monthly subscription
`tier_inherit`	Inherit pricing from the tenant’s billing tier
`custom`	Operator-defined pricing

See docs/architecture/cross-cluster-billing.md and docs/architecture/billing-tier-provisioning.md for the full attribution and provisioning model.

Monitoring

Federation Health

Monitor peer connectivity and replication state:

PeerChannel status: Each Foghorn logs peer connections and heartbeat latency
Stream advertisements: Exchanged every 5 seconds via PeerChannel; stale advertisements indicate peer issues
Active replications: In-flight cross-cluster DTSC pulls tracked in Redis (5-min TTL)
Federation events: All cross-cluster operations (peering, replication, artifact access, redirects) emit geo-enriched events to ClickHouse (federation_events table). View them on the Federation dashboard page (/infrastructure/federation)
PeerHeartbeat geo exchange: Foghorn peers exchange their geographic coordinates (resolved from infrastructure_nodes.external_ip at bootstrap). This enables topology map visualization with real geographic positions
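
A stale-advertisement check might look like the sketch below. The 5-second interval comes from the list above; the threshold of three missed intervals, the `last_seen` mapping, and the function name are assumptions chosen for illustration.

```python
from typing import Dict, List

ADVERTISEMENT_INTERVAL_SECONDS = 5.0  # from the text
MISSED_INTERVALS_BEFORE_STALE = 3     # assumption: tune to your alerting needs


def stale_peers(last_seen: Dict[str, float], now: float) -> List[str]:
    """Return peer cluster IDs whose last advertisement is too old.

    last_seen maps a peer cluster ID to the timestamp (in seconds) of the
    most recent stream advertisement received from it.
    """
    cutoff = ADVERTISEMENT_INTERVAL_SECONDS * MISSED_INTERVALS_BEFORE_STALE
    return sorted(peer for peer, seen_at in last_seen.items() if now - seen_at > cutoff)
```

A peer that last advertised more than 15 seconds ago would be flagged under these assumptions, which is a reasonable trigger for checking PeerChannel logs for that peer.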

Per-Cluster Metrics

ClickHouse tables partition by cluster_id, enabling per-cluster dashboards for:

Viewer hours and egress
Stream health and QoE
Edge node utilization
Cross-cluster replication events
Federation event geo data (local/remote lat/lon for topology visualization)