
Multi-Cluster Operations

A cluster is a Foghorn instance (or HA pair) plus its edge nodes — strictly the media plane. All clusters share the same central control plane (Commodore, Quartermaster, Purser) and data plane (Decklog, Periscope, Signalman). These services are not duplicated or federated per cluster.

While control and data plane services share infrastructure, they are registered under separate cluster IDs for operational visibility. The control plane cluster appears in the inventory and dashboard but does not participate in Foghorn federation.

Cross-cluster coordination is handled exclusively by the FoghornFederation gRPC protocol between Foghorn instances.


| Model | Description |
| --- | --- |
| Shared | Free shared capacity is currently unmetered; premium shared tiers are seeded for later enforcement. All tenants share the same cluster, edges, and services. |
| Dedicated | Enterprise tier. Isolated cluster with dedicated Foghorn, edges, and capacity, operated by FrameWorks or self-hosted on-premise via the CLI, with Manual Deployment as an advanced reference. |
| Hybrid | Tenant runs self-hosted edges that fall back to the primary shared cluster. Edges federate via Foghorn; control/data plane remains shared. |
| Open Marketplace | Operators publish listed clusters with access controls, invites, subscription requests, and pricing metadata. A vetted public marketplace program for third-party capacity is on the roadmap. |

Via CLI provisioning (recommended for full deployments):

# Generate and validate GitOps-owned WireGuard identity from a local checkout.
frameworks mesh wg generate --manifest gitops/clusters/production/cluster.yaml
frameworks mesh wg check --gitops-dir gitops --cluster production
# Provisioning is read-only against GitOps.
frameworks cluster provision --gitops-dir gitops --cluster production

When the manifest has a clusters section, the CLI creates all clusters, registers nodes to the correct cluster based on role-based resolution, and generates per-cluster enrollment tokens. The manifest must already contain complete WireGuard identity for every managed Privateer host; cluster provision validates that state but does not write cluster.yaml or hosts.enc.yaml. See Cluster Manifest: clusters for the manifest format.

Via admin command (ad-hoc cluster creation):

frameworks admin clusters create \
  --cluster-id eu-prod \
  --cluster-name "EU Production" \
  --cluster-type edge \
  --base-url eu.frameworks.network \
  --foghorn-count 2 \
  --deployment-model shared

This:

  1. Creates the cluster record in Quartermaster
  2. Claims N Foghorn instances from the pool (--foghorn-count)

Navigator-managed DNS and certificates are handled separately: Navigator drives them from node/service inventory and certificate status, and admin clusters create does not trigger them directly.

Cluster-scoped media services such as Foghorn, Chandler, and Livepeer gateway are assigned from shared pools to logical media clusters.

# View pool status (shows per-cluster grouping)
frameworks admin service-pool status --service-type foghorn
# Assign idle instances to a cluster
frameworks admin service-pool assign --service-type foghorn --cluster-id <CLUSTER_ID> --count 2
# Release an assigned instance back to the pool
frameworks admin service-pool add --service-type livepeer-gateway --instance-id <UUID>
# Drain an instance from its cluster (graceful)
frameworks admin service-pool drain --service-type chandler --instance-id <UUID>

Check a cluster's health and certificate status:

frameworks admin clusters cert-status --cluster-id <ID>

Shows cluster health including Foghorn HA status, edge count, and wildcard certificate readiness.


Foghorn instances discover peers via Quartermaster’s ListPeers RPC. The peer manager maintains long-lived PeerChannel connections for known peers, refreshes them periodically, and also accepts demand-driven peer hints from stream validation so new routing relationships do not wait for the next refresh.

  1. PeerChannel: Bidirectional gRPC streams between Foghorn peers exchange edge telemetry, stream advertisements, cluster summaries, heartbeats, and replication/artifact events on their own intervals
  2. QueryStream: When a viewer’s cluster doesn’t have the stream, Foghorn asks the origin cluster for edge candidates
  3. Origin-Pull: If a local edge has capacity, Foghorn arranges a DTSC pull from the remote origin. Subsequent viewers are served locally
  4. Redirect: If no local capacity, the viewer is redirected (307) to the remote cluster’s edge
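The four steps above amount to a small decision procedure. The following is a minimal sketch of it, not Foghorn's actual implementation: the function name, the `Decision` type, and the choice of the first remote candidate are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Action(Enum):
    SERVE_LOCAL = "serve_local"    # the stream is already on a local edge
    ORIGIN_PULL = "origin_pull"    # pull via DTSC, then serve locally
    REDIRECT_307 = "redirect_307"  # send the viewer to the remote cluster's edge
    NOT_FOUND = "not_found"        # no peer returned edge candidates


@dataclass
class Decision:
    action: Action
    edge: Optional[str] = None       # edge the viewer ends up on
    pull_from: Optional[str] = None  # remote edge used as the pull source


def route_viewer(local_serving_edge: Optional[str],
                 spare_local_edge: Optional[str],
                 remote_candidates: List[str]) -> Decision:
    """Hypothetical routing decision for one viewer request.

    remote_candidates stands in for the result of a QueryStream call
    to the origin cluster.
    """
    if local_serving_edge:
        return Decision(Action.SERVE_LOCAL, edge=local_serving_edge)
    if not remote_candidates:
        return Decision(Action.NOT_FOUND)
    best_remote = remote_candidates[0]  # assumption: candidates arrive ranked
    if spare_local_edge:
        # Local capacity exists: replicate once, serve later viewers locally.
        return Decision(Action.ORIGIN_PULL, edge=spare_local_edge,
                        pull_from=best_remote)
    # No local capacity: redirect the viewer to the remote edge.
    return Decision(Action.REDIRECT_307, edge=best_remote)
```

Once an origin-pull has completed, the pulling edge would show up as `local_serving_edge` for later requests, which is how subsequent viewers end up served locally.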

Each cluster elects one Foghorn instance (via Redis SET NX, 15s TTL) to run PeerChannel connections. This prevents duplicate peer traffic in HA deployments.
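To illustrate the election semantics, here is a minimal sketch that models SET NX with a 15-second expiry against an in-memory stand-in for Redis. The key name, the `FakeRedis` class, and the function names are hypothetical; the real key layout and renewal logic are not described in this document.

```python
import time
from typing import Callable, Dict, Optional, Tuple

LEADER_TTL_SECONDS = 15  # TTL stated in the text


class FakeRedis:
    """In-memory stand-in for the one Redis operation used here:
    SET key value NX EX ttl (set only if absent, with expiry)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}

    def set_nx_ex(self, key: str, value: str, ttl: float) -> bool:
        now = self._clock()
        current = self._data.get(key)
        if current is not None and current[1] > now:
            return False  # key still held by someone else
        self._data[key] = (value, now + ttl)
        return True

    def get(self, key: str) -> Optional[str]:
        current = self._data.get(key)
        if current is None or current[1] <= self._clock():
            return None
        return current[0]


def leader_key(cluster_id: str) -> str:
    # Hypothetical key name, for illustration only.
    return f"foghorn:federation-leader:{cluster_id}"


def try_become_leader(store: FakeRedis, cluster_id: str, instance_id: str) -> bool:
    """Return True if this instance won the election for its cluster."""
    return store.set_nx_ex(leader_key(cluster_id), instance_id, LEADER_TTL_SECONDS)
```

Because the key expires, a standby instance can take over once the previous leader stops holding it. A real leader would also need to renew its key before the TTL elapses; that step is left out of this sketch.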

For architecture details, see docs/architecture/federation.md and docs/architecture/stream-replication-topology.md.

When a tenant deletes a clip, stops a DVR, or removes a VOD asset, Commodore routes the command to the cluster that owns the artifact — not necessarily the tenant’s primary cluster. This uses a push+forward model:

  1. Push: Commodore tracks which cluster ingested each stream (active_ingest_cluster_id). Artifact operations read origin_cluster_id from the business registry and route directly to that cluster’s Foghorn.

  2. Forward (safety net): If the command arrives at a Foghorn that doesn’t own the artifact (stale routing data, race condition), Foghorn forwards it to federation peers via ForwardArtifactCommand. The first peer that owns the artifact handles it.
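The push+forward model can be sketched as follows. This is a simplified illustration under stated assumptions: the `Foghorn` class, its `handle` method, and the rule that a forwarded command is never forwarded again are hypothetical, and the real ForwardArtifactCommand RPC is replaced by a direct method call.

```python
from typing import Dict, Iterable, List, Optional


class Foghorn:
    """Toy model of one cluster's Foghorn for artifact commands."""

    def __init__(self, cluster_id: str, owned_artifacts: Iterable[str]):
        self.cluster_id = cluster_id
        self.owned = set(owned_artifacts)
        self.peers: List["Foghorn"] = []

    def handle(self, artifact_id: str, forwarded: bool = False) -> Optional[str]:
        """Return the ID of the cluster that handled the command, or None."""
        if artifact_id in self.owned:
            return self.cluster_id
        if forwarded:
            # Assumption: a forwarded command is not forwarded again,
            # so two peers cannot bounce it back and forth.
            return None
        for peer in self.peers:
            handled_by = peer.handle(artifact_id, forwarded=True)
            if handled_by is not None:
                return handled_by  # the first owning peer handles it
        return None


def route_artifact_command(registry: Dict[str, str],
                           foghorns: Dict[str, Foghorn],
                           artifact_id: str) -> Optional[str]:
    """Push step: look up origin_cluster_id and send the command there."""
    origin_cluster_id = registry[artifact_id]
    return foghorns[origin_cluster_id].handle(artifact_id)
```

With a correct registry the push step alone succeeds. The forward path only matters when the registry entry is stale, as in a registry that points at one cluster while the artifact lives on another.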

Tenant suspension (TerminateTenantStreams, InvalidateTenantCache) fans out to all clusters the tenant has access to, ensuring streams are terminated and caches invalidated everywhere.


The marketplace allows third-party operators to publish clusters that other tenants can subscribe to.

Clusters can be configured with different visibility levels:

| Visibility | Who Can See | Access |
| --- | --- | --- |
| private | Only the owner tenant | Direct access or invite token for another tenant |
| unlisted | Direct-link/invited users | Via cluster invite token |
| public | All tenants | Via subscription request |

Tenant discovers cluster (marketplace UI or invite link)
→ RequestClusterSubscription (GraphQL mutation)
→ Cluster owner approves/rejects (ApproveClusterSubscription / RejectClusterSubscription)
→ On approval: Quartermaster activates the tenant_cluster_access record
→ Paid cluster checkout/subscription records are created by Purser's cluster-subscription flow
→ Tenant can now route streams through the cluster

Cluster operators can invite specific tenants:

# Creates an invite token with optional expiry
frameworks admin clusters invites create \
  --cluster-id <ID> \
  --owner-tenant-id <OWNER_TENANT_UUID> \
  --invited-tenant-id <INVITED_TENANT_UUID> \
  --expires-in-days 7

The invited tenant sees the cluster in their dashboard and can accept with one click.


The webapp provides dedicated infrastructure pages for multi-cluster operations:

| Page | Route | Purpose |
| --- | --- | --- |
| Infrastructure Overview | /infrastructure | Tenant info, platform performance (CPU/memory/nodes), service health summary, clickable cluster cards |
| Cluster Detail | /infrastructure/[clusterId] | Per-cluster metrics, node cards with live health, service instances, health checks |
| Node Detail | /nodes/[id] | Per-node CPU/memory/disk/streams, 5-minute performance history, service instances |
| Clusters | /infrastructure/clusters | Merged view: “My Clusters” tab (subscriptions, invitations, approvals, private cluster creation) and “Marketplace” tab (browse and connect to clusters) |
| Federation | /infrastructure/federation | Federation overview: topology map with peering/traffic relationships, traffic matrix, event type breakdown, recent federation events |
| Audience Analytics | /analytics/audience | Routing map with cross-cluster flow visualization (amber lines for cross-cluster routes), routing events tagged with local/cross-cluster badges |

Edge nodes enroll into a specific cluster using bootstrap tokens.

Via Dashboard (tenant owner):

  1. Go to Infrastructure -> Clusters
  2. Click Create Private Cluster or select an existing cluster
  3. Copy the bootstrap token

Via CLI (admin):

frameworks admin bootstrap-tokens create \
  --kind edge_node \
  --tenant-id <TENANT_UUID> \
  --cluster-id <CLUSTER_ID> \
  --name "edge-node-1" \
  --ttl 24h

Then provision the edge node with the token:

frameworks edge provision \
  --enrollment-token enroll_xxx \

The CLI sends the token to Bridge via the public bootstrapEdge mutation. Bridge validates the token via Quartermaster, finds the cluster’s assigned Foghorn, and proxies PreRegisterEdge. The response carries:

  • Assigned node ID and edge domain ({node_label}.{cluster_slug}.{base}, where node_label is the node ID with a single edge- prefix)
  • Pool domain for the cluster
  • Foghorn gRPC address the edge will use at runtime for Helmsman’s control stream
  • Internal CA bundle for initial gRPC trust bootstrap

The operator never needs to know cluster topology — the bootstrap token IS the cluster selector. After bootstrap, the edge talks directly to its assigned Foghorn over the public internet; it never joins the WireGuard mesh.

If you need to dial a specific Foghorn directly (for debugging), pass --foghorn-addr and the CLI bypasses Bridge for that one call.


Each cluster gets its own set of DNS records under {cluster_slug}.{base_domain}:

| Record | Purpose |
| --- | --- |
| edge-ingest.{slug}.{base} | RTMP/E-RTMP/SRT/WHIP ingest |
| edge.{slug}.{base} | Edge pool (any edge in cluster) |
| foghorn.{slug}.{base} | Viewer routing / playback resolution |
| livepeer.{slug}.{base} | Livepeer gateway / transcoding endpoint |
| {node_label}.{slug}.{base} | Per-edge A records for direct addressing (node_label is the node ID with a single edge- prefix) |

Navigator manages these records automatically. Wildcard TLS certificates (*.{slug}.{base}) are issued via ACME DNS-01 and distributed to edges via ConfigSeed.
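As an illustration of the naming scheme, the sketch below builds the record names for one cluster. The function names and the example slug and base domain are hypothetical. It reads "a single edge- prefix" to mean that a node ID already starting with edge- is not prefixed a second time, which is an assumption.

```python
from typing import Dict, Iterable

EDGE_PREFIX = "edge-"


def node_label(node_id: str) -> str:
    """Return the node ID carrying exactly one leading 'edge-' prefix.

    Assumption: IDs that already start with 'edge-' are left unchanged.
    """
    if node_id.startswith(EDGE_PREFIX):
        return node_id
    return EDGE_PREFIX + node_id


def cluster_dns_names(slug: str, base: str,
                      node_ids: Iterable[str]) -> Dict[str, str]:
    """Build the per-cluster record names listed in the table above."""
    zone = f"{slug}.{base}"
    names = {
        "ingest": f"edge-ingest.{zone}",
        "edge_pool": f"edge.{zone}",
        "foghorn": f"foghorn.{zone}",
        "livepeer": f"livepeer.{zone}",
        "wildcard_cert": f"*.{zone}",  # name covered by the wildcard certificate
    }
    for node_id in node_ids:
        names[f"node:{node_id}"] = f"{node_label(node_id)}.{zone}"
    return names
```

For example, `cluster_dns_names("eu-prod", "example.com", ["abc123"])` maps the node to `edge-abc123.eu-prod.example.com`, which the `*.eu-prod.example.com` wildcard certificate covers.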

See DNS and Cluster Routing for details.


Stream/viewer/artifact analytics carry cluster_id (serving cluster) and origin_cluster_id (where the stream was ingested). The billing rollups use those fields when they are present and attribute non-cluster-scoped usage, such as storage, processing, and API usage, to the tenant’s primary cluster. This enables per-cluster billing:

  • Periscope Query generates per-cluster UsageSummary records
  • Purser stores these as per-cluster usage records
  • Settlement queries can identify cross-cluster traffic for infrastructure cost attribution

Inter-cluster DTSC bandwidth is infrastructure cost, not a tenant-facing billing item.
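The attribution rule can be sketched as a small rollup. The record shape (plain dictionaries with `metric` and `amount` keys) and the function names are assumptions for illustration; the actual UsageSummary schema is not shown in this document.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple


def attribute_usage(records: Iterable[Mapping],
                    primary_cluster_id: str) -> Dict[Tuple[str, str], float]:
    """Sum usage per (cluster, metric).

    Records with a cluster_id are billed to that serving cluster.
    Records without one (storage, processing, API usage) fall back
    to the tenant's primary cluster.
    """
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for record in records:
        cluster = record.get("cluster_id") or primary_cluster_id
        totals[(cluster, record["metric"])] += record["amount"]
    return dict(totals)


def is_cross_cluster(record: Mapping) -> bool:
    """True when the serving cluster differs from the ingest cluster."""
    serving = record.get("cluster_id")
    origin = record.get("origin_cluster_id")
    return serving is not None and origin is not None and serving != origin
```

A settlement query could use a predicate like `is_cross_cluster` to separate traffic that crossed clusters from traffic served where it was ingested.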

Each cluster has a pricing model configured in purser.cluster_pricing:

| Pricing Model | Description |
| --- | --- |
| free_unmetered | No metering (community tier) |
| metered | Pay-as-you-go resource billing |
| monthly | Fixed monthly subscription |
| tier_inherit | Inherit pricing from the tenant’s billing tier |
| custom | Operator-defined pricing |

See docs/architecture/cross-cluster-billing.md and docs/architecture/billing-tier-provisioning.md for the full attribution and provisioning model.


Monitor peer connectivity and replication state:

  • PeerChannel status: Each Foghorn logs peer connections and heartbeat latency
  • Stream advertisements: Exchanged every 5 seconds via PeerChannel; stale advertisements indicate peer issues
  • Active replications: In-flight cross-cluster DTSC pulls tracked in Redis (5-min TTL)
  • Federation events: All cross-cluster operations (peering, replication, artifact access, redirects) emit geo-enriched events to ClickHouse (federation_events table). View them on the Federation dashboard page (/infrastructure/federation)
  • PeerHeartbeat geo exchange: Foghorn peers exchange their geographic coordinates (resolved from infrastructure_nodes.external_ip at bootstrap). This enables topology map visualization with real geographic positions
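A monitoring check for stale advertisements might look like the sketch below. The 5-second interval comes from the list above. The threshold of three missed intervals, the function name, and the input shape are assumptions, not values taken from Foghorn.

```python
from typing import Dict, List

ADVERTISEMENT_INTERVAL_SECONDS = 5.0  # interval stated in the text
MISSED_INTERVALS_BEFORE_STALE = 3     # assumption: tune to your alerting needs


def stale_peers(last_advertisement_at: Dict[str, float], now: float) -> List[str]:
    """Return peer cluster IDs whose last advertisement is too old.

    last_advertisement_at maps a peer cluster ID to the timestamp
    (in seconds) of the most recent advertisement received from it.
    """
    threshold = ADVERTISEMENT_INTERVAL_SECONDS * MISSED_INTERVALS_BEFORE_STALE
    return sorted(
        peer for peer, seen_at in last_advertisement_at.items()
        if now - seen_at > threshold
    )
```

Allowing a few missed intervals before flagging a peer avoids alerting on a single delayed message.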

ClickHouse tables partition by cluster_id, enabling per-cluster dashboards for:

  • Viewer hours and egress
  • Stream health and QoE
  • Edge node utilization
  • Cross-cluster replication events
  • Federation event geo data (local/remote lat/lon for topology visualization)