Multi-Cluster Operations
What Is a Cluster?
A cluster is a Foghorn instance (or HA pair) plus its edge nodes: strictly the media plane. All clusters share the same central control plane (Commodore, Quartermaster, Purser) and data plane (Decklog, Periscope, Signalman). These services are not duplicated or federated per cluster.
While control and data plane services share infrastructure, they are registered under separate cluster IDs for operational visibility. The control plane cluster appears in the inventory and dashboard but does not participate in Foghorn federation.
Cross-cluster coordination is handled exclusively by the FoghornFederation gRPC protocol between Foghorn instances.
Deployment Models
| Model | Description |
|---|---|
| Shared | All tenants share the same cluster, edges, and services. Free shared capacity is currently unmetered; premium shared tiers are seeded but not yet enforced. |
| Dedicated | Enterprise tier. Isolated cluster with dedicated Foghorn, edges, and capacity — operated by FrameWorks or self-hosted on-premise via the CLI, with Manual Deployment as an advanced reference. |
| Hybrid | Tenant runs self-hosted edges that fall back to the primary shared cluster. Edges federate via Foghorn; control/data plane remains shared. |
| Open Marketplace | Operators publish listed clusters with access controls, invites, subscription requests, and pricing metadata. A vetted public marketplace program for third-party capacity is on the roadmap. |
Cluster Lifecycle
Creating a Cluster
Via CLI provisioning (recommended for full deployments):
```bash
# Generate and validate GitOps-owned WireGuard identity from a local checkout.
frameworks mesh wg generate --manifest gitops/clusters/production/cluster.yaml
frameworks mesh wg check --gitops-dir gitops --cluster production

# Provisioning is read-only against GitOps.
frameworks cluster provision --gitops-dir gitops --cluster production
```

When the manifest has a `clusters` section, the CLI creates all clusters, registers each node with the correct cluster using role-based resolution, and generates per-cluster enrollment tokens. The manifest must already contain complete WireGuard identity for every managed Privateer host; `cluster provision` validates that state but does not write `cluster.yaml` or `hosts.enc.yaml`. See Cluster Manifest: clusters for the manifest format.
Via admin command (ad-hoc cluster creation):
```bash
frameworks admin clusters create \
  --cluster-id eu-prod \
  --cluster-name "EU Production" \
  --cluster-type edge \
  --base-url eu.frameworks.network \
  --foghorn-count 2 \
  --deployment-model shared
```

This command:
- Creates the cluster record in Quartermaster
- Claims N Foghorn instances from the pool (`--foghorn-count`)
Navigator drives DNS records and certificates from node/service inventory and certificate status; `admin clusters create` does not create them directly.
Service Pool
Cluster-scoped media services such as Foghorn, Chandler, and the Livepeer gateway are assigned from shared pools to logical media clusters.
```bash
# View pool status (shows per-cluster grouping)
frameworks admin service-pool status --service-type foghorn

# Assign idle instances to a cluster
frameworks admin service-pool assign --service-type foghorn --cluster-id <CLUSTER_ID> --count 2

# Release an assigned instance back to the pool
frameworks admin service-pool add --service-type livepeer-gateway --instance-id <UUID>

# Drain an instance from its cluster (graceful)
frameworks admin service-pool drain --service-type chandler --instance-id <UUID>
```

Cluster Health

```bash
frameworks admin clusters cert-status --cluster-id <ID>
```

Shows cluster health, including Foghorn HA status, edge count, and wildcard certificate readiness.
Federation
Foghorn instances discover peers via Quartermaster’s ListPeers RPC. The peer manager maintains long-lived PeerChannel connections for known peers, refreshes them periodically, and also accepts demand-driven peer hints from stream validation so new routing relationships do not wait for the next refresh.
How It Works
- PeerChannel: Bidirectional gRPC streams between Foghorn peers that exchange edge telemetry, stream advertisements, cluster summaries, heartbeats, and replication/artifact events, each on its own interval
- QueryStream: When a viewer’s cluster doesn’t have the stream, Foghorn asks the origin cluster for edge candidates
- Origin-Pull: If a local edge has capacity, Foghorn arranges a DTSC pull from the remote origin. Subsequent viewers are served locally
- Redirect: If no local capacity, the viewer is redirected (307) to the remote cluster’s edge
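The decision above can be sketched as a small shell function. This is a minimal illustration, not Foghorn's implementation: the function name `route_viewer` and its yes/no inputs are hypothetical, and the real routing logic weighs more signals than two booleans.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the viewer-routing decision. The function name and the
# yes/no inputs are illustrative; Foghorn's real logic weighs more signals.
route_viewer() {
  local stream_is_local=$1 local_edge_has_capacity=$2
  if [ "$stream_is_local" = yes ]; then
    # The viewer's own cluster already has the stream.
    echo serve-local
  elif [ "$local_edge_has_capacity" = yes ]; then
    # After QueryStream locates the origin, pull the stream over DTSC to a
    # local edge; subsequent viewers are served from that local copy.
    echo origin-pull
  else
    # No local capacity: 307-redirect the viewer to the remote cluster's edge.
    echo redirect-307
  fi
}

route_viewer no yes   # prints "origin-pull"
```

The ordering matters: a local origin-pull is preferred over a redirect because it turns one cross-cluster DTSC pull into local playback for every later viewer.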
Leader-Only Peering
Each cluster elects one Foghorn instance (via Redis SET NX, 15s TTL) to run PeerChannel connections. This prevents duplicate peer traffic in HA deployments.
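A minimal sketch of the SET NX election, assuming `redis-cli` is available. The key layout `foghorn:peer-leader:<cluster_id>` and the `REDIS_CLI` override are hypothetical and exist only for this illustration; they are not the documented key name.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: the key layout "foghorn:peer-leader:<cluster_id>" is an
# assumption, not the documented key name. REDIS_CLI can point at a stub.
REDIS_CLI=${REDIS_CLI:-redis-cli}
LEADER_TTL_SECONDS=15

# try_become_peer_leader <cluster_id> <instance_id>
# Succeeds only for the instance whose SET actually created the key.
try_become_peer_leader() {
  local cluster_id=$1 instance_id=$2 reply
  # SET ... NX writes only if the key does not exist yet; EX attaches a 15 s
  # TTL so a crashed leader's claim expires and another instance can take over.
  reply=$("$REDIS_CLI" SET "foghorn:peer-leader:${cluster_id}" "$instance_id" \
    NX EX "$LEADER_TTL_SECONDS")
  [ "$reply" = OK ]
}
```

A running leader would also need to renew its claim before the TTL lapses. A safe renewal is a compare-and-extend (for example a short Lua script run with `EVAL`) so an instance never extends a lock it no longer holds; that step is not shown here.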
For architecture details, see docs/architecture/federation.md and docs/architecture/stream-replication-topology.md.
Artifact Command Routing
When a tenant deletes a clip, stops a DVR, or removes a VOD asset, Commodore routes the command to the cluster that owns the artifact, which is not necessarily the tenant’s primary cluster. This uses a push+forward model:
- Push: Commodore tracks which cluster ingested each stream (`active_ingest_cluster_id`). Artifact operations read `origin_cluster_id` from the business registry and route directly to that cluster’s Foghorn.
- Forward (safety net): If the command arrives at a Foghorn that doesn’t own the artifact (stale routing data, race condition), Foghorn forwards it to federation peers via `ForwardArtifactCommand`. The first peer that owns the artifact handles it.
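The push+forward flow can be sketched as follows. `cluster_owns` and `ARTIFACT_OWNER` are hypothetical stand-ins for each Foghorn's local ownership check, and the loop stands in for `ForwardArtifactCommand`. The sketch walks peers in order; whether Foghorn forwards sequentially or concurrently is not specified here.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of push+forward routing. ARTIFACT_OWNER and cluster_owns
# stand in for each Foghorn's local ownership check; the real forward hop is
# the ForwardArtifactCommand RPC between federation peers.
cluster_owns() { [ "$1" = "$ARTIFACT_OWNER" ]; }

# Usage: route_artifact_command <push_target_cluster> [peer_cluster ...]
# Prints the cluster that handles the command; returns 1 if no cluster owns it.
route_artifact_command() {
  local target=$1 peer
  shift
  # Push: Commodore already picked this cluster from origin_cluster_id.
  if cluster_owns "$target"; then
    echo "$target"
    return 0
  fi
  # Forward (safety net): routing was stale or raced, so try federation peers;
  # the first peer that owns the artifact handles it.
  for peer in "$@"; do
    if cluster_owns "$peer"; then
      echo "$peer"
      return 0
    fi
  done
  return 1
}
```

In the common case the push step succeeds and the forward loop never runs; the forward path only absorbs the window where Commodore's routing data lags behind reality.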
Tenant suspension (TerminateTenantStreams, InvalidateTenantCache) fans out
to all clusters the tenant has access to, ensuring streams are terminated and
caches invalidated everywhere.
Marketplace
The marketplace allows third-party operators to publish clusters that other tenants can subscribe to.
Cluster Visibility
Clusters can be configured with different visibility levels:
| Visibility | Who Can See | Access |
|---|---|---|
| private | Only the owner tenant | Direct access, or an invite token for another tenant |
| unlisted | Direct-link/invited users | Via cluster invite token |
| public | All tenants | Via subscription request |
Subscription Lifecycle
```
Tenant discovers cluster (marketplace UI or invite link)
  → RequestClusterSubscription (GraphQL mutation)
  → Cluster owner approves/rejects (ApproveClusterSubscription / RejectClusterSubscription)
  → On approval: Quartermaster activates the tenant_cluster_access record
  → Paid cluster checkout/subscription records are created by Purser's cluster-subscription flow
  → Tenant can now route streams through the cluster
```

Invitation Flow
Cluster operators can invite specific tenants:
```bash
# Creates an invite token with optional expiry
frameworks admin clusters invites create \
  --cluster-id <ID> \
  --owner-tenant-id <OWNER_TENANT_UUID> \
  --invited-tenant-id <INVITED_TENANT_UUID> \
  --expires-in-days 7
```

The invited tenant sees the cluster in their dashboard and can accept it with one click.
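Both the subscription-request path and the invitation path end with the tenant holding active access to the cluster. The sketch below models them as one small state machine. The state and event names are hypothetical and do not reflect Quartermaster's actual `tenant_cluster_access` schema; Purser's checkout and subscription records for paid clusters are a separate step this sketch does not cover.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: state and event names are illustrative, not
# Quartermaster's actual tenant_cluster_access schema.
# next_access_state <state> <event>  -> prints the next state
next_access_state() {
  local state=$1 event=$2
  case "$state:$event" in
    none:request)    echo pending ;;   # RequestClusterSubscription
    pending:approve) echo active ;;    # ApproveClusterSubscription activates access
    pending:reject)  echo rejected ;;  # RejectClusterSubscription
    none:invite)     echo invited ;;   # frameworks admin clusters invites create
    invited:accept)  echo active ;;    # one-click accept in the dashboard
    invited:expire)  echo expired ;;   # optional --expires-in-days elapsed
    *)
      echo "invalid transition: $event from $state" >&2
      return 1
      ;;
  esac
}
```

Rejecting unknown transitions explicitly (for example approving an already rejected request) keeps a stale dashboard action from silently reactivating access.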
Dashboard Pages
The webapp provides dedicated infrastructure pages for multi-cluster operations:
| Page | Route | Purpose |
|---|---|---|
| Infrastructure Overview | /infrastructure | Tenant info, platform performance (CPU/memory/nodes), service health summary, clickable cluster cards |
| Cluster Detail | /infrastructure/[clusterId] | Per-cluster metrics, node cards with live health, service instances, health checks |
| Node Detail | /nodes/[id] | Per-node CPU/memory/disk/streams, 5-minute performance history, service instances |
| Clusters | /infrastructure/clusters | Merged view — “My Clusters” tab (subscriptions, invitations, approvals, private cluster creation) and “Marketplace” tab (browse and connect to clusters) |
| Federation | /infrastructure/federation | Federation overview — topology map with peering/traffic relationships, traffic matrix, event type breakdown, recent federation events |
| Audience Analytics | /analytics/audience | Routing map with cross-cluster flow visualization (amber lines for cross-cluster routes), routing events tagged with local/cross-cluster badges |
Edge Enrollment
Edge nodes enroll into a specific cluster using bootstrap tokens.
Token Creation
Via Dashboard (tenant owner):
- Go to Infrastructure -> Clusters
- Click Create Private Cluster or select an existing cluster
- Copy the bootstrap token
Via CLI (admin):
```bash
frameworks admin bootstrap-tokens create \
  --kind edge_node \
  --tenant-id <TENANT_UUID> \
  --cluster-id <CLUSTER_ID> \
  --name "edge-node-1" \
  --ttl 24h
```

Provisioning with Token

```bash
frameworks edge provision \
  --enrollment-token enroll_xxx \
```

The CLI sends the token to Bridge via the public `bootstrapEdge` mutation. Bridge validates the token via Quartermaster, finds the cluster’s assigned Foghorn, and proxies `PreRegisterEdge`. The response carries:
- Assigned node ID and edge domain (`{node_label}.{cluster_slug}.{base}`, where `node_label` is the node ID with a single `edge-` prefix)
- Pool domain for the cluster
- Foghorn gRPC address the edge will use at runtime for Helmsman’s control stream
- Internal CA bundle for initial gRPC trust bootstrap
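The edge-domain format can be illustrated with two small helpers. The function names and the example node ID, slug, and base domain are hypothetical. The sketch also assumes that a node ID already starting with `edge-` keeps that prefix instead of gaining a second one, which is one reading of "a single `edge-` prefix".

```bash
#!/usr/bin/env bash
# Hypothetical helpers; the IDs and domains used below are made-up examples.
# node_label: the node ID with exactly one "edge-" prefix (assumes an ID that
# already starts with "edge-" keeps it rather than gaining a second one).
node_label() {
  printf 'edge-%s\n' "${1#edge-}"
}

# edge_domain <node_id> <cluster_slug> <base>
# Builds {node_label}.{cluster_slug}.{base}.
edge_domain() {
  local node_id=$1 cluster_slug=$2 base=$3
  printf '%s.%s.%s\n' "$(node_label "$node_id")" "$cluster_slug" "$base"
}

edge_domain 7f3a9c eu-prod frameworks.network   # prints edge-7f3a9c.eu-prod.frameworks.network
```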
The operator never needs to know cluster topology — the bootstrap token IS the cluster selector. After bootstrap, the edge talks directly to its assigned Foghorn over the public internet; it never joins the WireGuard mesh.
If you need to dial a specific Foghorn directly (for debugging), pass --foghorn-addr and the CLI bypasses Bridge for that one call.
Each cluster gets its own set of DNS records under {cluster_slug}.{base_domain}:
| Record | Purpose |
|---|---|
| edge-ingest.{slug}.{base} | RTMP/E-RTMP/SRT/WHIP ingest |
| edge.{slug}.{base} | Edge pool (any edge in cluster) |
| foghorn.{slug}.{base} | Viewer routing / playback resolution |
| livepeer.{slug}.{base} | Livepeer gateway / transcoding endpoint |
| {node_label}.{slug}.{base} | Per-edge A records for direct addressing (node_label is the node ID with a single edge- prefix) |
Navigator manages these records automatically. Wildcard TLS certificates (*.{slug}.{base}) are issued via ACME DNS-01 and distributed to edges via ConfigSeed.
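One wildcard certificate covers every record in the table because each name adds exactly one label in front of `{slug}.{base}`, and a TLS wildcard matches only a single leftmost label. The helpers below are a hypothetical sketch of that check; the function names and the `eu-prod` / `frameworks.network` values are examples, not part of the documented tooling.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; "eu-prod" and "frameworks.network" are example values.
# cluster_records <slug> <base>: the shared per-cluster names from the table.
cluster_records() {
  local slug=$1 base=$2 prefix
  for prefix in edge-ingest edge foghorn livepeer; do
    printf '%s.%s.%s\n' "$prefix" "$slug" "$base"
  done
}

# covered_by_wildcard <host> <slug> <base>
# A certificate for *.{slug}.{base} matches exactly one extra leftmost label:
# a.{slug}.{base} is covered, but a.b.{slug}.{base} and {slug}.{base} are not.
covered_by_wildcard() {
  local host=$1 zone="$2.$3" label
  case $host in
    *."$zone") label=${host%."$zone"} ;;
    *) return 1 ;;
  esac
  # The remaining prefix must be one non-empty label with no dots.
  [ -n "$label" ] && [ "${label#*.}" = "$label" ]
}
```

For example, every name printed by `cluster_records eu-prod frameworks.network`, plus a per-edge name such as `edge-7f3a9c.eu-prod.frameworks.network`, passes `covered_by_wildcard`; a deeper name like `a.b.eu-prod.frameworks.network` would need its own certificate.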
See DNS and Cluster Routing for details.
Billing Attribution
Stream, viewer, and artifact analytics carry `cluster_id` (the serving cluster) and `origin_cluster_id` (the cluster where the stream was ingested). The billing rollups use those fields when they are present and attribute non-cluster-scoped usage, such as storage, processing, and API usage, to the tenant’s primary cluster. This enables per-cluster billing:
- Periscope Query generates per-cluster `UsageSummary` records
- Purser stores these as per-cluster usage records
- Settlement queries can identify cross-cluster traffic for infrastructure cost attribution
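The two attribution rules above can be sketched as shell helpers. The function names and cluster IDs are hypothetical; the real rollups run in Periscope and Purser, not in shell.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; function names and cluster IDs are illustrative only.

# attribute_cluster <cluster_id> <primary_cluster_id>
# Cluster-scoped usage keeps its serving cluster_id; usage without one
# (storage, processing, API) falls back to the tenant's primary cluster.
attribute_cluster() {
  local cluster_id=$1 primary_cluster_id=$2
  printf '%s\n' "${cluster_id:-$primary_cluster_id}"
}

# is_cross_cluster <cluster_id> <origin_cluster_id>
# True when a stream was served from a different cluster than the one it was
# ingested in, which is the traffic settlement queries look for.
is_cross_cluster() {
  local cluster_id=$1 origin_cluster_id=$2
  [ -n "$cluster_id" ] && [ -n "$origin_cluster_id" ] &&
    [ "$cluster_id" != "$origin_cluster_id" ]
}
```

For instance, `attribute_cluster "" eu-prod` falls back to `eu-prod`, and a viewer served in `us-east` for a stream ingested in `eu-prod` counts as cross-cluster.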
Inter-cluster DTSC bandwidth is infrastructure cost, not a tenant-facing billing item.
Cluster Pricing
Each cluster has a pricing model configured in `purser.cluster_pricing`:
| Pricing Model | Description |
|---|---|
| free_unmetered | No metering (community tier) |
| metered | Pay-as-you-go resource billing |
| monthly | Fixed monthly subscription |
| tier_inherit | Inherit pricing from the tenant’s billing tier |
| custom | Operator-defined pricing |
See docs/architecture/cross-cluster-billing.md and docs/architecture/billing-tier-provisioning.md for the full attribution and provisioning model.
Monitoring
Federation Health
Monitor peer connectivity and replication state:
- PeerChannel status: Each Foghorn logs peer connections and heartbeat latency
- Stream advertisements: Exchanged every 5 seconds via PeerChannel; stale advertisements indicate peer issues
- Active replications: In-flight cross-cluster DTSC pulls tracked in Redis (5-min TTL)
- Federation events: All cross-cluster operations (peering, replication, artifact access, redirects) emit geo-enriched events to ClickHouse (`federation_events` table). View them on the Federation dashboard page (`/infrastructure/federation`).
- PeerHeartbeat geo exchange: Foghorn peers exchange their geographic coordinates (resolved from `infrastructure_nodes.external_ip` at bootstrap), which lets the topology map place peers at their real geographic positions.
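Since stream advertisements arrive every 5 seconds, a simple staleness check compares the age of the last advertisement against a multiple of that interval. The sketch below is hypothetical: the function name is illustrative, the threshold of three missed intervals is an assumption rather than a documented value, and where the last-seen timestamp comes from is left to the caller.

```bash
#!/usr/bin/env bash
# Hypothetical staleness check for stream advertisements. The 5 s interval is
# from the text above; treating 3 missed intervals as stale is an assumption.
ADVERT_INTERVAL_SECONDS=5
STALE_AFTER_INTERVALS=3

# advert_is_stale <last_seen_epoch> [now_epoch]
# Succeeds when the last advertisement is older than the allowed window.
advert_is_stale() {
  local last_seen_epoch=$1 now_epoch=${2:-$(date +%s)}
  local max_age=$((ADVERT_INTERVAL_SECONDS * STALE_AFTER_INTERVALS))
  [ $((now_epoch - last_seen_epoch)) -gt "$max_age" ]
}
```

Allowing a few missed intervals instead of one avoids flagging a peer as unhealthy after a single delayed message.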
Per-Cluster Metrics
ClickHouse tables partition by `cluster_id`, enabling per-cluster dashboards for:
- Viewer hours and egress
- Stream health and QoE
- Edge node utilization
- Cross-cluster replication events
- Federation event geo data (local/remote lat/lon for topology visualization)