
Skipper Setup

Skipper requires three things: an LLM for chat, an embedding model for the knowledge base, and PostgreSQL with pgvector. Optionally, add a web search provider for live lookups beyond the knowledge base.

All configuration is through environment variables. The LLM and embedding providers are independent — you can mix local embeddings with an API-hosted LLM.

Embedding models convert text into vectors for similarity search. Quality here affects every answer — bad retrieval poisons everything downstream.

| Option | Run where | Notes |
|---|---|---|
| nomic-embed-text | Ollama / local CPU or GPU | Strong default for local deployments. |
| bge-m3 | Ollama / local CPU or GPU | Multi-lingual, multi-granularity. Heavier but excellent. |
| Hosted embedding API | OpenAI-compatible provider | Simple operations path; verify current provider pricing. |
| Blueclaw / Livepeer-network AI | OpenAI-compatible or BYOC runtime | Good fit when you want network-provided inference/embeddings instead of separate SaaS APIs. |

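Recommendation: Start with local or network-provided embeddings if available. If retrieval quality is insufficient, switch to a higher-quality hosted embedding model and re-crawl, since vectors produced by different embedding models are not comparable and every chunk must be re-embedded.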

Self-hosted via Ollama. No per-token cost — you pay for hardware.

Tier 1 — Lightweight (7-8B) — single 8-16GB GPU

| Model | Context | VRAM (Q4) | Notes |
|---|---|---|---|
| Qwen 3 8B | 128K | ~5GB | Strong instruction following + tool use |
| Llama 3.1 8B | 128K | ~5GB | Battle-tested, huge community |
| Mistral 7B v0.3 | 32K | ~4.5GB | Fast, good at structured prompts |

Good for simple lookups. Struggles with multi-hop reasoning across multiple chunks.

Tier 2 — Sweet Spot (14-32B) — single 24-48GB GPU (recommended)

| Model | Context | VRAM (Q4) | Notes |
|---|---|---|---|
| Qwen 2.5 32B | 128K | ~20GB | Best all-around. Strong RAG + instruction following. |
| Qwen 3 32B | 128K | ~20GB | Built-in thinking mode. Good for reasoning. |
| Mistral Small 3.1 24B | 128K | ~15GB | Excellent function calling. Fits on a 24GB card. |
| Command R 35B | 128K | ~22GB | Purpose-built for RAG by Cohere. Native citations. |

RTX 3090/4090 (24GB) runs Q4_K_M of 32B models at ~15-25 tok/s — fine for chat.

Tier 3 — Heavy (70B+) — 2x 24GB or 1x 80GB GPU

| Model | Context | VRAM (Q4) | Notes |
|---|---|---|---|
| Llama 3.3 70B | 128K | ~40GB | Near 405B quality. |
| Qwen 2.5 72B | 128K | ~42GB | Excellent multilingual + code. |

Moving from 32B to 70B improves answers on hard queries by roughly 10-15%. That is usually not worth 2-3x the hardware for a domain consultant with good RAG.

No local hardware needed. Skipper uses OpenAI-compatible chat and embedding endpoints, so hosted SaaS, Blueclaw, Livepeer-network inference, or a BYOC model container can all fit the same configuration shape when they expose compatible APIs.

For Livepeer-network deployments, check the current Livepeer AI docs and ask in the Livepeer Discord before planning around a custom model or BYOC container. Availability changes as model runners and orchestrator support evolve. BYOC is useful when the model you want is not exposed by a managed endpoint yet.

| Option | Best For | Notes |
|---|---|---|
| Hosted SaaS model API | Fastest path to production | Verify current pricing and data policy directly with the provider. |
| Blueclaw | OpenAI-compatible gateway for agents | Can reduce vendor lock-in if it has the models you need. |
| Livepeer-network inference | Keeping inference spend and usage on-network | Good strategic fit when available for your target models. |
| Livepeer BYOC container | Custom models or runtime control | More operator work, but lets you bring models not exposed by default. |

Skipper cost depends on:

  • conversation history included in the prompt
  • retrieved knowledge chunks and citations
  • tool-call count and tool result size
  • whether query rewriting, HyDE, reranking, or web search are enabled
  • provider pricing, model choice, and whether inference runs locally or on-network

Early planning shape:

| Deployment shape | Cost profile | When to choose it |
|---|---|---|
| Local Ollama | Hardware and operations, no per-token API bill | Self-hosted clusters with spare GPU capacity. |
| Blueclaw / Livepeer-network AI | Network/provider pricing, less separate SaaS usage | Agents-first deployments and on-network usage goals. |
| Budget hosted SaaS model | Usually low at modest chat volume | Fast setup when answer quality requirements are modest. |
| Higher-quality hosted SaaS model | Can climb quickly with long prompts/tool results | Harder support cases or premium account tiers. |

For budgeting, use Skipper’s recorded tokensInput, tokensOutput, tool calls, and provider/model fields as the source of truth. Real deployments should compare that usage data with current provider pricing or local inference costs before committing to a support-tier margin.
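As a rough illustration of that budgeting step, the sketch below sums recorded token counts and applies per-million-token prices. Only the tokensInput and tokensOutput fields come from Skipper; the `UsageRecord` type, the `estimate_cost` helper, and the prices are hypothetical placeholders, so substitute your provider's current rates.

```python
from dataclasses import dataclass


@dataclass
class UsageRecord:
    """Hypothetical shape for one recorded request (mirrors tokensInput / tokensOutput)."""
    tokens_input: int
    tokens_output: int


def estimate_cost(records, price_in_per_million, price_out_per_million):
    """Sum token usage and convert it to a cost using per-million-token prices."""
    total_in = sum(r.tokens_input for r in records)
    total_out = sum(r.tokens_output for r in records)
    return (total_in * price_in_per_million + total_out * price_out_per_million) / 1_000_000


# Placeholder prices, not real provider rates.
records = [UsageRecord(12_000, 800), UsageRecord(8_000, 1_200)]
cost = estimate_cost(records, price_in_per_million=1.0, price_out_per_million=4.0)
```

Input tokens usually dominate in RAG workloads because retrieved chunks and tool results are part of the prompt, which is why the two prices are kept separate.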


LLM:

| Variable | Purpose | Default |
|---|---|---|
| LLM_PROVIDER | openai, anthropic, or ollama | |
| LLM_MODEL | Model identifier | |
| LLM_API_KEY | API credentials | |
| LLM_API_URL | Custom endpoint (OpenRouter, self-hosted) | Provider default |
| LLM_MAX_TOKENS | Max output tokens per response | 4096 |

Embeddings (falls back to LLM_* when unset):

| Variable | Purpose | Default |
|---|---|---|
| EMBEDDING_PROVIDER | openai or ollama | LLM_PROVIDER |
| EMBEDDING_MODEL | Embedding model name | LLM_MODEL |
| EMBEDDING_API_KEY | Embedding API credentials | LLM_API_KEY |
| EMBEDDING_API_URL | Embedding endpoint | LLM_API_URL |

Utility LLM (for background tasks like contextual retrieval; falls back to LLM_* when unset):

| Variable | Purpose | Default |
|---|---|---|
| UTILITY_LLM_PROVIDER | Cheap LLM for background tasks (query rewriting, HyDE, contextual retrieval) | LLM_PROVIDER |
| UTILITY_LLM_MODEL | Utility model identifier | LLM_MODEL |
| UTILITY_LLM_API_KEY | Utility LLM credentials | LLM_API_KEY |
| UTILITY_LLM_API_URL | Utility LLM endpoint | LLM_API_URL |
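The fallback behaviour for the EMBEDDING_* and UTILITY_LLM_* variables can be pictured as a two-step lookup. This is a minimal sketch of the rule stated above, not Skipper's actual loader; the `resolve` helper is a hypothetical name.

```python
import os


def resolve(prefix, field, env=None):
    """Return PREFIX_FIELD if set and non-empty, otherwise fall back to LLM_FIELD."""
    env = os.environ if env is None else env
    return env.get(f"{prefix}_{field}") or env.get(f"LLM_{field}")


# Example: hosted LLM with local embeddings; the embedding key falls back to LLM_API_KEY.
env = {
    "LLM_PROVIDER": "openai",
    "LLM_API_KEY": "sk-test",
    "EMBEDDING_PROVIDER": "ollama",
}
embedding_provider = resolve("EMBEDDING", "PROVIDER", env)
embedding_key = resolve("EMBEDDING", "API_KEY", env)
```

Because EMBEDDING_MODEL also falls back to LLM_MODEL, it is worth setting it explicitly whenever the chat model is not an embedding model.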

Web search (optional):

| Variable | Purpose | Default |
|---|---|---|
| SEARCH_PROVIDER | tavily, brave, or searxng | |
| SEARCH_API_KEY | Search API key (not needed for SearXNG) | |
| SEARCH_API_URL | Custom endpoint (required for SearXNG) | Provider default |

Retrieval quality:

| Variable | Purpose | Default |
|---|---|---|
| RERANKER_PROVIDER | Cross-encoder reranker: cohere, jina, voyage, or generic | none (keyword fallback) |
| RERANKER_MODEL | Reranker model (e.g. rerank-4-pro, rerank-2.5, jina-reranker-v2-base-multilingual) | |
| RERANKER_API_KEY | Reranker API credentials | LLM_API_KEY |
| RERANKER_API_URL | Reranker endpoint (required for generic provider) | Provider default |
| SKIPPER_ENABLE_HYDE | Enable Hypothetical Document Embeddings for search_knowledge | false |

Knowledge base:

| Variable | Purpose | Default |
|---|---|---|
| SITEMAPS | Comma-separated sitemap URLs | |
| SKIPPER_SITEMAPS_DIR | Directory of source files (re-read each cycle) | |
| CRAWL_INTERVAL | Refresh interval | 24h |
| CHUNK_TOKEN_LIMIT | Max BPE tokens per chunk | 500 |
| CHUNK_TOKEN_OVERLAP | Overlap tokens between adjacent chunks | 50 |
| SKIPPER_ENABLE_RENDERING | Enable headless Chrome for JS-rendered pages | false |
| SKIPPER_CONTEXTUAL_RETRIEVAL | Use utility LLM to prepend context before embedding | false |
| SKIPPER_LINK_DISCOVERY | Discover and crawl same-domain links | false |
| SKIPPER_SEARCH_LIMIT | Default result limit for search_knowledge | 8 |

Service:

| Variable | Purpose | Default |
|---|---|---|
| SKIPPER_WEB_UI | Enable standalone web UI at / | true |
| SKIPPER_API_KEY | API key for admin WebUI auth | |
| SKIPPER_WEB_UI_INSECURE | Allow the standalone WebUI without an API key | false |
| SKIPPER_REQUIRED_TIER_LEVEL | Minimum subscription tier | 3 |
| SKIPPER_CHAT_RATE_LIMIT_PER_HOUR | Rate limit per tenant | 0 (unlimited) |
| SKIPPER_CHAT_RATE_LIMIT_OVERRIDES | Per-tenant overrides (tenant_id:limit,...) | |
| SKIPPER_ADMIN_TENANT_ID | Tenant ID for global/platform knowledge | |
| SKIPPER_MAX_HISTORY_MESSAGES | Max conversation messages loaded per request | 20 |
| GATEWAY_MCP_URL | Internal Gateway MCP endpoint for platform tools | derived from Bridge mesh URL |
| GATEWAY_PUBLIC_URL | Public API Gateway base URL; fallback for MCP when GATEWAY_MCP_URL is unset | |
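To make the SKIPPER_CHAT_RATE_LIMIT_OVERRIDES format concrete, here is a small sketch of how a `tenant_id:limit,...` string could be parsed and applied on top of the per-hour default. The helper names and error handling are assumptions for illustration; Skipper's own parser may be stricter or more lenient.

```python
def parse_overrides(raw):
    """Parse 'tenant_id:limit,tenant_id:limit' into a dict of tenant -> hourly limit."""
    overrides = {}
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas and blank entries
        tenant_id, sep, limit = entry.rpartition(":")
        if not sep or not tenant_id:
            raise ValueError(f"expected tenant_id:limit, got {entry!r}")
        overrides[tenant_id] = int(limit)
    return overrides


def limit_for(tenant_id, default_limit, overrides):
    """Per-tenant override wins; otherwise use the global default (0 means unlimited)."""
    return overrides.get(tenant_id, default_limit)


overrides = parse_overrides("tenant-a:100, tenant-b:0")
```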

Social posting:

| Variable | Purpose | Default |
|---|---|---|
| SKIPPER_SOCIAL_ENABLED | Enable event-driven social posting agent | false |
| SKIPPER_SOCIAL_INTERVAL | How often to check for noteworthy events | 2h |
| SKIPPER_SOCIAL_MAX_PER_DAY | Max posts per day (0 = unlimited) | 2 |
| SKIPPER_SOCIAL_NOTIFY_EMAIL | Email to send draft tweets to (required when enabled) | |

No per-token API cost; you pay for hardware and operations. Requires a GPU for the LLM; embeddings can run on CPU.

# LLM
LLM_PROVIDER=ollama
LLM_MODEL=qwen2.5:32b
LLM_API_URL=http://localhost:11434/v1
# Utility LLM — cheap model for background tasks (contextual retrieval)
UTILITY_LLM_PROVIDER=ollama
UTILITY_LLM_MODEL=qwen2.5:7b
UTILITY_LLM_API_URL=http://localhost:11434/v1
# Embeddings (same Ollama instance, different endpoint)
EMBEDDING_PROVIDER=ollama
EMBEDDING_MODEL=nomic-embed-text
EMBEDDING_API_URL=http://localhost:11434
# Web search (optional, self-hosted)
SEARCH_PROVIDER=searxng
SEARCH_API_URL=http://localhost:8080
# Reranker (optional — self-hosted cross-encoder via generic provider)
# RERANKER_PROVIDER=generic
# RERANKER_MODEL=BAAI/bge-reranker-v2-m3
# RERANKER_API_URL=http://localhost:8787
# HyDE — improves search_knowledge quality at ~500-1500ms extra latency
# SKIPPER_ENABLE_HYDE=true

Pull models first:

ollama pull qwen2.5:32b
ollama pull qwen2.5:7b
ollama pull nomic-embed-text

Skipper’s knowledge base is populated by crawling documentation sources and embedding them into pgvector.

  1. Fetch pages from sitemaps or direct URLs
  2. Detect whether the page needs headless rendering (SPA detection)
  3. Extract readable text via Readability → Markdown (strips navigation, boilerplate)
  4. Chunk into ~500-token segments with 50-token overlap
  5. Embed each chunk via the configured embedding model
  6. Store in pgvector with metadata (source URL, title, source type, ingestion timestamp)
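The chunking step can be sketched as a sliding window over a token list, using the default CHUNK_TOKEN_LIMIT of 500 and CHUNK_TOKEN_OVERLAP of 50. This is a simplified illustration: it treats tokens as an opaque list and ignores the heading-aware block splitting that Skipper also applies. The function name is hypothetical.

```python
def chunk_tokens(tokens, limit=500, overlap=50):
    """Split a token list into windows of `limit` tokens that share `overlap` tokens."""
    if overlap >= limit:
        raise ValueError("overlap must be smaller than limit")
    step = limit - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + limit])
        if start + limit >= len(tokens):
            break  # the last window already reached the end of the document
    return chunks


# A 1200-token document yields windows starting at tokens 0, 450 and 900.
chunks = chunk_tokens(list(range(1200)))
```

The overlap means a sentence that straddles a boundary appears in both neighbouring chunks, so it can still be retrieved as a whole.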

The full ingestion pipeline handles everything from sitemap discovery through to vector storage:

graph TD
    SRC["Sitemap URLs / Direct Pages / Uploads"] --> FETCH["Fetch Sitemap XML"]
    FETCH --> VALIDATE["URL Validation<br/><small>SSRF check · DNS resolution ·<br/>private CIDR blocking</small>"]
    VALIDATE --> ROBOTS["robots.txt<br/><small>SkipperBot/1.0</small>"]
    ROBOTS --> CACHE{"Cached?<br/><small>TTL · ETag · Hash</small>"}
    CACHE -->|unchanged| SKIP[Skip]
    CACHE -->|new or changed| HTTP["HTTP Fetch"]
    HTTP --> DETECT{"SPA Detection<br/><small>score ≥ 4?</small>"}
    DETECT -->|static| EXTRACT
    DETECT -->|SPA or empty shell| RENDER["Headless Chrome<br/><small>Rod · stealth mode ·<br/>blocks images/fonts/CSS</small>"]
    RENDER --> EXTRACT["Content Extraction<br/><small>Readability → Markdown<br/>fallback: DOM walker</small>"]
    EXTRACT --> HASH{"Content Hash<br/>SHA-256"}
    HASH -->|unchanged| SKIP
    HASH -->|new| CHUNK["Chunk<br/><small>~500 tokens · 50 overlap<br/>heading-aware blocks</small>"]
    CHUNK --> CTX{"Contextual<br/>Retrieval?"}
    CTX -->|enabled| UTIL["Utility LLM<br/><small>1-2 sentence context<br/>prepended per chunk</small>"]
    UTIL --> EMBED["Embed"]
    CTX -->|disabled| EMBED
    EMBED --> STORE["pgvector<br/><small>atomic upsert per source</small>"]

The crawler runs every CRAWL_INTERVAL (default 24h) and uses three layers of change detection to avoid unnecessary work:

  • Source TTL — skip sources crawled within the interval
  • HTTP 304 — conditional fetch with ETag / If-Modified-Since
  • Content hash — SHA-256 comparison skips re-embedding unchanged pages
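The three layers above can be sketched as three small checks. The function names and the shape of the cached state are assumptions for illustration; the SHA-256 comparison and the conditional request headers are the standard mechanisms the list refers to.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def crawled_recently(last_crawled, interval, now):
    """Layer 1 (source TTL): skip sources crawled within the interval."""
    return last_crawled is not None and now - last_crawled < interval


def conditional_headers(etag, last_modified):
    """Layer 2 (HTTP 304): headers that let the server answer 304 Not Modified."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def content_hash(text):
    """Layer 3: SHA-256 of the extracted text, stored alongside the page."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_reembedding(new_text, stored_hash):
    """Re-embed only when the extracted content actually changed."""
    return content_hash(new_text) != stored_hash


now = datetime(2025, 1, 2, tzinfo=timezone.utc)
stored = content_hash("Set the keyframe interval to 2 seconds.")
```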

When rendering is enabled, Skipper also sends a HEAD request before launching Chrome — if the Content-Length matches the cached value, it skips headless rendering entirely.

Many documentation sites are JavaScript-heavy SPAs that return empty shells to a plain HTTP fetch. When SKIPPER_ENABLE_RENDERING=true, the crawler auto-detects these pages using a scoring heuristic:

  • SPA mount points (<div id="root">, <div id="app">, <div id="__next">)
  • <noscript> tags, framework markers (data-reactroot, ng-app, data-v-)
  • High script-to-text ratio, low text density in <body>

If the score reaches 4 or the extracted text has fewer than 10 words, Skipper renders the page in headless Chromium (via Rod) with stealth mode enabled and non-essential resources (images, fonts, CSS) blocked. The browser waits 500ms for DOM stability before extracting the rendered HTML.
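A scoring heuristic of this kind might look like the sketch below. The threshold of 4 and the 10-word floor come from the description above; the individual signal weights, the words-per-script cutoff, and the regex-based text extraction are invented for illustration and are not Skipper's actual values.

```python
import re

MOUNT_POINTS = ('id="root"', 'id="app"', 'id="__next"')
FRAMEWORK_MARKERS = ("data-reactroot", "ng-app", "data-v-")


def visible_text(html):
    """Crude text extraction: drop script/style/noscript bodies, then all tags."""
    html = re.sub(r"(?is)<(script|style|noscript)\b.*?</\1>", " ", html)
    return re.sub(r"(?s)<[^>]+>", " ", html)


def spa_score(html):
    """Add up SPA signals. The weights here are hypothetical."""
    score = 0
    if any(m in html for m in MOUNT_POINTS):
        score += 2
    if "<noscript" in html.lower():
        score += 1
    if any(m in html for m in FRAMEWORK_MARKERS):
        score += 1
    scripts = len(re.findall(r"(?i)<script\b", html))
    words = len(visible_text(html).split())
    if scripts > 0 and words / scripts < 20:  # high script-to-text ratio (assumed cutoff)
        score += 1
    return score


def needs_rendering(html, threshold=4, min_words=10):
    """Render when the score reaches the threshold or almost no text was extracted."""
    words = len(visible_text(html).split())
    return spa_score(html) >= threshold or words < min_words


shell = ('<html><body><div id="root"></div><noscript>Enable JS</noscript>'
         '<script src="a.js"></script></body></html>')
static = "<html><body><p>" + " ".join(["word"] * 30) + "</p></body></html>"
```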

Content is extracted using Mozilla’s Readability algorithm (converted to Markdown). If Readability returns too little text, a fallback DOM walker strips navigation, sidebars, and hidden elements.

When SKIPPER_CONTEXTUAL_RETRIEVAL=true, Skipper uses a utility LLM to prepend 1-2 sentences of context to each chunk before embedding. This helps disambiguation — a chunk about “buffer settings” gets tagged with whether it’s about OBS, FFmpeg, or MistServer. The context is used only for embedding; the original chunk text is stored for retrieval.

Three features improve retrieval accuracy at query time. All are optional and independent — enable what fits your latency and cost budget.

Cross-encoder reranking (RERANKER_PROVIDER) replaces the default keyword-overlap heuristic with a model that scores (query, chunk) pairs together. This understands semantic equivalence that keyword matching cannot — “rebuffering issues” matches “playback stalling”, “my stream keeps dying” matches “connection timeout troubleshooting”. Applied to both pre-retrieval (every message) and search_knowledge tool calls.

| Provider | Model | Quality | Latency | Cost | Notes |
|---|---|---|---|---|---|
| Cohere | rerank-4-pro | #2 ELO | ~600ms | ~$2/1K queries | Incumbent, widest cloud availability (Bedrock, Azure). rerank-4-fast trades quality for speed. |
| Voyage AI | rerank-2.5 | #4 ELO | ~600ms | $0.05/1M tokens | 200M free tokens. Instruction-following. 1K docs/request. Backed by MongoDB. |
| Jina | jina-reranker-v2-base-multilingual | #12 ELO | ~750ms | ~$0.02/1M tokens | Cheapest. Strong multilingual and code search. 131K context in v3. |
| ZeroEntropy | zerank-2 | #1 ELO | ~265ms | $0.025/1M tokens | Best quality + speed + price. Use generic provider. |
| Contextual AI | rerank-v2-instruct | #9 ELO | ~3.3s | $0.05/1M tokens | Instruction-following, recency-aware. Use generic provider. |
| Generic | Any /v1/rerank endpoint | Varies | Varies | Varies | Self-hosted BGE/MXBai models, or any compatible API. |

Query rewriting (automatic when utility LLM is configured) transforms conversational questions into search-optimized queries before embedding. “My European viewers are buffering a lot” becomes “European CDN edge rebuffering latency troubleshooting”. Applied to search_knowledge and search_web tool calls; skipped for pre-retrieval to keep latency low.

HyDE (SKIPPER_ENABLE_HYDE=true) generates a hypothetical answer to the user’s question, then embeds that answer instead of the question for vector search. The resulting vector is closer in embedding space to real documentation. Adds ~500-1500ms latency per search_knowledge call. Not used for pre-retrieval or web search.

Skipper ships with curated source files in config/skipper/sitemaps/:

| File | Content |
|---|---|
| frameworks.txt | Platform docs and marketing site (env-templated URLs) |
| ecosystem.txt | MistServer, Livepeer, Daydream, WebRTC, DASH, Streamplace |
| obs.txt | OBS Studio knowledge base articles |
| ffmpeg.txt | FFmpeg tools, codecs, formats, protocols, filters |
| srt.txt | SRT protocol docs from Haivision |
| nginx-rtmp.txt | nginx-rtmp-module wiki |
| hls-spec.txt | HLS specification (IETF RFCs) |

Set SKIPPER_SITEMAPS_DIR to the directory containing these files. In Docker Compose, this is mounted as a read-only volume at /etc/skipper/sitemaps.

Create a .txt file in the sitemaps directory. Three line formats are supported:

Sitemap URLs — standard XML sitemaps:

https://logbook.example.com/sitemap.xml

Direct page URLs — for sites without sitemaps, prefix with page::

# Internal documentation
page:https://wiki.example.com/streaming-guide
page:https://wiki.example.com/encoder-settings

Force-rendered URLs — for SPAs that always need headless Chrome, prefix with render::

# React/Next.js docs that return empty shells without JS
render:https://spa-docs.example.com/getting-started
render:https://spa-docs.example.com/api-reference

Lines starting with # are comments. Empty lines are ignored. Environment variables are expanded (e.g., ${DOCS_PUBLIC_URL}).
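Taken together, the file format rules above amount to a small line parser. The sketch below is one way to read such a file, assuming `${VAR}` expansion behaves like Python's `os.path.expandvars`; the function name and the `(kind, url)` tuple shape are hypothetical.

```python
import os


def parse_source_file(text):
    """Return (kind, url) pairs where kind is 'sitemap', 'page' or 'render'."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comments and blank lines are ignored
        line = os.path.expandvars(line)  # expands ${DOCS_PUBLIC_URL}-style references
        if line.startswith("page:"):
            entries.append(("page", line[len("page:"):]))
        elif line.startswith("render:"):
            entries.append(("render", line[len("render:"):]))
        else:
            entries.append(("sitemap", line))
    return entries


os.environ["DOCS_PUBLIC_URL"] = "https://docs.example.com"
entries = parse_source_file("""
# Internal documentation
${DOCS_PUBLIC_URL}/sitemap.xml
page:https://wiki.example.com/streaming-guide
render:https://spa-docs.example.com/getting-started
""")
```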

The admin API accepts direct document uploads:

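# SKIPPER_URL is a placeholder for your Skipper base URL
curl -X POST "$SKIPPER_URL/api/skipper/admin/pages" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "internal://runbook", "title": "Streaming Runbook", "content": "..."}'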

Uploaded content is embedded and stored as tenant-specific knowledge. Supported file types for multipart upload: .txt, .md, .html, .csv, .json, .xml (max 10MB).


Web search is optional — Skipper works without it, relying on the knowledge base alone. When configured, Skipper falls back to web search if the knowledge base doesn’t have a good answer.

| Provider | Setup | Notes |
|---|---|---|
| Tavily | SEARCH_PROVIDER=tavily + API key | Returns clean extracted content. Best for RAG. |
| Brave Search | SEARCH_PROVIDER=brave + API key | Fast, privacy-focused. Returns snippets. |
| SearXNG | SEARCH_PROVIDER=searxng + SEARCH_API_URL | Self-hosted, no API key needed. |

Skipper can run entirely on your own infrastructure with no external API calls. This section covers deploying the self-hostable components.

Ollama runs open-weight models locally. It provides both chat completion and embedding endpoints that Skipper uses directly.

Docker (recommended for production):

ollama:
  image: ollama/ollama:latest
  ports:
    - "11434:11434"
  volumes:
    - ollama_data:/root/.ollama # persist downloaded models
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia # GPU passthrough (NVIDIA)
            count: all
            capabilities: [gpu]
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
    interval: 15s
    timeout: 5s
    retries: 3
    start_period: 30s

For AMD GPUs, use image: ollama/ollama:rocm. For CPU-only, omit the deploy.resources block — expect ~2-5 tok/s on a 32B model.

Pull models after starting Ollama:

# Chat model (pick one per tier — see Model Selection above)
docker exec ollama ollama pull qwen2.5:32b
# Embedding model
docker exec ollama ollama pull nomic-embed-text
# Utility model (optional, for contextual retrieval / query rewriting)
docker exec ollama ollama pull qwen2.5:7b

Skipper env vars for Ollama:

LLM_PROVIDER=ollama
LLM_MODEL=qwen2.5:32b
LLM_API_URL=http://ollama:11434/v1 # use container name in Docker networks
EMBEDDING_PROVIDER=ollama
EMBEDDING_MODEL=nomic-embed-text
EMBEDDING_API_URL=http://ollama:11434 # embeddings use /api/embeddings, not /v1

SearXNG is a self-hosted metasearch engine. No API key needed.

docker run -d --name searxng -p 8080:8080 searxng/searxng:latest

Skipper env vars:

SEARCH_PROVIDER=searxng
SEARCH_API_URL=http://searxng:8080

For fully local retrieval quality, run a cross-encoder model via Text Embeddings Inference (TEI) or a similar /v1/rerank-compatible server:

docker run -d --name reranker -p 8787:80 \
ghcr.io/huggingface/text-embeddings-inference:latest \
--model-id BAAI/bge-reranker-v2-m3

Skipper env vars:

RERANKER_PROVIDER=generic
RERANKER_MODEL=BAAI/bge-reranker-v2-m3
RERANKER_API_URL=http://reranker:80 # container port when addressed by container name; use http://localhost:8787 from the host

A fully self-hosted stack with proper health checks and startup ordering:

services:
  ollama:
    image: ollama/ollama:latest
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
      interval: 15s
      timeout: 5s
      retries: 5
      start_period: 30s
  searxng:
    image: searxng/searxng:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/healthz"]
      interval: 10s
      timeout: 5s
      retries: 3
  skipper:
    # ... your existing Skipper config ...
    environment:
      LLM_PROVIDER: ollama
      LLM_MODEL: qwen2.5:32b
      LLM_API_URL: http://ollama:11434/v1
      EMBEDDING_PROVIDER: ollama
      EMBEDDING_MODEL: nomic-embed-text
      EMBEDDING_API_URL: http://ollama:11434
      SEARCH_PROVIDER: searxng
      SEARCH_API_URL: http://searxng:8080
    depends_on:
      ollama:
        condition: service_healthy
      searxng:
        condition: service_healthy
volumes:
  ollama_data:

graph TD
    Q["User Query"] --> PRE["Pre-retrieval<br/><small>auto · every message · fast path</small>"]
    Q --> TOOL["search_knowledge<br/><small>explicit LLM tool call</small>"]
    Q --> WEB["search_web<br/><small>explicit LLM tool call</small>"]

    PRE --> EMB1["Embed Query"]
    EMB1 --> HYB1["Hybrid Search"]
    HYB1 --> RERANK1["Cross-Encoder Rerank"]
    RERANK1 --> DEDUP1["Deduplicate"]
    DEDUP1 --> LLM

    TOOL --> REWRITE["Query Rewrite<br/><small>utility LLM</small>"]
    REWRITE --> HYDE{"HyDE<br/>enabled?"}
    HYDE -->|yes| HYPO["Generate Hypothetical Answer<br/><small>utility LLM → embed</small>"]
    HYDE -->|no| EMB2["Embed Rewritten Query"]
    HYPO --> HYB2["Hybrid Search"]
    EMB2 --> HYB2
    HYB2 --> RERANK2["Cross-Encoder Rerank"]
    RERANK2 --> DEDUP2["Deduplicate"]
    DEDUP2 --> LLM

    WEB --> REWRITE2["Query Rewrite<br/><small>utility LLM</small>"]
    REWRITE2 --> SEARCH["Web Search Provider"]
    SEARCH --> LLM

    LLM["LLM + MCP Tools"] --> CONF["Confidence Tagging"]
    CONF --> RESP["Response with Citations"]

Skipper runs as the api_consultant service (ports 18018 HTTP, 19007 gRPC). It connects to the API Gateway via MCP to access platform tools (diagnostics, stream management, GraphQL). The Gateway in turn exposes Skipper’s ask_consultant tool to external MCP agents; each call to that tool runs the full orchestrator pipeline internally.

Every user message triggers an automatic pre-retrieval pass that searches both tenant-specific and global knowledge. The LLM can also explicitly call search_knowledge for targeted lookups.

Search uses a hybrid approach: 70% cosine vector similarity + 30% PostgreSQL full-text ranking. Results are then reranked — when a cross-encoder is configured (RERANKER_PROVIDER), it scores (query, chunk) pairs together for semantic understanding; otherwise a keyword-overlap heuristic is used (0.7 × vector similarity + 0.3 × query term overlap). Results are deduplicated to a maximum of 2 chunks per source URL.
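The 70/30 blend and the per-source cap can be sketched in a few lines. This assumes both input scores are already normalized to the range 0 to 1 and that results are plain dicts with `url` and `score` keys; both are illustrative assumptions, not Skipper's internal types.

```python
def hybrid_score(vector_sim, text_rank, vector_weight=0.7):
    """Blend cosine similarity with full-text rank (70% / 30% by default)."""
    return vector_weight * vector_sim + (1 - vector_weight) * text_rank


def dedupe_by_source(results, max_per_source=2):
    """Keep the best-scoring results, at most `max_per_source` per source URL."""
    kept, counts = [], {}
    for result in sorted(results, key=lambda r: r["score"], reverse=True):
        seen = counts.get(result["url"], 0)
        if seen < max_per_source:
            kept.append(result)
            counts[result["url"]] = seen + 1
    return kept


results = [
    {"url": "https://docs.example.com/a", "score": hybrid_score(0.9, 0.4)},
    {"url": "https://docs.example.com/a", "score": hybrid_score(0.8, 0.5)},
    {"url": "https://docs.example.com/a", "score": hybrid_score(0.7, 0.1)},
    {"url": "https://docs.example.com/b", "score": hybrid_score(0.6, 0.6)},
]
top = dedupe_by_source(results)
```

Capping chunks per URL keeps one long page from crowding every other source out of the prompt.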

Query rewriting (requires utility LLM) transforms conversational questions into search-optimized queries before embedding. This bridges vocabulary gaps between how users phrase questions and how documentation is written. Applied to search_knowledge and search_web tool calls; skipped for pre-retrieval to keep latency low.

HyDE (SKIPPER_ENABLE_HYDE=true) generates a hypothetical answer via the utility LLM, then embeds that answer instead of the question for vector search. The resulting vector is closer in embedding space to real documentation. Adds ~500-1500ms latency per search_knowledge call.

| Dependency | Purpose |
|---|---|
| PostgreSQL + pgvector | Vector store, conversations, usage tracking |
| LLM provider | Chat completion (OpenAI, Anthropic, or Ollama) |
| Embedding provider | Document/query embedding (OpenAI or Ollama) |
| API Gateway (MCP) | Platform tools: diagnostics, stream CRUD, GraphQL |
| Periscope (gRPC) | Stream health metrics (via Gateway) |
| Commodore (gRPC) | Tenant and stream context |

Skipper periodically analyzes the health of active streams and infrastructure. The heartbeat agent runs every HEARTBEAT_INTERVAL (default 30 minutes) and processes each eligible tenant.

For each tenant with active streams and a qualifying billing tier:

  1. Snapshot — fetches stream health and client QoE metrics from Periscope for the last 15 minutes
  2. Baseline comparison — compares current metrics against Welford running averages that the heartbeat has been building over time. Deviations are detected when a metric exceeds 2 standard deviations from the mean (requires at least 5 samples to avoid false positives during warmup).
  3. Triage — a deterministic decision cascade (no LLM calls):
    • Hard threshold violation → investigate
    • Cross-metric correlation with ≥ 50% confidence → investigate
    • Baseline deviations → flag for review
    • Everything normal → skip
  4. Per-stream drill-down — when something looks wrong, Skipper fetches per-stream metrics, compares each against the tenant-wide baseline, and identifies the most anomalous streams (up to 20).
  5. Investigation — only when the triage result is “investigate”, Skipper runs the chat orchestrator with the full diagnostic context (deviations, correlations, per-stream anomalies, raw metrics). This is the only step that uses LLM tokens.
  6. Notification — investigation reports and flag summaries are dispatched via email, WebSocket, or MCP.
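The baseline comparison in step 2 relies on Welford's online algorithm, which updates a mean and variance one sample at a time without storing history. The sketch below shows the idea with the 2-standard-deviation rule and the 5-sample warmup from the text; the class name and the use of sample (n - 1) variance are assumptions.

```python
import math


class RunningStats:
    """Welford running mean/variance for one metric."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared differences from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def stddev(self):
        # Sample standard deviation; whether Skipper uses n or n - 1 is an assumption.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_deviation(self, x, k=2.0, min_samples=5):
        """True when x is more than k standard deviations from the mean, after warmup."""
        if self.n < min_samples:
            return False
        sd = self.stddev()
        return sd > 0 and abs(x - self.mean) > k * sd


stats = RunningStats()
for value in [10, 12, 11, 9, 13]:
    stats.update(value)
```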

Healthy tenants consume zero LLM calls per heartbeat cycle.

The diagnostics engine matches deviation patterns against 5 known failure hypotheses: network degradation, encoder overload, viewer-side issues, ingest instability, and CDN pressure. Each hypothesis has expected signal patterns (e.g., network degradation = packet_loss↑ + bandwidth_in↓ + buffer_health↓). Confidence is calculated as matched signals / total expected signals.
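The confidence calculation and the triage cascade can be illustrated together. Only the network-degradation pattern and the 50% threshold come from the text; the data layout, the function names, and the returned labels are assumptions made for this sketch.

```python
# Expected signals per hypothesis as (metric, direction) pairs.
# Only network degradation is spelled out in the text; other patterns are omitted here.
HYPOTHESES = {
    "network_degradation": {
        ("packet_loss", "up"),
        ("bandwidth_in", "down"),
        ("buffer_health", "down"),
    },
}


def confidence(expected, observed):
    """Matched signals divided by total expected signals."""
    return len(expected & observed) / len(expected)


def triage(hard_violation, best_confidence, has_deviation):
    """Deterministic cascade: no LLM call is needed to reach a decision."""
    if hard_violation:
        return "investigate"
    if best_confidence >= 0.5:
        return "investigate"
    if has_deviation:
        return "flag"
    return "skip"


observed = {("packet_loss", "up"), ("buffer_health", "down")}
score = confidence(HYPOTHESES["network_degradation"], observed)
decision = triage(hard_violation=False, best_confidence=score, has_deviation=True)
```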

Independently of per-tenant stream health, the heartbeat checks node-level metrics across all active clusters:

  • CPU ≥ 95% and memory ≥ 95% require the violation to persist across 3 of 4 five-minute windows before alerting (prevents transient spikes)
  • Disk ≥ 90% triggers a warning; ≥ 95% is critical (fires immediately since disk doesn’t self-heal)
  • Alerts are emailed to the cluster owner with a 4-hour cooldown per node/alert type
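These alerting rules reduce to three small predicates. The sketch assumes each node keeps a list of recent five-minute window readings as percentages, oldest first; the function names and that data layout are hypothetical.

```python
from datetime import datetime, timedelta


def sustained(windows, threshold=95.0, required=3, span=4):
    """CPU/memory: alert only if `required` of the last `span` windows breach the threshold."""
    recent = windows[-span:]
    return sum(value >= threshold for value in recent) >= required


def disk_severity(percent_used):
    """Disk alerts fire immediately: warning at 90%, critical at 95%."""
    if percent_used >= 95:
        return "critical"
    if percent_used >= 90:
        return "warning"
    return None


def cooldown_elapsed(last_sent, now, cooldown=timedelta(hours=4)):
    """Suppress repeat emails for the same node/alert type within the cooldown."""
    return last_sent is None or now - last_sent >= cooldown


now = datetime(2025, 1, 1, 12, 0)
```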

Heartbeat:

| Variable | Purpose | Default |
|---|---|---|
| HEARTBEAT_INTERVAL | How often to run the heartbeat cycle | 30m |
| SKIPPER_REQUIRED_TIER_LEVEL | Minimum billing tier for heartbeat processing | 3 |

Notifications (email):

| Variable | Purpose | Default |
|---|---|---|
| SMTP_HOST | SMTP server hostname | |
| SMTP_PORT | SMTP server port | 587 |
| SMTP_FROM | Sender email address | |
| SMTP_USERNAME | SMTP authentication username | |
| SMTP_PASSWORD | SMTP authentication password | |

When enabled, Skipper drafts social media posts based on noteworthy platform events. Posts are sent as drafts to a configured email address for human review — nothing is auto-published.

The social agent checks for noteworthy events every SKIPPER_SOCIAL_INTERVAL (default 2 hours):

  1. Event collection — the heartbeat agent and knowledge crawler push signals into a collector as they run (platform stats, federation metrics, newly embedded pages)
  2. Detection — the detector classifies signals, compares against stored baselines, and scores them:
    • Platform stats: new viewer record, bandwidth milestone (1/10/100/1000 Gbps), significant viewer surge (>25% growth)
    • Federation: latency improvement (>20% drop), event volume milestone
    • Knowledge: newly crawled and embedded documentation page
  3. Composition — the utility LLM drafts a tweet (max 280 characters). It receives the last 10 posts to avoid repeating themes. If the draft exceeds 280 characters, it retries once, then truncates at the nearest word boundary.
  4. Publishing — the draft is saved to the database and emailed to SKIPPER_SOCIAL_NOTIFY_EMAIL for review.
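The final fallback in the composition step, truncating at the nearest word boundary, can be sketched as follows. The retry against the utility LLM is not shown, and the function name is hypothetical.

```python
def truncate_at_word(text, limit=280):
    """Cut text to at most `limit` characters without splitting a word."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    if text[limit].isspace():
        return cut.rstrip()  # the cut already falls on a word boundary
    head, sep, _ = cut.rpartition(" ")
    # If there is no space at all, fall back to a hard cut.
    return head.rstrip() if sep else cut


draft = "word " * 100  # 500 characters
post = truncate_at_word(draft)
```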

The first observation for each signal type saves a baseline instead of posting — subsequent signals are compared against it.

| Variable | Purpose | Default |
|---|---|---|
| SKIPPER_SOCIAL_ENABLED | Enable the social posting agent | false |
| SKIPPER_SOCIAL_INTERVAL | How often to check for noteworthy events | 2h |
| SKIPPER_SOCIAL_MAX_PER_DAY | Max posts per day (0 = unlimited) | 2 |
| SKIPPER_SOCIAL_NOTIFY_EMAIL | Email to send draft tweets to (required when enabled) | |