<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>FrameWorks Docs | Blog</title><description/><link>https://logbook.frameworks.network/</link><language>en</language><item><title>Introducing Skipper: AI Video Consultant</title><link>https://logbook.frameworks.network/blog/skipper-launch/</link><guid isPermaLink="true">https://logbook.frameworks.network/blog/skipper-launch/</guid><description>Skipper is an AI video consultant built into FrameWorks. It searches curated docs, runs live diagnostics, and manages your streams. A smart assistant that is constantly learning and even tells you when it&apos;s guessing an answer.

</description><pubDate>Sat, 07 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Every software platform needs a comprehensive API and docs for people to build on. Modern tools like &lt;a href=&quot;https://docusaurus.io/&quot;&gt;Docusaurus&lt;/a&gt; and &lt;a href=&quot;https://starlight.astro.build/&quot;&gt;Astro Starlight&lt;/a&gt; make it easier than ever to publish high-quality documentation — write your content in Markdown, and the build spits out something that looks good, with search, themes, and translation built in.&lt;/p&gt;
&lt;p&gt;But the way people interact with docs is changing. With the rise of agentic assistants, developers increasingly point an AI at your docs and let it figure things out. What used to take two engineers two weeks — read the docs, build an integration, wire up a UI — now takes one agent a few hours to crawl, plan, and ship. The docs aren’t just for humans anymore; they’re a knowledge source for agents too.&lt;/p&gt;
&lt;p&gt;That shift changes what a platform should offer. Static docs are table stakes. What developers actually need is a consultant that understands the domain, can look things up across dozens of sources, run live diagnostics on their infrastructure, and guide or execute authenticated platform workflows where that surface allows mutations — all while being honest about what it knows versus what it’s guessing.&lt;/p&gt;
&lt;p&gt;Meet &lt;strong&gt;Skipper&lt;/strong&gt; — the AI video consultant built into FrameWorks.&lt;/p&gt;
&lt;p&gt;Skipper connects through the same &lt;a href=&quot;https://logbook.frameworks.network/agents/mcp&quot;&gt;MCP gateway&lt;/a&gt; that external agents use. The same GraphQL API, the same tool registry, the same authorization checks. In the dashboard, Skipper can use your authenticated account context for streams, diagnostics, and billing. External agents can use the Gateway MCP endpoint directly, while &lt;code dir=&quot;auto&quot;&gt;ask_consultant&lt;/code&gt; runs the Skipper pipeline with mutation tools blocked inside the consultant call.&lt;/p&gt;
&lt;div&gt;&lt;h2 id=&quot;what-skipper-does&quot;&gt;What Skipper Does&lt;/h2&gt;&lt;/div&gt;
&lt;p&gt;That gateway access means Skipper doesn’t just answer questions — in the authenticated dashboard path, it can use platform tools for actions. In docs mode and through MCP &lt;code dir=&quot;auto&quot;&gt;ask_consultant&lt;/code&gt;, mutation tools are intentionally blocked; external agents should call dedicated Gateway MCP tools directly when they need to change resources.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Knowledge base search.&lt;/strong&gt; Skipper searches curated documentation across 10+ streaming domains: FrameWorks, MistServer, FFmpeg, OBS, SRT, HLS, nginx-rtmp, DASH, WebRTC, and more. Retrieval is semantic rather than purely keyword-matched: it combines vector similarity with full-text search. Source-backed answers include citations back to the source material.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Live stream diagnostics.&lt;/strong&gt; Skipper connects to your running streams and analyzes real data. Rebuffering patterns, packet loss, buffer health, routing decisions, anomaly detection — it can diagnose problems that would take you 30 minutes to find manually.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Stream management.&lt;/strong&gt; In authenticated dashboard chat, Skipper can create streams, refresh keys, make clips, start DVR recording, and upload VOD assets through the same GraphQL API you’d use directly, but conversationally. Through MCP, use &lt;code dir=&quot;auto&quot;&gt;ask_consultant&lt;/code&gt; for guidance and the dedicated mutation tools for state changes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;API introspection.&lt;/strong&gt; Building an integration? Skipper reads your GraphQL schema, generates ready-to-use queries, and can wrap them in working code snippets. It introspects the schema live — so the output always matches your current API version.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Billing and payments.&lt;/strong&gt; Check your balance, initiate supported crypto top-ups, or resolve x402 payment flows from authenticated dashboard chat or the dedicated Gateway MCP billing/payment tools.&lt;/p&gt;
&lt;div&gt;&lt;h2 id=&quot;confidence-tagging&quot;&gt;Confidence Tagging&lt;/h2&gt;&lt;/div&gt;
&lt;p&gt;Every Skipper response is tagged with a confidence level:&lt;/p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Level&lt;/th&gt;&lt;th&gt;Meaning&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Verified&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Confirmed from official FrameWorks documentation or tested procedures&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Sourced&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Cited from external documentation (OBS, FFmpeg, etc.) with references&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Best guess&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Inferred from general knowledge — verify before acting on it&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Unknown&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;Couldn’t validate from available sources&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;This isn’t cosmetic. When Skipper generates a Python snippet from your schema, it knows that the GraphQL query is &lt;strong&gt;verified&lt;/strong&gt; (generated from the real schema) but the Python wrapper is &lt;strong&gt;best guess&lt;/strong&gt; (LLM-generated code). The tag tells you where to focus your review.&lt;/p&gt;
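&lt;p&gt;One way to picture this is as a label attached to each part of a response rather than to the response as a whole. The sketch below is illustrative only: &lt;code dir=&quot;auto&quot;&gt;Confidence&lt;/code&gt;, &lt;code dir=&quot;auto&quot;&gt;TaggedPart&lt;/code&gt;, and &lt;code dir=&quot;auto&quot;&gt;needs_review&lt;/code&gt; are hypothetical names, not Skipper’s actual data model.&lt;/p&gt;
&lt;pre dir=&quot;ltr&quot;&gt;
```python
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    # The four levels from the table above, most trusted first.
    VERIFIED = 0
    SOURCED = 1
    BEST_GUESS = 2
    UNKNOWN = 3


@dataclass(frozen=True)
class TaggedPart:
    label: str              # hypothetical name for one part of a response
    confidence: Confidence


def needs_review(parts):
    """Return the labels of parts that are neither verified nor sourced."""
    trusted = {Confidence.VERIFIED, Confidence.SOURCED}
    return [p.label for p in parts if p.confidence not in trusted]


snippet = [
    TaggedPart("GraphQL query", Confidence.VERIFIED),
    TaggedPart("Python wrapper", Confidence.BEST_GUESS),
]
print(needs_review(snippet))
```
&lt;/pre&gt;
&lt;p&gt;For the Python snippet example, only the wrapper is returned for review; the schema-derived query is not.&lt;/p&gt;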
&lt;div&gt;&lt;h2 id=&quot;tool-calls&quot;&gt;Tool Calls&lt;/h2&gt;&lt;/div&gt;
&lt;p&gt;Skipper doesn’t just generate text. It uses &lt;strong&gt;30+ MCP tools&lt;/strong&gt; organized across stream management, QoE diagnostics, billing, knowledge search, API introspection, and more. When Skipper runs a tool, you see exactly what happened:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;diagnose_rebuffering&lt;/strong&gt; — analyzed your stream’s rebuffer ratio, found the root cause&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;generate_query&lt;/strong&gt; — built a createStream mutation from your live schema&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;create_clip&lt;/strong&gt; — clipped the last 30 seconds of your broadcast&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Tool results render in specialized cards: diagnostic cards show health status with metrics and recommendations, code cards show GraphQL with copy buttons, stream cards display keys with show/hide toggles.&lt;/p&gt;
&lt;div&gt;&lt;h2 id=&quot;heartbeat-monitoring&quot;&gt;Heartbeat Monitoring&lt;/h2&gt;&lt;/div&gt;
&lt;p&gt;Skipper can run periodic health checks on active streams for eligible tenants. Every 30 minutes by default (configurable with &lt;code dir=&quot;auto&quot;&gt;HEARTBEAT_INTERVAL&lt;/code&gt;), it fetches stream health and client QoE metrics from Periscope, compares them against running baselines it builds over time using Welford’s online algorithm for mean and variance, and runs a deterministic triage:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Threshold violations&lt;/strong&gt; (hard limits on rebuffer ratio, packet loss, etc.) trigger an investigation&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cross-metric correlations&lt;/strong&gt; — 5 failure hypotheses (network degradation, encoder overload, viewer-side issues, ingest instability, CDN pressure) are matched against deviation patterns. Confidence ≥ 50% triggers investigation.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Baseline deviations&lt;/strong&gt; — metrics exceeding 2σ from the running mean are flagged for review&lt;/li&gt;
&lt;/ul&gt;
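&lt;p&gt;The baseline-deviation check can be sketched with Welford’s online algorithm, which updates a mean and variance one sample at a time without storing history. This is a minimal sketch under assumptions: the &lt;code dir=&quot;auto&quot;&gt;RunningBaseline&lt;/code&gt; class and the sample rebuffer ratios are made up for illustration, and Skipper’s real implementation may differ.&lt;/p&gt;
&lt;pre dir=&quot;ltr&quot;&gt;
```python
import math


class RunningBaseline:
    """Welford's online algorithm: running mean and variance in one pass."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared differences from the current mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    def stddev(self):
        # Sample standard deviation; zero until there are two observations.
        if self.count > 1:
            return math.sqrt(self.m2 / (self.count - 1))
        return 0.0

    def is_deviation(self, value, sigmas=2.0):
        # Flag values more than `sigmas` standard deviations from the mean.
        std = self.stddev()
        return std > 0 and abs(value - self.mean) > sigmas * std


baseline = RunningBaseline()
for ratio in [0.010, 0.012, 0.011, 0.009, 0.013]:  # made-up rebuffer ratios
    baseline.update(ratio)

print(baseline.is_deviation(0.011))  # close to the mean
print(baseline.is_deviation(0.050))  # far outside 2 sigma
```
&lt;/pre&gt;
&lt;p&gt;The first check prints &lt;code dir=&quot;auto&quot;&gt;False&lt;/code&gt; and the second prints &lt;code dir=&quot;auto&quot;&gt;True&lt;/code&gt;. Because only three numbers are kept per metric, a baseline like this stays cheap to maintain across many streams.&lt;/p&gt;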
&lt;p&gt;When investigation is warranted, Skipper runs the full orchestrator with the diagnostic context — deviations, correlations, per-stream anomalies — and produces a structured report with root cause and recommendations. Healthy tenants consume zero LLM calls per cycle.&lt;/p&gt;
&lt;p&gt;Per-stream drill-down identifies the most anomalous individual streams (up to 20) by comparing each against the tenant-wide baseline.&lt;/p&gt;
&lt;p&gt;Infrastructure monitoring checks node-level CPU, memory, and disk across all clusters, with persistence confirmation to prevent transient spikes from triggering alerts. Notifications go out via configured email, WebSocket, or MCP channels.&lt;/p&gt;
&lt;div&gt;&lt;h2 id=&quot;social-posting&quot;&gt;Social Posting&lt;/h2&gt;&lt;/div&gt;
&lt;p&gt;Skipper can draft social media posts from platform events when &lt;code dir=&quot;auto&quot;&gt;SKIPPER_SOCIAL_ENABLED=true&lt;/code&gt; and &lt;code dir=&quot;auto&quot;&gt;SKIPPER_SOCIAL_NOTIFY_EMAIL&lt;/code&gt; is configured. The heartbeat agent and knowledge crawler push signals into an event collector — viewer records, bandwidth milestones, federation metrics, newly embedded documentation pages. A detector classifies and scores these signals against stored baselines.&lt;/p&gt;
&lt;p&gt;The top-scoring signal is composed into a tweet by the utility LLM (max 280 characters, referencing recent posts to avoid theme repetition). Posts are saved as drafts and emailed to a configured address for human review before publishing. The daily post limit (default 2) and the check interval (default 2h) are both configurable.&lt;/p&gt;
&lt;div&gt;&lt;h2 id=&quot;where-to-use-it&quot;&gt;Where to Use It&lt;/h2&gt;&lt;/div&gt;
&lt;p&gt;Skipper is available in three places — all backed by the same MCP gateway:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Dashboard&lt;/strong&gt; — authenticated chat at &lt;code dir=&quot;auto&quot;&gt;/skipper&lt;/code&gt; with account-scoped diagnostics and management actions where enabled.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Docs site&lt;/strong&gt; — authenticated floating widget (&lt;code dir=&quot;auto&quot;&gt;Cmd+J&lt;/code&gt;). Read-only mode: knowledge search, schema introspection, and diagnostics. No mutations.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Your own agents&lt;/strong&gt; — connect Claude Code, OpenClaw, or any MCP client to the Gateway endpoint. Use &lt;code dir=&quot;auto&quot;&gt;ask_consultant&lt;/code&gt; for full-pipeline read/diagnostic answers with confidence tagging alongside the Gateway’s dedicated platform tools.&lt;/li&gt;
&lt;/ol&gt;
&lt;div&gt;&lt;h2 id=&quot;architecture&quot;&gt;Architecture&lt;/h2&gt;&lt;/div&gt;
&lt;p&gt;Skipper runs as a hub-and-spoke MCP setup. The Gateway MCP acts as the hub with 30+ tools. Skipper is a spoke that both &lt;strong&gt;consumes&lt;/strong&gt; Gateway tools (diagnostics, stream management) and &lt;strong&gt;provides&lt;/strong&gt; tools back (knowledge search, web search).&lt;/p&gt;
&lt;pre dir=&quot;ltr&quot;&gt;graph TD
    subgraph &quot;Ingestion (scheduled)&quot;
        CRAWL[&quot;Crawl Sitemaps + Direct Pages&quot;] --&gt; DETECT[&quot;SPA Detection + Headless Chrome&quot;]
        DETECT --&gt; EXTRACT[&quot;Content Extraction&amp;#x3C;br/&gt;&amp;#x3C;small&gt;Readability → Markdown&amp;#x3C;/small&gt;&quot;]
        EXTRACT --&gt; CHUNK[&quot;Chunk ~500 tokens&quot;]
        CHUNK --&gt; EMBED[&quot;Embed&quot;]
        EMBED --&gt; PG[(&quot;pgvector&quot;)]
    end
    subgraph &quot;Query Time&quot;
        Q[&quot;User Query&quot;] --&gt; EMB[&quot;Embed Query&quot;]
        EMB --&gt; SEARCH[&quot;Hybrid Search&amp;#x3C;br/&gt;&amp;#x3C;small&gt;vector + full-text&amp;#x3C;/small&gt;&quot;]
        PG -.-&gt; SEARCH
        SEARCH --&gt; RERANK[&quot;Rerank + Deduplicate&quot;]
        RERANK --&gt; LLM[&quot;LLM + 30+ MCP Tools&quot;]
        LLM --&gt; CONF[&quot;Confidence Tagging&quot;]
        CONF --&gt; RESP[&quot;Response with Citations&quot;]
    end&lt;/pre&gt;
&lt;p&gt;The knowledge base is built by a scheduled crawler that indexes documentation across 10+ streaming domains. The crawler handles both static sites and JavaScript-heavy SPAs — it auto-detects pages that need rendering and processes them in headless Chrome. Content extraction uses Mozilla’s Readability algorithm to strip navigation and boilerplate, and three layers of change detection (source TTL, HTTP 304, content hashing) avoid re-embedding unchanged pages.&lt;/p&gt;
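&lt;p&gt;The three change-detection layers can be sketched as a single decision function. This is an illustration, not the crawler’s code: &lt;code dir=&quot;auto&quot;&gt;PageState&lt;/code&gt;, &lt;code dir=&quot;auto&quot;&gt;should_reembed&lt;/code&gt;, and the choice of SHA-256 are assumptions, and a real crawler would run the TTL check before making any request at all.&lt;/p&gt;
&lt;pre dir=&quot;ltr&quot;&gt;
```python
import hashlib
from dataclasses import dataclass


@dataclass
class PageState:
    # Hypothetical record of what the crawler stored on its last visit.
    fetched_at: float      # Unix time of the last fetch
    content_hash: str      # digest of the extracted content


def content_digest(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def should_reembed(state, now, ttl_seconds, http_status, body):
    """Return True only if every layer says the page may have changed."""
    # Layer 1: source TTL. Skip pages fetched recently enough.
    if ttl_seconds > now - state.fetched_at:
        return False
    # Layer 2: HTTP 304 Not Modified from a conditional request.
    if http_status == 304:
        return False
    # Layer 3: content hash. Skip if the extracted text is unchanged.
    return content_digest(body) != state.content_hash


old = PageState(fetched_at=0.0, content_hash=content_digest("v1"))
print(should_reembed(old, 10.0, 3600, 200, "v2"))    # still inside the TTL
print(should_reembed(old, 7200.0, 3600, 304, ""))    # server says unchanged
print(should_reembed(old, 7200.0, 3600, 200, "v1"))  # same content hash
print(should_reembed(old, 7200.0, 3600, 200, "v2"))  # genuinely new content
```
&lt;/pre&gt;
&lt;p&gt;Only the last call returns &lt;code dir=&quot;auto&quot;&gt;True&lt;/code&gt;. The layers are ordered from cheapest to most expensive, so most pages are skipped before any hashing or embedding happens.&lt;/p&gt;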
&lt;p&gt;At query time, retrieval combines vector cosine similarity with PostgreSQL full-text search, followed by a reranking pass and per-source deduplication. Web search falls back to Tavily/Brave/SearXNG for topics not covered in the knowledge base.&lt;/p&gt;
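&lt;p&gt;The post does not say how the vector and full-text results are merged. One common technique for combining two ranked lists is reciprocal rank fusion, shown here as a stand-in together with a simple per-source deduplication step. The chunk ids, the &lt;code dir=&quot;auto&quot;&gt;source_of&lt;/code&gt; mapping, and the constant &lt;code dir=&quot;auto&quot;&gt;k=60&lt;/code&gt; are all assumptions for illustration.&lt;/p&gt;
&lt;pre dir=&quot;ltr&quot;&gt;
```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of chunk ids; earlier ranks contribute more."""
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties break on chunk id for stability.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def dedupe_by_source(chunk_ids, source_of):
    """Keep only the best-ranked chunk from each source."""
    seen = set()
    kept = []
    for cid in chunk_ids:
        source = source_of[cid]
        if source not in seen:
            seen.add(source)
            kept.append(cid)
    return kept


vector_hits = ["srt-1", "hls-2", "srt-3"]     # made-up ids from vector search
fulltext_hits = ["hls-2", "obs-4", "srt-1"]   # made-up ids from full-text search
source_of = {"srt-1": "srt", "srt-3": "srt", "hls-2": "hls", "obs-4": "obs"}

fused = reciprocal_rank_fusion([vector_hits, fulltext_hits])
print(dedupe_by_source(fused, source_of))
```
&lt;/pre&gt;
&lt;p&gt;Chunks that rank well in both lists rise to the top, and the second &lt;code dir=&quot;auto&quot;&gt;srt&lt;/code&gt; chunk is dropped so that one source cannot crowd out the others.&lt;/p&gt;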
&lt;p&gt;Every conversation is persisted with token counts, confidence levels, source metadata when available, and tool call history — so you can pick up where you left off.&lt;/p&gt;
&lt;div&gt;&lt;h2 id=&quot;try-it&quot;&gt;Try It&lt;/h2&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&quot;https://logbook.frameworks.network/agents/skipper&quot;&gt;Skipper Docs&lt;/a&gt;&lt;/strong&gt; — full reference&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href=&quot;https://chartroom.frameworks.network/skipper&quot;&gt;Open Dashboard&lt;/a&gt;&lt;/strong&gt; — start chatting where Skipper is enabled for your account&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Press &lt;code dir=&quot;auto&quot;&gt;Cmd+J&lt;/code&gt;&lt;/strong&gt; on any docs page to open the widget&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;p&gt;Skipper is available where it is enabled for an eligible account or deployment. Chat access is controlled by the configured tier and rate limits; standard usage billing applies for any stream operations Skipper performs on your behalf.&lt;/p&gt;</content:encoded><category>announcements</category><category>skipper</category><category>ai</category></item><item><title>What Seven AI Agents Found in Our Streaming Platform</title><link>https://logbook.frameworks.network/blog/agentic-audit-pipeline/</link><guid isPermaLink="true">https://logbook.frameworks.network/blog/agentic-audit-pipeline/</guid><description>We used a multi-agent review process to audit FrameWorks across routing, ingest, billing, analytics, auth, and more. It found real edge cases, but the useful part was not automation for its own sake: it was forcing every claim to come with evidence.

</description><pubDate>Sat, 31 Jan 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Live video systems collect awkward edge cases.&lt;/p&gt;
&lt;p&gt;A viewer connects from a location that sits on the wrong side of a geofence boundary. A DVR recording starts at the same moment a stream is shutting down. Two usage events for the same tenant arrive close enough together that billing code has to prove it is actually idempotent. None of those cases are hard to understand on their own. The hard part is that they do not stay on their own.&lt;/p&gt;
&lt;p&gt;FrameWorks is split across routing, ingest, VOD, edge orchestration, analytics, billing, auth, MCP agent access, DNS, Skipper, WebSocket routing, and tenant management. Each subsystem has tests and reviews, but the interesting failures tend to sit between them. They show up when a stream lifecycle event crosses into billing, or when a viewer-routing decision depends on data that is still warming in a cache.&lt;/p&gt;
&lt;p&gt;We wanted a way to keep looking for those failures after the initial implementation work was done. Not a one-off security audit, and not a big rewrite disguised as process. Just a repeatable way to ask: which parts of the platform have not been reviewed recently, what can go wrong there, and can we prove it from the code?&lt;/p&gt;
&lt;p&gt;So we ran a manually orchestrated audit using seven agent roles.&lt;/p&gt;
&lt;div&gt;&lt;h2 id=&quot;how-the-audit-worked&quot;&gt;How the audit worked&lt;/h2&gt;&lt;/div&gt;
&lt;p&gt;The process was intentionally simple. One agent picked a domain to inspect. Another traced the relevant code paths and wrote findings. A separate reviewer checked those findings against the repository. If a fix made sense, another agent implemented it, a reviewer looked at the pull request, and a final pass addressed review feedback before CI and human merge.&lt;/p&gt;
&lt;p&gt;The roles were less important than the separation between them. The agent that found a bug did not get to declare the fix correct. The agent that wrote the patch did not review its own work. The human still had the last merge decision.&lt;/p&gt;
&lt;p&gt;We also made one rule non-negotiable: every finding had to cite evidence. A vague claim like “there may be a race condition here” was not enough. The report had to point to the files involved, describe the interleaving or input that triggered the issue, explain the impact, and propose a concrete fix.&lt;/p&gt;
&lt;p&gt;That made the process much more useful. It also filtered out a lot of confident nonsense.&lt;/p&gt;
&lt;aside aria-label=&quot;Evidence beats confidence&quot;&gt; &lt;p aria-hidden=&quot;true&quot;&gt; Evidence beats confidence &lt;/p&gt; &lt;div&gt; &lt;p&gt;The best audit findings were not the most dramatic ones. They were the ones that made a small,
specific claim and backed it with enough context that another reviewer could reproduce the
reasoning.&lt;/p&gt; &lt;/div&gt; &lt;/aside&gt;
&lt;div&gt;&lt;h2 id=&quot;why-agents-helped&quot;&gt;Why agents helped&lt;/h2&gt;&lt;/div&gt;
&lt;p&gt;The strongest use case was not “AI replaces engineers.” It was much narrower: agents are good at patiently tracing boring paths through a large codebase.&lt;/p&gt;
&lt;p&gt;For example, a human reviewer might look at the billing handler, confirm the obvious transaction boundary, and move on. An audit agent can keep following the event backward into Kafka consumers, forward into ClickHouse writes, sideways into tenant scoping, and then ask what happens if two messages arrive for the same tenant in the same small window.&lt;/p&gt;
&lt;p&gt;That kind of review is tedious. It is also where a lot of production bugs live.&lt;/p&gt;
&lt;p&gt;The agents were useful in three places:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Breadth.&lt;/strong&gt; They could cover many subsystems without requiring one person to keep the entire platform in their head all week.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Patience.&lt;/strong&gt; They did not mind following a stream from API request to database write to service event to async consumer.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Second opinions.&lt;/strong&gt; A separate review pass caught findings that sounded plausible but did not match the actual code or design intent.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The last point mattered most. LLMs can misunderstand why a system is built a certain way. Sometimes code that looks suspicious is an intentional trade-off. Sometimes an apparent race is already prevented by a lock one layer down. The review pass forced us to separate “this looks scary” from “this is actually broken.”&lt;/p&gt;
&lt;div&gt;&lt;h2 id=&quot;what-we-found&quot;&gt;What we found&lt;/h2&gt;&lt;/div&gt;
&lt;p&gt;We ran the first version across 12 subsystems in 26 batches. Each task was scoped to a few hours of agent work, and the batches were small enough that a human could still review the output without drowning in it.&lt;/p&gt;
&lt;p&gt;The audit completed 90 tasks across the platform. The most common findings were exactly the kinds of things we expected in a multi-tenant streaming system:&lt;/p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Category&lt;/th&gt;&lt;th&gt;Count&lt;/th&gt;&lt;th&gt;Example&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Race conditions&lt;/td&gt;&lt;td&gt;12&lt;/td&gt;&lt;td&gt;Prepaid balance deduction under concurrent Kafka messages&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Tenant isolation&lt;/td&gt;&lt;td&gt;8&lt;/td&gt;&lt;td&gt;Stream context cache keyed by &lt;code dir=&quot;auto&quot;&gt;internal_name&lt;/code&gt; only, missing &lt;code dir=&quot;auto&quot;&gt;tenant_id&lt;/code&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Data loss risks&lt;/td&gt;&lt;td&gt;7&lt;/td&gt;&lt;td&gt;Decklog batch flush interrupted by producer crash&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Stale cache behavior&lt;/td&gt;&lt;td&gt;6&lt;/td&gt;&lt;td&gt;GeoIP cache stampede under bursty traffic&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Protocol edge cases&lt;/td&gt;&lt;td&gt;5&lt;/td&gt;&lt;td&gt;Player protocol blacklist leading to “no playable protocol” dead-end&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Storage consistency&lt;/td&gt;&lt;td&gt;5&lt;/td&gt;&lt;td&gt;S3 upload succeeds but local delete fails, leaving duplicate artifacts&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Auth bypass vectors&lt;/td&gt;&lt;td&gt;4&lt;/td&gt;&lt;td&gt;GraphQL complexity bypass through deep nesting with small page sizes&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;DNS propagation&lt;/td&gt;&lt;td&gt;3&lt;/td&gt;&lt;td&gt;Stale DNS pointing at decommissioned nodes&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;The most valuable findings were not always the highest severity. A few were boring but important tenant-isolation checks. A few were “this is safe, and here is why” confirmations that turned implicit assumptions into documented invariants. Those are useful too, because future changes have something concrete to preserve.&lt;/p&gt;
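&lt;p&gt;The tenant-isolation example from the table is easy to show in miniature. The toy cache below is not FrameWorks code; it only illustrates why the key needs &lt;code dir=&quot;auto&quot;&gt;tenant_id&lt;/code&gt; as well as &lt;code dir=&quot;auto&quot;&gt;internal_name&lt;/code&gt;. The class name and sample values are hypothetical.&lt;/p&gt;
&lt;pre dir=&quot;ltr&quot;&gt;
```python
class StreamContextCache:
    """Toy in-memory cache keyed by (tenant_id, internal_name)."""

    def __init__(self):
        self._entries = {}

    def put(self, tenant_id, internal_name, context):
        # Including tenant_id in the key keeps tenants from sharing entries.
        self._entries[(tenant_id, internal_name)] = context

    def get(self, tenant_id, internal_name):
        return self._entries.get((tenant_id, internal_name))


cache = StreamContextCache()
cache.put("tenant-a", "live", {"owner": "tenant-a"})
cache.put("tenant-b", "live", {"owner": "tenant-b"})  # same internal_name
print(cache.get("tenant-a", "live"))
```
&lt;/pre&gt;
&lt;p&gt;With a key of &lt;code dir=&quot;auto&quot;&gt;internal_name&lt;/code&gt; alone, the second &lt;code dir=&quot;auto&quot;&gt;put&lt;/code&gt; would overwrite the first and tenant A would read tenant B’s context.&lt;/p&gt;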
&lt;div&gt;&lt;h2 id=&quot;what-changed-in-the-code-review-culture&quot;&gt;What changed in the code review culture&lt;/h2&gt;&lt;/div&gt;
&lt;p&gt;The audit changed the questions we ask during review.&lt;/p&gt;
&lt;p&gt;“Does this query filter by tenant?” became a first-class check, not an afterthought. “Is this operation idempotent if Kafka retries it?” started showing up in reviews outside the audit. “What happens if this cache key collides across tenants?” became easier to ask because we had examples from real findings.&lt;/p&gt;
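&lt;p&gt;The idempotency question has a similarly small shape. The sketch below assumes each usage event carries a unique id and uses a hypothetical &lt;code dir=&quot;auto&quot;&gt;PrepaidLedger&lt;/code&gt; class; it is single-threaded and in-memory, so it shows the idea rather than a production design.&lt;/p&gt;
&lt;pre dir=&quot;ltr&quot;&gt;
```python
class PrepaidLedger:
    """Toy ledger that applies each usage event at most once."""

    def __init__(self, balance_cents):
        self.balance_cents = balance_cents
        self._applied = set()  # event ids that were already deducted

    def deduct(self, event_id, amount_cents):
        # A redelivered event is acknowledged but not applied again.
        if event_id in self._applied:
            return False
        self._applied.add(event_id)
        self.balance_cents -= amount_cents
        return True


ledger = PrepaidLedger(balance_cents=1000)
ledger.deduct("evt-1", 250)
ledger.deduct("evt-1", 250)  # simulated retry of the same message
print(ledger.balance_cents)
```
&lt;/pre&gt;
&lt;p&gt;The balance ends at 750, not 500. In a real service the duplicate check and the deduction would need to happen atomically, for example inside one database transaction, or concurrent deliveries could still both pass the check.&lt;/p&gt;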
&lt;p&gt;That is the part of the experiment we would keep even without the agents. A good audit leaves behind sharper engineering habits.&lt;/p&gt;
&lt;p&gt;It also made us more careful about severity. Agents are tempted to overstate risk because “critical issue found” sounds more useful than “possible edge case worth testing.” The review stage pushed severity back down when the evidence did not support it. That made the high-severity findings easier to take seriously.&lt;/p&gt;
&lt;div&gt;&lt;h2 id=&quot;what-we-would-do-differently&quot;&gt;What we would do differently&lt;/h2&gt;&lt;/div&gt;
&lt;p&gt;The first run was too verbose. Some reports included too much process, too many proposed formats, and too much architecture explanation. That is useful while designing an internal workflow, but it is not useful when the goal is to decide whether a bug is real.&lt;/p&gt;
&lt;p&gt;Next time, we would make the reports shorter:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;one paragraph for the claim&lt;/li&gt;
&lt;li&gt;exact files and functions involved&lt;/li&gt;
&lt;li&gt;the failing interleaving or input&lt;/li&gt;
&lt;li&gt;the user or tenant impact&lt;/li&gt;
&lt;li&gt;the smallest reasonable fix&lt;/li&gt;
&lt;li&gt;the test that proves the fix&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We would also keep batch sizes small, but not too small. Three tasks per batch was about right. Larger batches create a review queue, and the human reviewer becomes the bottleneck. Smaller batches make the pipeline feel busy without producing enough useful coverage.&lt;/p&gt;
&lt;div&gt;&lt;h2 id=&quot;where-this-goes-next&quot;&gt;Where this goes next&lt;/h2&gt;&lt;/div&gt;
&lt;p&gt;The manual version proved that the process is worth keeping, but we do not want a pile of bespoke prompts and hand-run tasks to become part of how FrameWorks operates.&lt;/p&gt;
&lt;p&gt;The next step is lightweight automation: track when each subsystem was last audited, prioritize higher-risk areas like billing and auth, and only open work when there is a useful review to run. The goal is not to have agents constantly changing code. The goal is to keep stale, risky parts of the platform from going unexamined for too long.&lt;/p&gt;
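&lt;p&gt;A scheduler along those lines could be as simple as weighting staleness by risk and staying quiet below a threshold. Everything in this sketch is an assumption: the function names, the risk weights, the dates, and the threshold are invented to show the shape of the idea.&lt;/p&gt;
&lt;pre dir=&quot;ltr&quot;&gt;
```python
from datetime import date


def audit_priority(last_audited, today, risk_weight):
    """Days since the last audit, scaled by a per-subsystem risk weight."""
    return (today - last_audited).days * risk_weight


def next_to_audit(subsystems, today, min_score):
    """Return subsystem names worth auditing, most urgent first."""
    scored = [
        (audit_priority(last, today, weight), name)
        for name, (last, weight) in subsystems.items()
    ]
    # Anything below the threshold produces no work at all.
    due = [(score, name) for score, name in scored if score >= min_score]
    return [name for score, name in sorted(due, reverse=True)]


subsystems = {
    "billing": (date(2026, 1, 1), 3.0),   # (last audited, risk weight)
    "auth": (date(2026, 1, 20), 3.0),
    "dns": (date(2025, 12, 1), 1.0),
}
print(next_to_audit(subsystems, today=date(2026, 1, 31), min_score=40))
```
&lt;/pre&gt;
&lt;p&gt;Here billing and DNS come due while the recently audited auth subsystem does not, even though it carries the same risk weight as billing.&lt;/p&gt;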
&lt;p&gt;We already use a similar idea in Skipper’s runtime monitoring. Most checks should be quiet. They wake up, inspect the current state, and do nothing unless something actually needs attention. The audit process should behave the same way.&lt;/p&gt;
&lt;p&gt;AI agents were helpful here because they gave us more review coverage, not because they removed judgment from the system. The useful pattern was evidence first, independent review second, automation third, and human merge last.&lt;/p&gt;
&lt;p&gt;That is less flashy than “seven agents fixed our platform.” It is also much closer to how we want to build software.&lt;/p&gt;</content:encoded><category>engineering</category><category>ai</category><category>architecture</category></item><item><title>Welcome to the FrameWorks Blog</title><link>https://logbook.frameworks.network/blog/welcome/</link><guid isPermaLink="true">https://logbook.frameworks.network/blog/welcome/</guid><description>Why we&apos;re writing about video infrastructure in public — and what to expect from this blog.

</description><pubDate>Fri, 06 Dec 2024 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Most streaming platforms keep their architecture behind closed doors. You get API docs, a dashboard, and a support email. How the ingest pipeline actually works, how viewer routing decisions are made, how billing enforcement interacts with stream lifecycle — that stays internal.&lt;/p&gt;
&lt;p&gt;We think that’s backwards. If you’re trusting a platform with your live video, you should understand how it works. Not just the API surface, but the design decisions underneath.&lt;/p&gt;
&lt;p&gt;This blog is where we write about that. Expect:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Architecture deep-dives&lt;/strong&gt; — how subsystems are designed, why we made the trade-offs we did, and what we’d do differently&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Feature launches&lt;/strong&gt; — not marketing copy, but technical explanations of what shipped and how it works under the hood&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Engineering process&lt;/strong&gt; — how we test, audit, and operate a multi-tenant streaming platform&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;No release cadence, no filler. We publish when there’s something worth reading.&lt;/p&gt;</content:encoded><category>announcements</category></item></channel></rss>