

Knowledge Management

Svantic is a self-learning system. Every task it executes feeds back into a knowledge store, so the next task benefits from what was learned before. Over time, the system builds institutional intelligence — navigation strategies, error recovery patterns, site-specific quirks — that compounds with every execution.

The Self-Learning Pipeline

Self-learning pipeline: Task → Execute → Critic → Learner → Knowledge Cards → Retrieval → back to execution

Every completed task flows through a structured learning pipeline:
  1. Task arrives — a user or trigger sends a request to the mesh
  2. Agents execute — the orchestrator plans and agents run tools against real systems
  3. Critic reviews — a dedicated Critic agent evaluates the execution for quality, correctness, and completeness
  4. Learner extracts — a dedicated Learner agent analyzes the full execution trace (every tool call, every response, every retry) and distills reusable patterns
  5. Knowledge cards are created — the learner’s output becomes versioned, confidence-scored knowledge cards stored in the knowledge base
  6. RLAIF feedback — future executions that use a card report success or failure, adjusting the card’s confidence score (Laplace smoothing). Low-confidence or stale cards are automatically swept
  7. Retrieval injects context — when the next task arrives, the system performs semantic search against the knowledge base and injects relevant learnings into the agent’s prompt
This loop runs automatically. You don’t need to configure it — every task that completes feeds the loop.
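
As a rough sketch, the loop's shape looks like this in Python. Every name here is an illustrative stand-in, not part of Svantic's API; the real pipeline runs inside the mesh:

```python
from dataclasses import dataclass, field

@dataclass
class Card:
    scope: str
    content: str
    successes: int = 0
    failures: int = 0

@dataclass
class KnowledgeBase:
    cards: dict = field(default_factory=dict)  # scope -> Card

    def search(self, task: str) -> list:
        # Stand-in for semantic retrieval (step 7; see "How Retrieval Works").
        return [c for c in self.cards.values() if c.scope.split(":")[-1] in task]

    def upsert(self, card: Card) -> None:
        self.cards[card.scope] = card  # step 5: store or replace the card

    def record_outcome(self, card: Card, success: bool) -> None:
        # Step 6 (RLAIF): each outcome adjusts the card's track record.
        if success:
            card.successes += 1
        else:
            card.failures += 1

def execute(task, learnings):  # stand-in for orchestrator + agents (step 2)
    return f"trace of {task!r}", True

def critic(trace):             # stand-in for the Critic agent (step 3)
    return "complete"

def learner(trace, review):    # stand-in for the Learner agent (step 4)
    return [Card(scope="general", content=f"{review}: {trace}")]

def handle_task(task: str, kb: KnowledgeBase) -> None:
    learnings = kb.search(task)           # 7. retrieval injects context
    trace, ok = execute(task, learnings)  # 2. agents execute
    review = critic(trace)                # 3. critic reviews
    for card in learner(trace, review):   # 4-5. learner extracts, cards stored
        kb.upsert(card)
    for card in learnings:                # 6. outcome feedback on used cards
        kb.record_outcome(card, ok)
```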

How Learning Happens

Learning occurs at two points during every task:

At Every Turn

During execution, the system observes the full context of each tool call — the inputs, outputs, errors, retries, and the decisions the orchestrator made. When a tool fails and the agent recovers (retries with different parameters, falls back to an alternative approach), that recovery path is captured as a pattern. When a tool succeeds on the first attempt, the working parameters and sequence are noted. This turn-level observation means the system doesn’t just learn from final outcomes — it learns from the intermediate steps, including the ones that didn’t work.
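
As a rough illustration, a turn-level observation might carry fields like these. The structure is hypothetical; Svantic's internal trace format is not documented here:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TurnObservation:
    tool: str                    # which tool was called
    params: dict                 # the inputs that were tried
    error: Optional[str]         # error message, if the call failed
    recovered_by: Optional[str]  # how the agent recovered, if it did

# A failed call followed by a successful retry yields a recovery pattern:
trace = [
    TurnObservation("fetch_page", {"timeout": 5}, "TimeoutError", None),
    TurnObservation("fetch_page", {"timeout": 30}, None, "retry with longer timeout"),
]

# The recovery path, not just the final success, is what gets captured.
recovery_patterns = [t.recovered_by for t in trace if t.recovered_by]
```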

At Session End

Once a task completes, the Learner agent receives the full execution trace and distills it into reusable knowledge. It identifies what worked, what failed, and what could be done differently. The output is structured into two scopes:
  • Scope-specific — tied to a particular target (a domain, service, API, or integration endpoint). These learnings are keyed by scope so they’re retrieved only when future tasks interact with the same target.
  • General — workflow-level patterns that transfer across contexts. These capture tool selection strategies, sequencing heuristics, error handling approaches, and other insights that apply regardless of the specific target.
The learner doesn’t just append — it merges. If a card already exists for the same scope, the learner compares the new observations against existing knowledge, updates what changed, and leaves the rest intact.
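
A sketch of what that two-scope split and merge-over-append behavior might look like; the keys and contents below are invented examples:

```python
session_learnings = {
    # Scope-specific: keyed by target, retrieved only for matching tasks.
    "site:portal.example.com": [
        "the dashboard loads only after the MFA step completes",
    ],
    # General: workflow-level patterns that transfer across contexts.
    "general": [
        "prefer a paginated API endpoint over scraping a rendered table",
    ],
}

def merge(existing: list, new: list) -> list:
    """Merge, don't append: keep what still holds, add only what changed."""
    return existing + [item for item in new if item not in existing]
```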

Knowledge Cards

Knowledge is stored as cards — the atomic unit of learning in Svantic. Each card is:
  • Scope-keyed — tied to a domain, workflow, or general pattern (e.g. site:ecams.geico.com, workflow:pdf-extraction)
  • Versioned — every update increments the version, so you can track how knowledge evolves
  • Confidence-scored — starts at a neutral 0.5 and adjusts with outcomes using Laplace smoothing: confidence = (successes + 1) / (successes + failures + 2)
  • Merge-aware — when the learner runs, it fetches existing cards for the same scope. If nothing changed, it emits UNCHANGED and skips the write. If the card needs updating, it merges the new observations with existing content
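
Putting those properties together, a card's shape might look like the following sketch. The field names are assumptions; the confidence formula is the one stated above:

```python
from dataclasses import dataclass

@dataclass
class KnowledgeCard:
    scope: str    # e.g. "site:ecams.geico.com" or "workflow:pdf-extraction"
    content: str  # the distilled learning
    version: int = 1
    successes: int = 0
    failures: int = 0

    @property
    def confidence(self) -> float:
        # Laplace smoothing: a fresh card (0 successes, 0 failures)
        # scores (0 + 1) / (0 + 0 + 2) = 0.5.
        return (self.successes + 1) / (self.successes + self.failures + 2)
```

A fresh card scores exactly 0.5, matching the neutral starting point shown in the lifecycle below.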

Card Lifecycle

New task completes → Learner runs → Existing card fetched

Card exists?
    No  → Create new card (version 1, confidence 0.5)
    Yes → Compare with trace

        Content changed?
            No  → UNCHANGED (skip write)
            Yes → UPDATE (merge + increment version)

Future tasks use the card

Success → confidence increases
Failure → confidence decreases

Confidence too low → card flagged for review
No updates for too long → stale sweep removes it
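
The write-back decision in the diagram can be expressed as branch logic; this hypothetical helper mirrors it, with the card represented as a plain dict for illustration:

```python
from typing import Optional

def learner_writeback(existing: Optional[dict], distilled: str) -> str:
    """Mirrors the lifecycle diagram above."""
    if existing is None:
        return "CREATE"                      # version 1, confidence 0.5
    if distilled == existing["content"]:
        return "UNCHANGED"                   # skip the write entirely
    existing["content"] += "\n" + distilled  # merge new observations
    existing["version"] += 1                 # increment version
    return "UPDATE"
```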

Outcome Feedback (RLAIF)

Every time an agent retrieves and uses a knowledge card during execution, the outcome is recorded:
  • Success — the task completed correctly, the card’s advice was useful → confidence increases
  • Failure — the task failed or the card’s advice was wrong → confidence decreases
This creates a reinforcement loop: cards that consistently help get higher confidence and appear more prominently in future retrievals. Cards that mislead get downranked or removed.
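
Plugging a few outcome histories into the smoothing formula shows how the score moves:

```python
def confidence(successes: int, failures: int) -> float:
    return (successes + 1) / (successes + failures + 2)

print(confidence(0, 0))  # 0.5   fresh card, neutral
print(confidence(3, 0))  # 0.8   consistently helpful: ranked more prominently
print(confidence(1, 5))  # 0.25  misleading: downranked, candidate for removal
```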

Stale Sweep

Cards that haven’t been updated or validated within a configurable TTL are automatically cleaned up. This prevents the knowledge base from accumulating outdated information — if a website redesigns its portal, the old navigation learnings naturally expire.
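
A sketch of the sweep under an assumed 30-day TTL; the actual default and the card field names are not specified here:

```python
import time
from typing import Optional

TTL_SECONDS = 30 * 24 * 3600  # assumed example; the TTL is configurable

def stale_sweep(cards: list, now: Optional[float] = None) -> list:
    """Keep only cards updated or validated within the TTL."""
    now = time.time() if now is None else now
    return [c for c in cards if now - c["last_validated"] <= TTL_SECONDS]
```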

How Retrieval Works

When a task arrives, Svantic queries the knowledge base before the agent starts executing. Retrieval combines several strategies. The core is semantic search: the query is converted to a vector embedding and compared against all card embeddings using cosine similarity, and the top-k most relevant results are returned with relevance scores.
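
A minimal sketch of that top-k step with NumPy, assuming card embeddings are stored as rows of a matrix:

```python
import numpy as np

def retrieve(query_vec: np.ndarray, card_vecs: np.ndarray, k: int = 5):
    """Return (card index, relevance score) pairs for the top-k cards."""
    q = query_vec / np.linalg.norm(query_vec)
    c = card_vecs / np.linalg.norm(card_vecs, axis=1, keepdims=True)
    scores = c @ q                      # cosine similarity per card
    top = np.argsort(scores)[::-1][:k]  # highest similarity first
    return [(int(i), float(scores[i])) for i in top]
```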

Site Boost

For tasks targeting a known domain, retrieval allocates dedicated slots for site-specific cards. If a task targets portal.example.com, the system ensures site-specific learnings for that domain appear even if generic workflow learnings have higher raw similarity scores.
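
Conceptually this is slot allocation over two ranked lists; the slot counts below are assumptions for illustration, not documented values:

```python
def blend(site_cards: list, general_cards: list,
          k: int = 5, site_slots: int = 2) -> list:
    """Both inputs are assumed pre-sorted by raw similarity, best first."""
    picked = site_cards[:site_slots]         # reserved slots for the domain
    rest = [c for c in general_cards if c not in picked]
    return picked + rest[: k - len(picked)]  # fill remaining slots by score
```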

Outcome Split

Retrieved results are separated into success and failure learnings. The agent’s prompt receives both:
  • “What worked” — patterns from successful past executions
  • “What failed” — patterns from failed attempts, so the agent avoids known pitfalls
This dual injection is more effective than only providing positive examples.
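
A hypothetical prompt-assembly step showing the dual injection; the section headers and card shape are invented:

```python
def inject_learnings(prompt: str, cards: list) -> str:
    worked = [c["content"] for c in cards if c["outcome"] == "success"]
    failed = [c["content"] for c in cards if c["outcome"] == "failure"]
    parts = [prompt]
    if worked:
        parts.append("What worked:\n" + "\n".join(f"- {w}" for w in worked))
    if failed:
        parts.append("What failed:\n" + "\n".join(f"- {f}" for f in failed))
    return "\n\n".join(parts)
```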

The Compounding Effect

Compounding effect: execution quality improves as knowledge accumulates over time

Knowledge compounds across tasks and contexts. Scope-specific learnings from one integration help when the system encounters similar patterns elsewhere. Error recovery strategies generalize. Sequencing heuristics transfer. Each task adds to the store, and every future task benefits from everything that came before. Early tasks explore. Later tasks execute with the accumulated intelligence of every prior execution — institutional knowledge that builds automatically, without manual curation.

Deployment and Sharing

How knowledge flows depends on your deployment topology:
Topology             Knowledge Scope          Sharing
Standalone           Local to the instance    Single brain, all knowledge in one place
Sidecar              Local to each pod        Each pod builds its own knowledge independently
Central + Sidecar    Shared across fleet      Sidecars contribute learnings upward; central distributes to all pods
In the Central + Sidecar topology, what Pod A learned on Monday is available to Pod C on Tuesday. This creates fleet-wide institutional intelligence — the collective experience of every agent in every pod, accessible to all.
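
A toy model of that flow; the class and method names are invented for illustration:

```python
class CentralKnowledge:
    """Toy stand-in for the central store in a Central + Sidecar fleet."""
    def __init__(self):
        self.cards = {}

    def contribute(self, scope: str, content: str) -> None:
        self.cards[scope] = content  # a sidecar pushes a learning upward

    def snapshot(self) -> dict:
        return dict(self.cards)      # any pod pulls the shared set

central = CentralKnowledge()
central.contribute("site:portal.example.com", "learned by Pod A on Monday")
pod_c_view = central.snapshot()      # available to Pod C on Tuesday
```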