

Knowledge Management

Svantic is a self-learning system. Every task it executes feeds back into a knowledge store, so the next task benefits from what was learned before. Over time, the system builds institutional intelligence — navigation strategies, error recovery patterns, site-specific quirks — that compounds with every execution.

The Self-Learning Pipeline

Self-learning pipeline: Task → Execute → Critic → Learner → Knowledge Cards → Retrieval → back to execution

Every completed task flows through a structured learning pipeline:
  1. Task arrives — a user or trigger sends a request to the mesh
  2. Agents execute — the orchestrator plans and agents run tools against real systems
  3. Critic reviews — a dedicated Critic agent evaluates the execution for quality, correctness, and completeness
  4. Learner extracts — a dedicated Learner agent analyzes the full execution trace (every tool call, every response, every retry) and distills reusable patterns
  5. Knowledge cards are created — the learner’s output becomes versioned, confidence-scored knowledge cards stored in the knowledge base
  6. RLAIF feedback — future executions that use a card report success or failure, adjusting the card’s confidence score (Laplace smoothing). Low-confidence or stale cards are automatically swept
  7. Retrieval injects context — when the next task arrives, the system performs semantic search against the knowledge base and injects relevant learnings into the agent’s prompt
This loop runs automatically. You don’t need to configure it — every task that completes feeds the loop.
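
As a rough sketch, the loop's shape looks like this in Python. Every name here is an illustrative stand-in, not part of Svantic's API; the real pipeline runs inside the mesh:

```python
from dataclasses import dataclass, field

@dataclass
class Card:
    scope: str
    content: str
    successes: int = 0
    failures: int = 0

@dataclass
class KnowledgeBase:
    cards: dict = field(default_factory=dict)  # scope -> Card

    def search(self, task: str) -> list:
        # Stand-in for semantic retrieval (step 7; see "How Retrieval Works").
        return [c for c in self.cards.values() if c.scope.split(":")[-1] in task]

    def upsert(self, card: Card) -> None:
        self.cards[card.scope] = card  # step 5: store or replace the card

    def record_outcome(self, card: Card, success: bool) -> None:
        # Step 6 (RLAIF): each outcome adjusts the card's track record.
        if success:
            card.successes += 1
        else:
            card.failures += 1

def execute(task, learnings):  # stand-in for orchestrator + agents (step 2)
    return f"trace of {task!r}", True

def critic(trace):             # stand-in for the Critic agent (step 3)
    return "complete"

def learner(trace, review):    # stand-in for the Learner agent (step 4)
    return [Card(scope="general", content=f"{review}: {trace}")]

def handle_task(task: str, kb: KnowledgeBase) -> None:
    learnings = kb.search(task)           # 7. retrieval injects context
    trace, ok = execute(task, learnings)  # 2. agents execute
    review = critic(trace)                # 3. critic reviews
    for card in learner(trace, review):   # 4-5. learner extracts, cards stored
        kb.upsert(card)
    for card in learnings:                # 6. outcome feedback on used cards
        kb.record_outcome(card, ok)
```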

How Learning Happens

Learning occurs at two points during every task:

At Every Turn

During execution, the system observes the full context of each tool call — the inputs, outputs, errors, retries, and the decisions the orchestrator made. When a tool fails and the agent recovers (retries with different parameters, falls back to an alternative approach), that recovery path is captured as a pattern. When a tool succeeds on the first attempt, the working parameters and sequence are noted. This turn-level observation means the system doesn’t just learn from final outcomes — it learns from the intermediate steps, including the ones that didn’t work.
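
As a rough illustration, a turn-level observation might carry fields like these. The structure is hypothetical; Svantic's internal trace format is not documented here:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TurnObservation:
    tool: str                    # which tool was called
    params: dict                 # the inputs that were tried
    error: Optional[str]         # error message, if the call failed
    recovered_by: Optional[str]  # how the agent recovered, if it did

# A failed call followed by a successful retry yields a recovery pattern:
trace = [
    TurnObservation("fetch_page", {"timeout": 5}, "TimeoutError", None),
    TurnObservation("fetch_page", {"timeout": 30}, None, "retry with longer timeout"),
]

# The recovery path, not just the final success, is what gets captured.
recovery_patterns = [t.recovered_by for t in trace if t.recovered_by]
```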

At Session End

Once a task completes, the Learner agent receives the full execution trace and distills it into reusable knowledge. It identifies what worked, what failed, and what could be done differently. The output is structured into two scopes:
  • Scope-specific — tied to a particular target (a domain, service, API, or integration endpoint). These learnings are keyed by scope so they’re retrieved only when future tasks interact with the same target.
  • General — workflow-level patterns that transfer across contexts. These capture tool selection strategies, sequencing heuristics, error handling approaches, and other insights that apply regardless of the specific target.
The learner doesn’t just append — it merges. If a card already exists for the same scope, the learner compares the new observations against existing knowledge, updates what changed, and leaves the rest intact.
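
A sketch of what that two-scope split and merge-over-append behavior might look like; the keys and contents below are invented examples:

```python
session_learnings = {
    # Scope-specific: keyed by target, retrieved only for matching tasks.
    "site:portal.example.com": [
        "the dashboard loads only after the MFA step completes",
    ],
    # General: workflow-level patterns that transfer across contexts.
    "general": [
        "prefer a paginated API endpoint over scraping a rendered table",
    ],
}

def merge(existing: list, new: list) -> list:
    """Merge, don't append: keep what still holds, add only what changed."""
    return existing + [item for item in new if item not in existing]
```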

Knowledge Cards

Knowledge is stored as cards — the atomic unit of learning in Svantic. Each card is:
  • Scope-keyed — tied to a domain, workflow, or general pattern (e.g. site:ecams.geico.com, workflow:pdf-extraction)
  • Versioned — every update increments the version, so you can track how knowledge evolves
  • Confidence-scored — starts at a neutral 0.5 and adjusts with outcomes using Laplace smoothing: confidence = (successes + 1) / (successes + failures + 2)
  • Merge-aware — when the learner runs, it fetches existing cards for the same scope. If nothing changed, it emits UNCHANGED and skips the write. If the card needs updating, it merges the new observations with existing content
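
Putting those properties together, a card's shape might look like the following sketch. The field names are assumptions; the confidence formula is the one stated above:

```python
from dataclasses import dataclass

@dataclass
class KnowledgeCard:
    scope: str    # e.g. "site:ecams.geico.com" or "workflow:pdf-extraction"
    content: str  # the distilled learning
    version: int = 1
    successes: int = 0
    failures: int = 0

    @property
    def confidence(self) -> float:
        # Laplace smoothing: a fresh card (0 successes, 0 failures)
        # scores (0 + 1) / (0 + 0 + 2) = 0.5.
        return (self.successes + 1) / (self.successes + self.failures + 2)
```

A fresh card scores exactly 0.5, matching the neutral starting point shown in the lifecycle below.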

Card Lifecycle

New task completes → Learner runs → Existing card fetched

Card exists?
    No  → Create new card (version 1, confidence 0.5)
    Yes → Compare with trace

        Content changed?
            No  → UNCHANGED (skip write)
            Yes → UPDATE (merge + increment version)

Future tasks use the card

Success → confidence increases
Failure → confidence decreases

Confidence too low → card flagged for review
No updates for too long → stale sweep removes it
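
The write-back decision in the diagram can be expressed as branch logic; this hypothetical helper mirrors it, with the card represented as a plain dict for illustration:

```python
from typing import Optional

def learner_writeback(existing: Optional[dict], distilled: str) -> str:
    """Mirrors the lifecycle diagram above."""
    if existing is None:
        return "CREATE"                      # version 1, confidence 0.5
    if distilled == existing["content"]:
        return "UNCHANGED"                   # skip the write entirely
    existing["content"] += "\n" + distilled  # merge new observations
    existing["version"] += 1                 # increment version
    return "UPDATE"
```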

Outcome Feedback (RLAIF)

Every time an agent retrieves and uses a knowledge card during execution, the outcome is recorded:
  • Success — the task completed correctly, the card’s advice was useful → confidence increases
  • Failure — the task failed or the card’s advice was wrong → confidence decreases
This creates a reinforcement loop: cards that consistently help get higher confidence and appear more prominently in future retrievals. Cards that mislead get downranked or removed.
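
Plugging a few outcome histories into the smoothing formula shows how the score moves:

```python
def confidence(successes: int, failures: int) -> float:
    return (successes + 1) / (successes + failures + 2)

print(confidence(0, 0))  # 0.5   fresh card, neutral
print(confidence(3, 0))  # 0.8   consistently helpful: ranked more prominently
print(confidence(1, 5))  # 0.25  misleading: downranked, candidate for removal
```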

Stale Sweep

Cards that haven’t been updated or validated within a configurable TTL are automatically cleaned up. This prevents the knowledge base from accumulating outdated information — if a website redesigns its portal, the old navigation learnings naturally expire.
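
A sketch of the sweep under an assumed 30-day TTL; the actual default and the card field names are not specified here:

```python
import time
from typing import Optional

TTL_SECONDS = 30 * 24 * 3600  # assumed example; the TTL is configurable

def stale_sweep(cards: list, now: Optional[float] = None) -> list:
    """Keep only cards updated or validated within the TTL."""
    now = time.time() if now is None else now
    return [c for c in cards if now - c["last_validated"] <= TTL_SECONDS]
```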

How Retrieval Works

When a task arrives, Svantic queries the knowledge base before the agent starts executing. Retrieval combines several strategies. The core is semantic search: the query is converted to a vector embedding and compared against all card embeddings using cosine similarity, and the top-k most relevant results are returned with relevance scores.
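
A minimal sketch of that top-k step with NumPy, assuming card embeddings are stored as rows of a matrix:

```python
import numpy as np

def retrieve(query_vec: np.ndarray, card_vecs: np.ndarray, k: int = 5):
    """Return (card index, relevance score) pairs for the top-k cards."""
    q = query_vec / np.linalg.norm(query_vec)
    c = card_vecs / np.linalg.norm(card_vecs, axis=1, keepdims=True)
    scores = c @ q                      # cosine similarity per card
    top = np.argsort(scores)[::-1][:k]  # highest similarity first
    return [(int(i), float(scores[i])) for i in top]
```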

Site Boost

For tasks targeting a known domain, retrieval allocates dedicated slots for site-specific cards. If a task targets portal.example.com, the system ensures site-specific learnings for that domain appear even if generic workflow learnings have higher raw similarity scores.
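
Conceptually this is slot allocation over two ranked lists; the slot counts below are assumptions for illustration, not documented values:

```python
def blend(site_cards: list, general_cards: list,
          k: int = 5, site_slots: int = 2) -> list:
    """Both inputs are assumed pre-sorted by raw similarity, best first."""
    picked = site_cards[:site_slots]         # reserved slots for the domain
    rest = [c for c in general_cards if c not in picked]
    return picked + rest[: k - len(picked)]  # fill remaining slots by score
```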

Outcome Split

Retrieved results are separated into success and failure learnings. The agent’s prompt receives both:
  • “What worked” — patterns from successful past executions
  • “What failed” — patterns from failed attempts, so the agent avoids known pitfalls
This dual injection is more effective than only providing positive examples.
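
A hypothetical prompt-assembly step showing the dual injection; the section headers and card shape are invented:

```python
def inject_learnings(prompt: str, cards: list) -> str:
    worked = [c["content"] for c in cards if c["outcome"] == "success"]
    failed = [c["content"] for c in cards if c["outcome"] == "failure"]
    parts = [prompt]
    if worked:
        parts.append("What worked:\n" + "\n".join(f"- {w}" for w in worked))
    if failed:
        parts.append("What failed:\n" + "\n".join(f"- {f}" for f in failed))
    return "\n\n".join(parts)
```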

The Compounding Effect

Compounding effect: execution quality improves as knowledge accumulates over time

Knowledge compounds across tasks and contexts. Scope-specific learnings from one integration help when the system encounters similar patterns elsewhere. Error recovery strategies generalize. Sequencing heuristics transfer. Each task adds to the store, and every future task benefits from everything that came before. Early tasks explore. Later tasks execute with the accumulated intelligence of every prior execution — institutional knowledge that builds automatically, without manual curation.

Deployment and Sharing

How knowledge flows depends on your deployment topology:
Topology             Knowledge Scope          Sharing
Standalone           Local to the instance    Single brain, all knowledge in one place
Sidecar              Local to each pod        Each pod builds its own knowledge independently
Central + Sidecar    Shared across fleet      Sidecars contribute learnings upward; central distributes to all pods
In the Central + Sidecar topology, what Pod A learned on Monday is available to Pod C on Tuesday. This creates fleet-wide institutional intelligence — the collective experience of every agent in every pod, accessible to all.
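
A toy model of that flow; the class and method names are invented for illustration:

```python
class CentralKnowledge:
    """Toy stand-in for the central store in a Central + Sidecar fleet."""
    def __init__(self):
        self.cards = {}

    def contribute(self, scope: str, content: str) -> None:
        self.cards[scope] = content  # a sidecar pushes a learning upward

    def snapshot(self) -> dict:
        return dict(self.cards)      # any pod pulls the shared set

central = CentralKnowledge()
central.contribute("site:portal.example.com", "learned by Pod A on Monday")
pod_c_view = central.snapshot()      # available to Pod C on Tuesday
```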