
Telemetry & tracing

Every agent built with the SDK produces OpenTelemetry spans automatically. When the agent runs on the Svantic mesh, that data lands in the dashboard’s Traces and Usage views. This guide covers what you get for free, what you can add, and how to interpret the output.

What you get automatically

For every capability invocation the SDK opens an execute_tool <capability_name> span with these attributes:
  • gen_ai.operation.name = "execute_tool"
  • gen_ai.tool.name — the capability’s name
  • gen_ai.conversation.id — the session id
  • svantic.tenant.id
If the agent is in smart-agent mode (LLM-driven reasoning with instructions + llm config), you also get:
  • invoke_agent <name> — one span per user turn, carrying aggregated token counts (gen_ai.usage.input_tokens, gen_ai.usage.output_tokens).
  • call_llm <model> — one span per LLM call inside that turn, with gen_ai.request.model, gen_ai.usage.*, and gen_ai.response.finish_reasons.
  • Nested execute_tool spans for each tool the LLM invokes.
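Put together, one smart-agent turn yields a span tree like this (the agent, model, and tool names are illustrative):

invoke_agent support-bot
├── call_llm gpt-4o-mini
├── execute_tool lookup_order
└── call_llm gpt-4o-mini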
W3C trace context is propagated end-to-end via traceparent / baggage headers, so when the mesh dispatches a task to your agent, your spans join the same trace as everything upstream. No code changes are required. The mesh runtime installs the OpenTelemetry provider at startup; SDK spans flow through it without any configuration on your side.
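For reference, traceparent carries the W3C Trace Context fields version-trace_id-parent_id-flags; the value below is the spec's own example:

traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01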

Turning it off

There is no per-agent switch: the host process either has a global TracerProvider registered or it doesn’t. If no provider is registered, the SDK’s spans silently become no-ops with zero runtime cost. When developing agents locally without a mesh connection, there’s usually nothing to turn off: spans just don’t go anywhere.

Adding custom spans — the easy way

The SDK ships three helpers that handle span lifecycle, status, error recording, and the OTel GenAI attribute names for you. Use them instead of hand-rolling startActiveSpan — the code is shorter and the dashboard gets richer data.
import { trace_llm, trace_tool, trace_step } from '@svantic/sdk';

trace_llm(meta, fn) — any LLM provider

Works for OpenAI, Anthropic, Bedrock, Vertex, Ollama, or anything else. You’re responsible for calling the provider; the helper takes care of the span.
import OpenAI from 'openai';
import { trace_llm } from '@svantic/sdk';

const openai = new OpenAI();
const prompt = 'Summarize my open orders.'; // example user input

const answer = await trace_llm(
  {
    system: 'openai',
    model: 'gpt-4o-mini',
    temperature: 0.2,
  },
  async () => {
    const res = await openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: [{ role: 'user', content: prompt }],
    });
    return {
      value: res.choices[0].message.content ?? '',
      telemetry: {
        input_tokens: res.usage?.prompt_tokens,
        output_tokens: res.usage?.completion_tokens,
        finish_reasons: [res.choices[0].finish_reason ?? 'stop'],
      },
    };
  },
);
The span gets gen_ai.system, gen_ai.request.model, gen_ai.usage.*, and gen_ai.response.finish_reasons populated automatically. The dashboard waterfall renders it as an llm.chat <model> row with the model name and a token summary (e.g. 1.2k→340) on the bar. Errors are captured as span events with status=ERROR. The same shape works for Anthropic:
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();

await trace_llm(
  { system: 'anthropic', model: 'claude-3-5-sonnet' },
  async () => {
    const msg = await anthropic.messages.create({ /* … */ });
    return {
      value: msg.content[0].text,
      telemetry: {
        input_tokens: msg.usage.input_tokens,
        output_tokens: msg.usage.output_tokens,
        finish_reasons: [msg.stop_reason ?? 'end_turn'],
      },
    };
  },
);
…and Bedrock:
await trace_llm(
  { system: 'aws.bedrock', model: 'anthropic.claude-3-5-sonnet-20240620-v1:0' },
  async () => {
    const out = await bedrock.invokeModel({ /* … */ });
    // Anthropic-on-Bedrock responses carry text and usage in the JSON body.
    const body = JSON.parse(new TextDecoder().decode(out.body));
    return {
      value: body.content[0].text,
      telemetry: { input_tokens: body.usage.input_tokens, output_tokens: body.usage.output_tokens },
    };
  },
);

trace_tool(meta, fn) — any tool call

Use for database queries, HTTP calls, shell-outs, MCP servers — anything that goes outside your process on behalf of an LLM.
import { trace_tool } from '@svantic/sdk';

const rows = await trace_tool(
  { name: 'postgres.query', kind: 'db' },
  () => db.query('select * from orders where user_id = $1', [user_id]),
);
The span appears as tool.execute postgres.query in the waterfall with a distinct color. If the tool throws, the span is marked red.
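The same helper fits outbound HTTP calls; the name and kind values below are illustrative, not a fixed vocabulary:

const stock = await trace_tool(
  { name: 'inventory.lookup', kind: 'http' },
  () => fetch(`https://inventory.internal/v1/stock/${sku}`).then((r) => r.json()),
);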

trace_step(name, fn) — everything else

Use for planning, parsing, validation, or any block of work that would otherwise show up as unaccounted time in the dashboard.
import { trace_step } from '@svantic/sdk';

const plan = await trace_step('build_plan', () => compose_plan(goal));
const parsed = await trace_step('parse_response', () => validate(raw));
The dashboard’s Unaccounted time strip above each waterfall tells you, in milliseconds, how much of the trace’s wall-clock time isn’t covered by any span. Wrap the suspect code in trace_step until that number is near zero and you’ll have a fully instrumented flow.
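Assuming the helpers build on startActiveSpan (as the section above suggests), steps nest: a trace_step opened inside another renders as its child in the waterfall.

const result = await trace_step('prepare_context', async () => {
  const docs = await trace_step('fetch_docs', () => fetch_docs(query));
  return trace_step('rank_docs', () => rank_docs(docs));
});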

Errors and cancellation

All three helpers record the thrown exception as a span event, set span status to ERROR, and rethrow unchanged. You never lose the original error or its stack.
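Because the exception is rethrown unchanged, ordinary error handling works around the helpers; a minimal sketch (report_failure is a hypothetical fallback):

try {
  await trace_tool({ name: 'postgres.query', kind: 'db' }, () => db.query(sql));
} catch (err) {
  // The span is already marked ERROR with the exception recorded;
  // err is the original error, stack intact.
  report_failure(err);
}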

No provider? No problem.

If the host process has no OpenTelemetry TracerProvider installed (e.g. during local unit tests), every helper becomes a no-op with zero runtime cost. Leave them in; nothing needs to be conditional.

Events

OpenTelemetry events are structured signals attached to a span:
import { trace } from '@opentelemetry/api';

const span = trace.getActiveSpan();
span?.addEvent('refund_policy_applied', {
  policy: 'under_500_auto_approve',
  amount_cents: amount,
});
They show up as markers on the span timeline in the dashboard.

Forwarding trace context to downstream HTTP services

The session context carries propagation_headers — forward them verbatim and W3C-compliant services will join the same trace:
handler: async (args, ctx) => {
  const res = await fetch('https://legacy.internal/orders', {
    headers: ctx.propagation_headers ?? {},
  });
  return res.json();
}
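If the request needs headers of its own, spread the propagation headers alongside them (the content type and body here are illustrative):

handler: async (args, ctx) => {
  const res = await fetch('https://legacy.internal/orders', {
    method: 'POST',
    headers: {
      'content-type': 'application/json',
      ...(ctx.propagation_headers ?? {}),
    },
    body: JSON.stringify(args),
  });
  return res.json();
}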
See Trace propagation for the low-level helpers.

Using your own OpenTelemetry backend

To export traces to Datadog, Honeycomb, Grafana Tempo, or any OTLP collector, register a TracerProvider at process startup before constructing any Agent:
import { NodeTracerProvider, BatchSpanProcessor } from '@opentelemetry/sdk-trace-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

const provider = new NodeTracerProvider();
provider.addSpanProcessor(new BatchSpanProcessor(
  new OTLPTraceExporter({ url: 'https://my-collector:4318/v1/traces' }),
));
provider.register();
All SDK spans flow into your pipeline automatically. If you’re also connected to a Svantic mesh, the mesh-side provider ships duplicate spans to the dashboard, so register one provider or the other depending on where you want the data.
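If your backend needs spans attributed to a named service, you can pass a standard OpenTelemetry Resource when constructing the provider (the service name is illustrative):

import { Resource } from '@opentelemetry/resources';

const provider = new NodeTracerProvider({
  resource: new Resource({ 'service.name': 'orders-agent' }),
});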

Reading traces in the dashboard

  • Traces tab — one row per session. Click to see the waterfall.
  • Waterfall — invoke_agent at the root, call_llm and execute_tool as children, your custom spans nested underneath.
  • Events — rendered as markers on the span timeline.
  • Usage — token counts aggregated from gen_ai.usage.*, rolled up per trace and per model.

See also