# Telemetry & tracing

Every agent built with the SDK produces OpenTelemetry spans automatically. When the agent runs on the Svantic mesh, that data lands in the dashboard's **Traces** and **Usage** views. This guide covers what you get for free, what you can add, and how to interpret the output.

## What you get automatically

For every capability invocation the SDK opens an `execute_tool <capability_name>` span with these attributes:

* `gen_ai.operation.name = "execute_tool"`
* `gen_ai.tool.name` — the capability's name
* `gen_ai.conversation.id` — the session id
* `svantic.tenant.id`
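
As a rough illustration of the span shape listed above, the sketch below builds the span name and attribute map for one capability invocation. The function `execute_tool_span` and its parameter names are hypothetical and not part of the SDK; only the span name pattern and the attribute keys come from the list above.

```typescript
// Hypothetical helper, not part of @svantic/sdk. It only mirrors the span
// name pattern and the attribute keys documented above.
type SpanAttributes = Record<string, string>;

function execute_tool_span(
  capability_name: string,
  session_id: string,
  tenant_id: string,
): { name: string; attributes: SpanAttributes } {
  return {
    name: `execute_tool ${capability_name}`,
    attributes: {
      'gen_ai.operation.name': 'execute_tool',
      'gen_ai.tool.name': capability_name,
      'gen_ai.conversation.id': session_id,
      'svantic.tenant.id': tenant_id,
    },
  };
}

// Illustrative values only.
const example_span = execute_tool_span('lookup_order', 'session-123', 'tenant-abc');
```

Filtering a trace backend on `gen_ai.tool.name` or `gen_ai.conversation.id` is then enough to find every invocation of one capability or every span in one session.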

If the agent is in smart-agent mode (LLM-driven reasoning with `instructions` + `llm` config), you also get:

* `invoke_agent <name>` — one span per user turn, carrying aggregated token counts (`gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`).
* `call_llm <model>` — one span per LLM call inside that turn, with `gen_ai.request.model`, `gen_ai.usage.*`, and `gen_ai.response.finish_reasons`.
* Nested `execute_tool` spans for each tool the LLM invokes.

W3C trace context is propagated end-to-end via `traceparent` / `baggage` headers, so when the mesh dispatches a task to your agent, your spans join the same trace as everything upstream.
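
When debugging propagation it can help to inspect the `traceparent` header directly. The sketch below parses the version-00 layout defined by the W3C Trace Context specification (`version-trace_id-parent_id-flags`). The function name `parse_traceparent` is hypothetical; the SDK handles this for you, so this is only a diagnostic aid.

```typescript
// Fields of a W3C `traceparent` header, version 00.
interface TraceParent {
  version: string;
  trace_id: string;  // 32 lowercase hex characters
  parent_id: string; // 16 lowercase hex characters
  sampled: boolean;  // lowest bit of the flags byte
}

// Hypothetical diagnostic helper. Returns null when the header does not
// match the version-00 layout or carries an all-zero id.
function parse_traceparent(header: string): TraceParent | null {
  const pattern = /^(00)-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;
  const match = pattern.exec(header.trim());
  if (match === null) return null;

  const [, version, trace_id, parent_id, flags] = match;
  // The specification treats all-zero trace and parent ids as invalid.
  if (/^0+$/.test(trace_id) || /^0+$/.test(parent_id)) return null;

  return {
    version,
    trace_id,
    parent_id,
    sampled: (parseInt(flags, 16) & 0x01) === 0x01,
  };
}
```

Two spans belong to the same trace exactly when their `trace_id` values match, which is a quick way to confirm that an upstream request and your agent's spans were joined.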

No code changes are required. The mesh runtime installs the OpenTelemetry provider at startup; SDK spans flow through it without any configuration on your side.

## Turning it off

There is no per-agent switch: a process either has a global `TracerProvider` registered or it doesn't. If the host process has no provider registered, the SDK's spans silently become no-ops and add effectively no runtime overhead.

When developing agents locally without a mesh connection, there's usually nothing to turn off: spans just don't go anywhere.

## Adding custom spans — the easy way

The SDK ships three helpers that handle span lifecycle, status, error recording, and the OTel GenAI attribute names for you. Use them instead of hand-rolling `startActiveSpan` — the code is shorter and the dashboard gets richer data.

```typescript theme={null}
import { trace_llm, trace_tool, trace_step } from '@svantic/sdk';
```

### `trace_llm(meta, fn)` — any LLM provider

Works for OpenAI, Anthropic, Bedrock, Vertex, Ollama, or anything else. You're responsible for calling the provider; the helper takes care of the span.

```typescript theme={null}
import OpenAI from 'openai';
import { trace_llm } from '@svantic/sdk';

const openai = new OpenAI();

const answer = await trace_llm(
  {
    system: 'openai',
    model: 'gpt-4o-mini',
    temperature: 0.2,
  },
  async () => {
    const res = await openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: [{ role: 'user', content: prompt }],
    });
    return {
      value: res.choices[0].message.content ?? '',
      telemetry: {
        input_tokens: res.usage?.prompt_tokens,
        output_tokens: res.usage?.completion_tokens,
        finish_reasons: [res.choices[0].finish_reason ?? 'stop'],
      },
    };
  },
);
```

The span gets `gen_ai.system`, `gen_ai.request.model`, `gen_ai.usage.*`, and `gen_ai.response.finish_reasons` populated automatically. The dashboard waterfall renders it as an `llm.chat <model>` row with the model name and a token summary such as `1.2k→340` on the bar. If the callback throws, the exception is recorded as a span event and the span status is set to ERROR.

The same shape works for Anthropic:

```typescript theme={null}
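import Anthropic from '@anthropic-ai/sdk';
import { trace_llm } from '@svantic/sdk';

const anthropic = new Anthropic();

await trace_llm(
  { system: 'anthropic', model: 'claude-3-5-sonnet' },
  async () => {
    const msg = await anthropic.messages.create({ /* … */ });
    // Content blocks are a union (text, tool_use, …); only text blocks carry `.text`.
    const first = msg.content[0];
    return {
      value: first?.type === 'text' ? first.text : '',
      telemetry: {
        input_tokens: msg.usage.input_tokens,
        output_tokens: msg.usage.output_tokens,
        finish_reasons: [msg.stop_reason ?? 'end_turn'],
      },
    };
  },
);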
```

…and Bedrock:

```typescript theme={null}
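// `bedrock` is assumed to be a BedrockRuntime client from
// @aws-sdk/client-bedrock-runtime, constructed elsewhere.
await trace_llm(
  { system: 'aws.bedrock', model: 'anthropic.claude-3-5-sonnet-20240620-v1:0' },
  async () => {
    const out = await bedrock.invokeModel({ /* … */ });
    // The response body is raw bytes; decode it before reading token usage.
    // The `usage` field layout below is the one Anthropic models return.
    const body = JSON.parse(new TextDecoder().decode(out.body));
    return {
      value: body,
      telemetry: {
        input_tokens: body.usage?.input_tokens,
        output_tokens: body.usage?.output_tokens,
      },
    };
  },
);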
```
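
Because every provider reports usage under slightly different field names, it can be convenient to keep the mapping in one place. The sketch below shows one way to do that. `LlmTelemetry`, `from_openai`, and `from_anthropic` are hypothetical names, and the `OpenAiLike` / `AnthropicLike` interfaces are structural subsets of the responses used in the examples above, not the providers' full types.

```typescript
// Shape returned next to `value` in the trace_llm examples above.
interface LlmTelemetry {
  input_tokens?: number;
  output_tokens?: number;
  finish_reasons?: string[];
}

// Structural subsets of the provider responses; the real SDK response
// types carry many more fields.
interface OpenAiLike {
  usage?: { prompt_tokens: number; completion_tokens: number };
  choices: { finish_reason: string | null }[];
}

interface AnthropicLike {
  usage: { input_tokens: number; output_tokens: number };
  stop_reason: string | null;
}

// Hypothetical adapter for OpenAI-style chat completion responses.
function from_openai(res: OpenAiLike): LlmTelemetry {
  return {
    input_tokens: res.usage?.prompt_tokens,
    output_tokens: res.usage?.completion_tokens,
    finish_reasons: res.choices.map((choice) => choice.finish_reason ?? 'stop'),
  };
}

// Hypothetical adapter for Anthropic-style message responses.
function from_anthropic(msg: AnthropicLike): LlmTelemetry {
  return {
    input_tokens: msg.usage.input_tokens,
    output_tokens: msg.usage.output_tokens,
    finish_reasons: [msg.stop_reason ?? 'end_turn'],
  };
}
```

With adapters like these, the callback passed to `trace_llm` can return `{ value, telemetry: from_openai(res) }`, which keeps the provider-specific field names out of the call sites.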

### `trace_tool(meta, fn)` — any tool call

Use for database queries, HTTP calls, shell-outs, MCP servers — anything that goes outside your process on behalf of an LLM.

```typescript theme={null}
import { trace_tool } from '@svantic/sdk';

const rows = await trace_tool(
  { name: 'postgres.query', kind: 'db' },
  () => db.query('select * from orders where user_id = $1', [user_id]),
);
```

The span appears as `tool.execute postgres.query` in the waterfall with a distinct color. If the tool throws, the span is marked red.

### `trace_step(name, fn)` — everything else

Use for planning, parsing, validation, or any block of work that would otherwise show up as **unaccounted time** in the dashboard.

```typescript theme={null}
import { trace_step } from '@svantic/sdk';

const plan = await trace_step('build_plan', () => compose_plan(goal));
const parsed = await trace_step('parse_response', () => validate(raw));
```

The dashboard's *Unaccounted time* strip above each waterfall tells you, in milliseconds, how much of the trace's wall-clock time isn't covered by any span. Wrap the suspect code in `trace_step` until that number is near zero and you'll have a fully instrumented flow.
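
To make the idea of unaccounted time concrete, the sketch below computes the wall-clock time of a trace that no span interval covers, counting overlapping spans once. This is an assumption about how such a number could be derived, not the dashboard's documented formula; the `Interval` type and `unaccounted_ms` function are hypothetical.

```typescript
interface Interval {
  start_ms: number;
  end_ms: number;
}

// Hypothetical helper: milliseconds inside `trace` that are not covered by
// any of `spans`. Overlapping spans are only counted once.
function unaccounted_ms(trace: Interval, spans: Interval[]): number {
  const sorted = spans
    // Clip each span to the trace window and drop empty intervals.
    .map((s) => ({
      start_ms: Math.max(s.start_ms, trace.start_ms),
      end_ms: Math.min(s.end_ms, trace.end_ms),
    }))
    .filter((s) => s.end_ms > s.start_ms)
    .sort((a, b) => a.start_ms - b.start_ms);

  let covered = 0;
  let cursor = trace.start_ms; // everything before the cursor is already counted
  for (const s of sorted) {
    const from = Math.max(s.start_ms, cursor);
    if (s.end_ms > from) {
      covered += s.end_ms - from;
      cursor = s.end_ms;
    }
  }
  return trace.end_ms - trace.start_ms - covered;
}
```

Each block wrapped in `trace_step` adds another interval to `spans`, which is why the number shrinks as instrumentation improves.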

### Errors and cancellation

All three helpers record the thrown exception as a span event, set span status to ERROR, and rethrow unchanged. You never lose the original error or its stack.
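
The contract above can be sketched with a minimal stand-in span. This is not the SDK's implementation: `SketchSpan` and `run_traced` are hypothetical, and the real helpers operate on OpenTelemetry spans. The sketch only illustrates the three documented behaviours: record the exception, set status to ERROR, and rethrow the same error object.

```typescript
// Minimal stand-in for a span, used only for this illustration.
interface SketchSpan {
  status: 'UNSET' | 'ERROR';
  events: { name: string; message: string }[];
  ended: boolean;
}

// Hypothetical wrapper showing the documented error contract.
async function run_traced<T>(
  span: SketchSpan,
  fn: () => Promise<T> | T,
): Promise<T> {
  try {
    return await fn();
  } catch (err) {
    // Record the exception as a span event and mark the span as failed.
    span.events.push({
      name: 'exception',
      message: err instanceof Error ? err.message : String(err),
    });
    span.status = 'ERROR';
    throw err; // rethrow the same object so the original stack survives
  } finally {
    span.ended = true; // the span is closed on both success and failure
  }
}
```

Because the error is rethrown unchanged, existing `try` / `catch` blocks and `instanceof` checks around the wrapped code keep working as before.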

### No provider? No problem.

If the host process has no OpenTelemetry `TracerProvider` installed (e.g. during local unit tests), every helper becomes a no-op with zero runtime cost. Leave them in; nothing needs to be conditional.

## Events

OpenTelemetry events are structured signals attached to a span:

```typescript theme={null}
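import { trace } from '@opentelemetry/api';

// Attach the event to whichever span is currently active, if any.
const span = trace.getActiveSpan();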
span?.addEvent('refund_policy_applied', {
  policy: 'under_500_auto_approve',
  amount_cents: amount,
});
```

They show up as markers on the span timeline in the dashboard.

## Forwarding trace context to downstream HTTP services

The session context carries `propagation_headers` — forward them verbatim and W3C-compliant services will join the same trace:

```typescript theme={null}
handler: async (args, ctx) => {
  const res = await fetch('https://legacy.internal/orders', {
    headers: ctx.propagation_headers ?? {},
  });
  return res.json();
}
```

See [Trace propagation](../reference/trace-propagation) for the low-level helpers.
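
Downstream calls usually need headers of their own in addition to the propagation headers. One possible way to combine them is sketched below. The helper `with_propagation` is hypothetical; it spreads the propagation headers last so that a caller-supplied header cannot overwrite `traceparent` or `baggage` by accident.

```typescript
type HeaderMap = Record<string, string>;

// Hypothetical helper: merges the caller's headers with the propagation
// headers from the session context. Propagation headers win on conflict.
function with_propagation(
  own_headers: HeaderMap,
  propagation_headers?: HeaderMap,
): HeaderMap {
  return { ...own_headers, ...(propagation_headers ?? {}) };
}

// Illustrative values only.
const merged = with_propagation(
  { 'content-type': 'application/json' },
  { traceparent: '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01' },
);
```

Inside a handler this would be used as `fetch(url, { headers: with_propagation({ 'content-type': 'application/json' }, ctx.propagation_headers) })`.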

## Using your own OpenTelemetry backend

To export traces to Datadog, Honeycomb, Grafana Tempo, or any OTLP collector, register a `TracerProvider` at process startup **before** constructing any `Agent`:

```typescript theme={null}
import { NodeTracerProvider, BatchSpanProcessor } from '@opentelemetry/sdk-trace-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

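// Pass processors to the constructor: `addSpanProcessor()` was removed in
// the 2.x line of the OpenTelemetry JS SDK.
const provider = new NodeTracerProvider({
  spanProcessors: [
    new BatchSpanProcessor(
      new OTLPTraceExporter({ url: 'https://my-collector:4318/v1/traces' }),
    ),
  ],
});
provider.register();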
```

All SDK spans then flow into your pipeline automatically. If the agent is also connected to a Svantic mesh, the mesh-side provider ships the same spans to the dashboard, so the data ends up in both places. Use your own provider or the mesh's, depending on where you want the data.

## Reading traces in the dashboard

* **Traces tab** — one row per session. Click to see the waterfall.
* **Waterfall** — `invoke_agent` at the root, `call_llm` and `execute_tool` as children, your custom spans nested underneath.
* **Events** — rendered as markers on the span timeline.
* **Usage** — token counts aggregated from `gen_ai.usage.*`, rolled up per trace and per model.
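
The per-model rollup in the **Usage** view can be reproduced offline from exported spans. The sketch below assumes each LLM span has already been reduced to a flat record; `LlmSpanRecord`, `Usage`, and `rollup_by_model` are hypothetical names, and the dashboard's own aggregation may differ in detail.

```typescript
// One record per LLM span, e.g. extracted from `gen_ai.request.model` and
// `gen_ai.usage.*` attributes. The flat shape is an assumption.
interface LlmSpanRecord {
  model: string;
  input_tokens: number;
  output_tokens: number;
}

interface Usage {
  input_tokens: number;
  output_tokens: number;
}

// Hypothetical helper: sums token counts per model.
function rollup_by_model(spans: LlmSpanRecord[]): Map<string, Usage> {
  const totals = new Map<string, Usage>();
  for (const span of spans) {
    const current = totals.get(span.model) ?? { input_tokens: 0, output_tokens: 0 };
    totals.set(span.model, {
      input_tokens: current.input_tokens + span.input_tokens,
      output_tokens: current.output_tokens + span.output_tokens,
    });
  }
  return totals;
}
```

Running the same function over the spans of a single trace gives the per-trace view; running it over all spans in a time window gives the per-model totals.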

## See also

* [Telemetry reference](../reference/telemetry)
* [Trace propagation reference](../reference/trace-propagation)
* [OpenTelemetry GenAI semantic conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/)
