Telemetry

import { trace_llm, trace_tool, trace_step, record_span_error } from '@svantic/sdk';

What it is

The SDK emits OpenTelemetry spans for every capability invocation, LLM call, and tool call it runs, with no setup required on your side. Spans follow the OpenTelemetry GenAI semantic conventions (gen_ai.operation.name, gen_ai.request.model, gen_ai.usage.input_tokens, …), so they land cleanly in any OTEL-aware backend. The SDK exports three helpers for instrumenting your own code, plus record_span_error for spans you start yourself:

Helper	Use for	Emits span
`trace_llm(meta, fn)`	Direct LLM calls (OpenAI, Anthropic, Bedrock, Vertex, Ollama, any)	`llm.<op> <model>`
`trace_tool(meta, fn)`	External calls (DB, HTTP, shell, MCP)	`tool.execute <name>`
`trace_step(name, fn)`	Arbitrary work blocks (planning, parsing, validation)	`step.<name>`

Where the spans go depends on where the agent runs:

On the Svantic mesh (hosted, or self-hosted): the mesh runtime installs a global TracerProvider at startup and ships all completed spans to the gateway. They show up in the dashboard’s Traces and Usage views.
Anywhere else: if the process has no global TracerProvider, the helpers become no-ops. Your callbacks still run, but no spans are recorded, so there is effectively no runtime cost and nothing to configure.

When to use it

In the common case, you don’t. The SDK already traces:

Every capability invocation (execute_tool <capability_name> spans with gen_ai.tool.* attributes)
Every LLM call made by smart-agent mode (call_llm <model> spans with gen_ai.request.model, gen_ai.usage.*, gen_ai.response.finish_reasons)
Every agent invocation in smart-agent mode (invoke_agent <name> spans with gen_ai.conversation.id, aggregated token totals)

You only need to add spans yourself when you want finer-grained visibility inside a capability — e.g. around a database query, a third-party API call, or a business workflow step.

API

`trace_llm(meta, fn)`

Wrap any LLM provider call so it shows up as a dedicated child span with the standard gen_ai.* attributes.

function trace_llm<T>(
  meta: {
    system: 'openai' | 'anthropic' | 'gcp.gemini' | 'aws.bedrock' | 'azure.openai' | 'ollama' | 'other' | string,
    model: string,
    operation?: 'chat' | 'text_completion' | 'embeddings' | 'other',
    temperature?: number,
    max_tokens?: number,
    attributes?: Record<string, string | number | boolean>,
  },
  fn: (span: Span) => Promise<{
    value: T,
    telemetry?: {
      input_tokens?: number,
      output_tokens?: number,
      finish_reasons?: string[],
      attributes?: Record<string, string | number | boolean>,
    },
  }>,
): Promise<T>;

The callback returns { value, telemetry? }. The helper attaches gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, and gen_ai.response.finish_reasons from the telemetry object, then resolves the outer promise with value alone, so the caller never sees the telemetry wrapper. Example (OpenAI):

const content = await trace_llm(
  { system: 'openai', model: 'gpt-4o-mini', temperature: 0.2 },
  async () => {
    const res = await openai.chat.completions.create({
      model: 'gpt-4o-mini',
      temperature: 0.2, // keep in sync with the meta passed to trace_llm
      messages: [{ role: 'user', content: prompt }],
    });
    return {
      value: res.choices[0].message.content ?? '',
      telemetry: {
        input_tokens: res.usage?.prompt_tokens,
        output_tokens: res.usage?.completion_tokens,
        finish_reasons: [res.choices[0].finish_reason ?? 'stop'],
      },
    };
  },
);

Example (Anthropic):

const text = await trace_llm(
  { system: 'anthropic', model: 'claude-3-5-sonnet' },
  async () => {
    const msg = await anthropic.messages.create({ /* … */ });
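    const first = msg.content[0];
    return {
      // content blocks are a union; only text blocks carry a .text field
      value: first?.type === 'text' ? first.text : '',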
      telemetry: {
        input_tokens: msg.usage.input_tokens,
        output_tokens: msg.usage.output_tokens,
        finish_reasons: [msg.stop_reason ?? 'end_turn'],
      },
    };
  },
);
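
The two examples above map provider-specific usage fields onto the same telemetry object. If you call several providers, it may help to keep that mapping in one place. The following is a minimal sketch: to_telemetry, LlmTelemetry, and ProviderUsage are hypothetical names that are not part of the SDK, and the usage field names are taken only from the two examples above.

```typescript
// Hypothetical adapter, not part of @svantic/sdk: it normalises the usage
// shapes shown in the two examples above into the telemetry object that a
// trace_llm callback returns.
type LlmTelemetry = {
  input_tokens?: number;
  output_tokens?: number;
  finish_reasons?: string[];
};

// Usage fields as they appear in the OpenAI and Anthropic examples.
type ProviderUsage = {
  prompt_tokens?: number;     // OpenAI-style
  completion_tokens?: number; // OpenAI-style
  input_tokens?: number;      // Anthropic-style
  output_tokens?: number;     // Anthropic-style
};

function to_telemetry(
  usage: ProviderUsage | undefined,
  finish_reason: string | null | undefined,
): LlmTelemetry {
  const telemetry: LlmTelemetry = {};
  const input = usage?.input_tokens ?? usage?.prompt_tokens;
  const output = usage?.output_tokens ?? usage?.completion_tokens;
  if (input !== undefined) telemetry.input_tokens = input;
  if (output !== undefined) telemetry.output_tokens = output;
  // Only report a finish reason the provider actually returned.
  if (finish_reason) telemetry.finish_reasons = [finish_reason];
  return telemetry;
}
```

Unlike the examples above, this sketch omits finish_reasons when the provider returns none instead of defaulting to 'stop' or 'end_turn'; either choice fits the optional telemetry fields.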

If the callback throws, trace_llm records the exception as a span event, sets the span status to ERROR, and rethrows the error unchanged.
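
To make the { value, telemetry? } contract concrete, here is a small stand-in that mimics the documented behaviour with an in-memory span. It is an illustration only, not the SDK's implementation: FakeSpan and run_with_telemetry are made-up names, and no OpenTelemetry provider is involved.

```typescript
// Illustrative stand-in only: this is NOT the SDK's trace_llm. It mimics the
// documented contract with an in-memory span so the behaviour is visible
// without an OpenTelemetry provider.
type AttrValue = string | number | boolean | string[];

class FakeSpan {
  attributes: Record<string, AttrValue> = {};
  status: 'UNSET' | 'ERROR' = 'UNSET';

  // Skip undefined values so absent telemetry fields leave no attribute.
  set(key: string, value: AttrValue | undefined): void {
    if (value !== undefined) this.attributes[key] = value;
  }
}

type LlmResult<T> = {
  value: T;
  telemetry?: {
    input_tokens?: number;
    output_tokens?: number;
    finish_reasons?: string[];
  };
};

async function run_with_telemetry<T>(
  span: FakeSpan,
  fn: (span: FakeSpan) => Promise<LlmResult<T>>,
): Promise<T> {
  try {
    const { value, telemetry } = await fn(span);
    span.set('gen_ai.usage.input_tokens', telemetry?.input_tokens);
    span.set('gen_ai.usage.output_tokens', telemetry?.output_tokens);
    span.set('gen_ai.response.finish_reasons', telemetry?.finish_reasons);
    return value; // the caller never sees the telemetry wrapper
  } catch (err) {
    span.status = 'ERROR';
    throw err; // rethrown unchanged
  }
}
```

The caller receives only value, while the token counts end up on the span; a rejected callback marks the span as failed and surfaces the same error object to the caller.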

`trace_tool(meta, fn)`

Wrap any tool or side-effecting call (database, HTTP, shell, MCP) so it shows up as a tool.execute <name> span.

function trace_tool<T>(
  meta: {
    name: string,        // canonical tool name
    call_id?: string,    // optional tool-call id (correlates with an LLM tool_call)
    kind?: string,       // optional category (e.g. 'http', 'db', 'mcp')
    args?: unknown,      // optional args snapshot (JSON-serialised, truncated to 4 KB)
    attributes?: Record<string, string | number | boolean>,
  },
  fn: (span: Span) => Promise<T>,
): Promise<T>;

Example:

const rows = await trace_tool(
  { name: 'postgres.query', kind: 'db' },
  () => db.query('select * from orders where user_id = $1', [user_id]),
);
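
The args field is described above as a JSON-serialised snapshot truncated to 4 KB. If you want to preview what such a snapshot might look like before passing large arguments, a rough approximation follows. snapshot_args is a hypothetical helper, and the sketch assumes the limit is counted in string characters; the SDK may count bytes or mark truncation differently.

```typescript
// Hypothetical helper, not the SDK's implementation: serialise an args
// snapshot to JSON and cap it at 4 KB, as the `args` comment above describes.
// Assumption: the cap is counted in UTF-16 code units; the SDK may count bytes.
const MAX_ARGS_CHARS = 4 * 1024;

function snapshot_args(args: unknown, limit: number = MAX_ARGS_CHARS): string {
  let json: string;
  try {
    // JSON.stringify yields undefined for values such as functions or undefined.
    json = JSON.stringify(args) ?? 'undefined';
  } catch {
    // Circular structures and BigInt values make JSON.stringify throw.
    json = '[unserialisable]';
  }
  return json.length > limit ? json.slice(0, limit) : json;
}
```

A truncated snapshot is usually no longer valid JSON, so treat the attribute as a debugging aid and not as something to parse back.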

`trace_step(name, fn)`

Wrap arbitrary work so it shows up as step.<name> in the waterfall. Use it to eliminate “unaccounted time” gaps.

function trace_step<T>(
  name: string,
  fn: (span: Span) => Promise<T> | T,
  meta?: { attributes?: Record<string, string | number | boolean> },
): Promise<T>;

Example:

const plan = await trace_step('build_plan', () => compose_plan(goal));
const parsed = await trace_step('parse_response', () => validate(raw));
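
Because the signature accepts a callback returning either T or Promise<T> and always returns Promise<T>, the result has to be awaited even when the wrapped work is synchronous. The stand-in below has the same callback shape but creates no span; step_like and steps_run are made-up names used only to show the contract.

```typescript
// Local stand-in with the same callback shape as the trace_step signature
// above. It creates no span; it only records the span name it would use.
const steps_run: string[] = [];

async function step_like<T>(name: string, fn: () => Promise<T> | T): Promise<T> {
  steps_run.push(`step.${name}`);
  // `await` accepts plain values and promises alike, so both callback styles work.
  return await fn();
}

async function demo(): Promise<[number, string]> {
  const parsed = await step_like('parse', () => Number.parseInt('42', 10)); // sync callback
  const fetched = await step_like('fetch', async () => 'ok');               // async callback
  return [parsed, fetched];
}
```

With this shape, an exception thrown by a synchronous callback also reaches the caller as a rejected promise, so a single try/await/catch covers both cases.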

`record_span_error(span, err)`

For advanced callers who start their own spans via @opentelemetry/api: mark the span as failed in a way consistent with the helpers above (records the exception, sets status=ERROR, attaches error.message and error.type).

import { trace } from '@opentelemetry/api';

const tracer = trace.getTracer('my-code');
await tracer.startActiveSpan('manual', async (span) => {
  try { await work(); }
  catch (err) { record_span_error(span, err); throw err; }
  finally { span.end(); }
});
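
JavaScript allows any value to be thrown, so the error.message and error.type attributes have to be derived from an unknown. The sketch below shows one plausible mapping; error_attributes is a hypothetical helper, and the exact values that record_span_error writes are not specified here.

```typescript
// Hypothetical helper, not the SDK's implementation: one plausible way to
// derive the two error attributes mentioned above from an unknown thrown value.
function error_attributes(
  err: unknown,
): { 'error.type': string; 'error.message': string } {
  if (err instanceof Error) {
    // Assumption: the type is the error's name, e.g. "TypeError".
    return { 'error.type': err.name, 'error.message': err.message };
  }
  // Code can also throw strings, numbers, or plain objects.
  return { 'error.type': typeof err, 'error.message': String(err) };
}
```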

Spans the SDK & mesh emit

Span name	Emitted by	Operation	Key attributes
`execute_tool <name>`	SDK capability executor	capability invocation	`gen_ai.operation.name=execute_tool`, `gen_ai.tool.name`, `gen_ai.conversation.id`, `svantic.tenant.id`
`call_llm <model>`	SDK smart-agent loop	LLM call inside smart-agent mode	`gen_ai.operation.name=chat`, `gen_ai.system`, `gen_ai.request.model`, `gen_ai.request.temperature`, `gen_ai.request.max_tokens`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, `gen_ai.response.finish_reasons`, `svantic.llm.iteration`
`invoke_agent <name>`	ADK (mesh side)	mesh agent turn	`gen_ai.operation.name=invoke_agent`, `gen_ai.agent.name`, `gen_ai.system`, `gen_ai.request.model`, `gen_ai.conversation.id`
`llm.chat <model>`	Mesh (ADK auto-instrumentation)	ADK LLM call	`gen_ai.operation.name=chat`, `gen_ai.system`, `gen_ai.request.model`, `gen_ai.usage.*`, `gen_ai.response.finish_reasons`, `svantic.source=adk.LlmAgent.callLlmAsync`
`llm.<op> <model>`	`trace_llm`	custom LLM call	`gen_ai.operation.name`, `gen_ai.system`, `gen_ai.request.model`, `gen_ai.usage.*`, `gen_ai.response.finish_reasons`
`tool.execute <name>`	`trace_tool`	custom tool call	`gen_ai.operation.name=execute_tool`, `gen_ai.tool.name`, `svantic.tool.kind`, `svantic.tool.args`
`step.<name>`	`trace_step`	custom work block	any attributes you pass

Using your own OpenTelemetry backend

To send traces to Datadog, Honeycomb, Grafana Tempo, or any OTLP collector, configure a TracerProvider yourself at process startup — before creating any Agent:

import { NodeTracerProvider, BatchSpanProcessor } from '@opentelemetry/sdk-trace-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

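// Written against the 1.x @opentelemetry/sdk-trace-node API. In 2.x, addSpanProcessor
// was removed; pass `spanProcessors: [...]` to the NodeTracerProvider constructor instead.
const provider = new NodeTracerProvider();
provider.addSpanProcessor(new BatchSpanProcessor(
  new OTLPTraceExporter({ url: 'https://my-otel-collector:4318/v1/traces' }),
));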
provider.register();

All SDK spans then flow into your pipeline automatically. Provider registration is first-write-wins: the mesh installs its provider through maybeSetOtelProviders, which leaves an already-registered provider in place. If the agent is also connected to a Svantic mesh, the mesh’s own provider therefore wins in that process unless your agent-side provider was registered first, in which case yours is preserved.
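
The first-write-wins rule is the reason registration order matters. The toy model below shows the semantics in isolation; it does not call OpenTelemetry or the mesh, and ProviderSlot and the provider labels are invented for illustration.

```typescript
// Toy model of first-write-wins registration. It does not touch OpenTelemetry
// or the mesh; ProviderSlot and the provider labels are made up.
class ProviderSlot<P> {
  private current: P | undefined;

  // Returns true if this call installed the provider, false if one was already set.
  register(provider: P): boolean {
    if (this.current !== undefined) return false;
    this.current = provider;
    return true;
  }

  get(): P | undefined {
    return this.current;
  }
}

const slot = new ProviderSlot<string>();
const yours_installed = slot.register('your-otlp-provider'); // registered at startup
const mesh_installed = slot.register('mesh-provider');       // later attempt is ignored
```

Here the slot keeps 'your-otlp-provider' because it was registered first; swapping the two register calls would leave the mesh provider in place instead.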
