# Telemetry

```typescript theme={null}
import { trace_llm, trace_tool, trace_step, record_span_error } from '@svantic/sdk';
```

## What it is

The SDK emits **OpenTelemetry** spans for every capability invocation, LLM call, and tool call it runs — no setup required on your side. Spans follow the [OpenTelemetry GenAI semantic conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/) (`gen_ai.operation.name`, `gen_ai.request.model`, `gen_ai.usage.input_tokens`, …), so they land cleanly in any OTEL-aware backend.

Three helpers are exported from the SDK for instrumenting your own code. A fourth export, `record_span_error`, supports hand-rolled spans and is described under API:

| Helper                 | Use for                                                            | Emits span            |
| ---------------------- | ------------------------------------------------------------------ | --------------------- |
| `trace_llm(meta, fn)`  | Direct LLM calls (OpenAI, Anthropic, Bedrock, Vertex, Ollama, any) | `llm.<op> <model>`    |
| `trace_tool(meta, fn)` | External calls (DB, HTTP, shell, MCP)                              | `tool.execute <name>` |
| `trace_step(name, fn)` | Arbitrary work blocks (planning, parsing, validation)              | `step.<name>`         |

Where the spans go depends on where the agent runs:

* **On the Svantic mesh** (hosted or self-hosted): the mesh runtime installs a global `TracerProvider` at startup and ships all completed spans to the gateway. They show up in the dashboard's Traces and Usage views.
* **Anywhere else**: if the process has no global `TracerProvider`, the helpers become no-ops. Your callback still runs and its result is returned, but no span is recorded, the overhead is negligible, and there is nothing to configure.

## When to use it

In the common case, **you don't**. The SDK already traces:

* Every capability invocation (`execute_tool <capability_name>` spans with `gen_ai.tool.*` attributes)
* Every LLM call made by smart-agent mode (`call_llm <model>` spans with `gen_ai.request.model`, `gen_ai.usage.*`, `gen_ai.response.finish_reasons`)
* Every agent invocation in smart-agent mode (`invoke_agent <name>` spans with `gen_ai.conversation.id`, aggregated token totals)

You only need to add spans yourself when you want finer-grained visibility inside a capability — e.g. around a database query, a third-party API call, or a business workflow step.

## API

### `trace_llm(meta, fn)`

Wrap any LLM provider call so it shows up as a dedicated child span with the standard `gen_ai.*` attributes.

```typescript theme={null}
function trace_llm<T>(
  meta: {
    system: 'openai' | 'anthropic' | 'gcp.gemini' | 'aws.bedrock' | 'azure.openai' | 'ollama' | 'other' | string,
    model: string,
    operation?: 'chat' | 'text_completion' | 'embeddings' | 'other',
    temperature?: number,
    max_tokens?: number,
    attributes?: Record<string, string | number | boolean>,
  },
  fn: (span: Span) => Promise<{
    value: T,
    telemetry?: {
      input_tokens?: number,
      output_tokens?: number,
      finish_reasons?: string[],
      attributes?: Record<string, string | number | boolean>,
    },
  }>,
): Promise<T>;
```

The callback returns `{ value, telemetry? }`. The helper attaches `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, and `gen_ai.response.finish_reasons` from the `telemetry` object, then resolves the outer promise with `value` alone — so the caller sees a clean value.
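The mapping described above can be pictured with a small stand-in. This is a sketch of the contract only, not the SDK's implementation: `telemetry_to_attributes` and `unwrap_traced_result` are hypothetical names, and the rule that standard `gen_ai.*` keys override same-named custom attributes is an assumption.

```typescript
type AttrValue = string | number | boolean | string[];

interface LlmTelemetry {
  input_tokens?: number;
  output_tokens?: number;
  finish_reasons?: string[];
  attributes?: Record<string, string | number | boolean>;
}

interface TracedResult<T> {
  value: T;
  telemetry?: LlmTelemetry;
}

// Hypothetical helper: turns the optional telemetry object into span attributes.
// Fields that are undefined are skipped rather than written as empty values.
function telemetry_to_attributes(t: LlmTelemetry = {}): Record<string, AttrValue> {
  // Assumption: custom attributes go in first, so standard keys win on conflict.
  const attrs: Record<string, AttrValue> = { ...(t.attributes ?? {}) };
  if (t.input_tokens !== undefined) attrs['gen_ai.usage.input_tokens'] = t.input_tokens;
  if (t.output_tokens !== undefined) attrs['gen_ai.usage.output_tokens'] = t.output_tokens;
  if (t.finish_reasons !== undefined) attrs['gen_ai.response.finish_reasons'] = t.finish_reasons;
  return attrs;
}

// Hypothetical helper: splits the callback result into the value handed back
// to the caller and the attributes that would be attached to the span.
function unwrap_traced_result<T>(result: TracedResult<T>): { value: T; attrs: Record<string, AttrValue> } {
  return { value: result.value, attrs: telemetry_to_attributes(result.telemetry) };
}
```

Because `telemetry` is optional, a callback that cannot report token counts (for example, a provider that omits usage data) can return `{ value }` alone and still get a span.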

**Example (OpenAI):**

```typescript theme={null}
const content = await trace_llm(
  { system: 'openai', model: 'gpt-4o-mini', temperature: 0.2 },
  async () => {
    const res = await openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: [{ role: 'user', content: prompt }],
    });
    return {
      value: res.choices[0].message.content ?? '',
      telemetry: {
        input_tokens: res.usage?.prompt_tokens,
        output_tokens: res.usage?.completion_tokens,
        finish_reasons: [res.choices[0].finish_reason ?? 'stop'],
      },
    };
  },
);
```

**Example (Anthropic):**

```typescript theme={null}
const text = await trace_llm(
  { system: 'anthropic', model: 'claude-3-5-sonnet' },
  async () => {
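    const msg = await anthropic.messages.create({ /* … */ });
    // Content blocks are a union type; only text blocks carry `.text`.
    const first = msg.content[0];
    return {
      value: first?.type === 'text' ? first.text : '',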
      telemetry: {
        input_tokens: msg.usage.input_tokens,
        output_tokens: msg.usage.output_tokens,
        finish_reasons: [msg.stop_reason ?? 'end_turn'],
      },
    };
  },
);
```

If the callback throws, the helper records the exception as a span event, sets the span status to ERROR, and rethrows the original error unchanged.

### `trace_tool(meta, fn)`

Wrap any tool or side-effect call so it shows up as a `tool.execute <name>` child span.

```typescript theme={null}
function trace_tool<T>(
  meta: {
    name: string,        // canonical tool name
    call_id?: string,    // optional tool-call id (correlates with an LLM tool_call)
    kind?: string,       // optional category (e.g. 'http', 'db', 'mcp')
    args?: unknown,      // optional args snapshot (JSON-serialised, truncated to 4 KB)
    attributes?: Record<string, string | number | boolean>,
  },
  fn: (span: Span) => Promise<T>,
): Promise<T>;
```

**Example:**

```typescript theme={null}
const rows = await trace_tool(
  { name: 'postgres.query', kind: 'db' },
  () => db.query('select * from orders where user_id = $1', [user_id]),
);
```
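The `args` snapshot is described as JSON-serialised and truncated to 4 KB. A minimal sketch of what such a snapshot step could look like follows. `snapshot_args` is a hypothetical function, not an SDK export, and treating "4 KB" as 4096 characters (rather than bytes) is an assumption, as is the placeholder used for values that cannot be serialised.

```typescript
// Assumption: the 4 KB limit is modelled here as 4096 UTF-16 code units.
const MAX_ARGS_CHARS = 4096;

// Hypothetical helper: produces a bounded, JSON-encoded snapshot of tool arguments.
function snapshot_args(args: unknown, limit: number = MAX_ARGS_CHARS): string {
  let json: string;
  try {
    // JSON.stringify yields undefined at runtime for undefined or function inputs.
    json = JSON.stringify(args) ?? '';
  } catch (_err) {
    // Circular structures and BigInt values make JSON.stringify throw.
    json = '[unserialisable]';
  }
  // Cut oversized snapshots so a large payload cannot bloat the span.
  return json.length > limit ? json.slice(0, limit) : json;
}
```

Since the snapshot ends up on the span (as `svantic.tool.args`), it is worth passing only a redacted subset of the real arguments when they may contain credentials or personal data.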

### `trace_step(name, fn)`

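Wrap arbitrary work so it shows up as `step.<name>` in the waterfall. Use it to eliminate "unaccounted time" gaps between other spans. The callback may be synchronous or asynchronous; the helper returns a promise either way.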

```typescript theme={null}
function trace_step<T>(
  name: string,
  fn: (span: Span) => Promise<T> | T,
  meta?: { attributes?: Record<string, string | number | boolean> },
): Promise<T>;
```

**Example:**

```typescript theme={null}
const plan = await trace_step('build_plan', () => compose_plan(goal));
const parsed = await trace_step('parse_response', () => validate(raw));
```

### `record_span_error(span, err)`

For advanced callers who start their own spans via `@opentelemetry/api`: mark the span as failed in a way consistent with the helpers above (records the exception, sets status=ERROR, attaches `error.message` and `error.type`).

```typescript theme={null}
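import { trace } from '@opentelemetry/api';
import { record_span_error } from '@svantic/sdk';

const tracer = trace.getTracer('my-code');
// startActiveSpan returns the callback's promise, so await it to surface failures.
await tracer.startActiveSpan('manual', async (span) => {
  try { await work(); }
  catch (err) { record_span_error(span, err); throw err; }
  finally { span.end(); }  // always end the span, even on failure
});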
```
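To make the documented effects concrete, here is a sketch of an equivalent function written against a minimal span interface. It is illustrative only and not the SDK's code: `record_span_error_sketch` and `MinimalSpan` are hypothetical names, and using the error's `name` for `error.type` and wrapping non-`Error` throwables are assumptions. The numeric value 2 matches `SpanStatusCode.ERROR` in `@opentelemetry/api`.

```typescript
// Minimal subset of the OpenTelemetry Span surface that this sketch needs.
interface MinimalSpan {
  recordException(err: Error): void;
  setStatus(status: { code: number; message?: string }): void;
  setAttribute(key: string, value: string): void;
}

const STATUS_ERROR = 2; // same numeric value as SpanStatusCode.ERROR

// Hypothetical equivalent of the documented behaviour: record the exception,
// set status=ERROR, and attach error.message and error.type.
function record_span_error_sketch(span: MinimalSpan, err: unknown): void {
  // Assumption: non-Error throwables (strings, objects) are wrapped first.
  const error = err instanceof Error ? err : new Error(String(err));
  span.recordException(error);
  span.setStatus({ code: STATUS_ERROR, message: error.message });
  span.setAttribute('error.message', error.message);
  // Assumption: error.type carries the error's name, e.g. 'TypeError'.
  span.setAttribute('error.type', error.name);
}
```

Note that neither this sketch nor the pattern above ends the span or rethrows; both remain the caller's job, which is why the manual example keeps `throw err` and `span.end()` in its own `catch` and `finally` blocks.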

## Spans the SDK & mesh emit

| Span name             | Emitted by                      | Operation                        | Key attributes                                                                                                                                                                                                                                         |
| --------------------- | ------------------------------- | -------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `execute_tool <name>` | SDK capability executor         | capability invocation            | `gen_ai.operation.name=execute_tool`, `gen_ai.tool.name`, `gen_ai.conversation.id`, `svantic.tenant.id`                                                                                                                                                |
| `call_llm <model>`    | SDK smart-agent loop            | LLM call inside smart-agent mode | `gen_ai.operation.name=chat`, `gen_ai.system`, `gen_ai.request.model`, `gen_ai.request.temperature`, `gen_ai.request.max_tokens`, `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, `gen_ai.response.finish_reasons`, `svantic.llm.iteration` |
| `invoke_agent <name>` | ADK (mesh side)                 | mesh agent turn                  | `gen_ai.operation.name=invoke_agent`, `gen_ai.agent.name`, `gen_ai.system`, `gen_ai.request.model`, `gen_ai.conversation.id`                                                                                                                           |
| `llm.chat <model>`    | Mesh (ADK auto-instrumentation) | ADK LLM call                     | `gen_ai.operation.name=chat`, `gen_ai.system`, `gen_ai.request.model`, `gen_ai.usage.*`, `gen_ai.response.finish_reasons`, `svantic.source=adk.LlmAgent.callLlmAsync`                                                                                  |
| `llm.<op> <model>`    | `trace_llm`                     | custom LLM call                  | `gen_ai.operation.name`, `gen_ai.system`, `gen_ai.request.model`, `gen_ai.usage.*`, `gen_ai.response.finish_reasons`                                                                                                                                   |
| `tool.execute <name>` | `trace_tool`                    | custom tool call                 | `gen_ai.operation.name=execute_tool`, `gen_ai.tool.name`, `svantic.tool.kind`, `svantic.tool.args`                                                                                                                                                     |
| `step.<name>`         | `trace_step`                    | custom work block                | any attributes you pass                                                                                                                                                                                                                                |
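When post-processing exported spans, the naming scheme in the table is enough to tell most sources apart. The sketch below is a hypothetical classifier, not part of the SDK. It assumes that only mesh-side ADK spans carry a `svantic.source` attribute, which is what separates `llm.chat <model>` spans emitted by the mesh from those produced by `trace_llm`.

```typescript
type SpanSource =
  | 'sdk_capability_executor'
  | 'sdk_smart_agent_loop'
  | 'adk_agent_turn'
  | 'mesh_adk_llm'
  | 'trace_llm'
  | 'trace_tool'
  | 'trace_step'
  | 'unknown';

// Hypothetical helper: maps a span name (plus attributes) to its emitter,
// following the prefixes listed in the table above.
function classify_span(name: string, attributes: Record<string, unknown> = {}): SpanSource {
  if (name.startsWith('execute_tool ')) return 'sdk_capability_executor';
  if (name.startsWith('call_llm ')) return 'sdk_smart_agent_loop';
  if (name.startsWith('invoke_agent ')) return 'adk_agent_turn';
  if (name.startsWith('tool.execute ')) return 'trace_tool';
  if (name.startsWith('step.')) return 'trace_step';
  if (name.startsWith('llm.')) {
    // Assumption: trace_llm spans never set svantic.source.
    return typeof attributes['svantic.source'] === 'string' ? 'mesh_adk_llm' : 'trace_llm';
  }
  return 'unknown';
}
```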

## Using your own OpenTelemetry backend

To send traces to Datadog, Honeycomb, Grafana Tempo, or any OTLP collector, configure a `TracerProvider` yourself at process startup — before creating any `Agent`:

```typescript theme={null}
import { NodeTracerProvider, BatchSpanProcessor } from '@opentelemetry/sdk-trace-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

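// Written against the 1.x SDK API. On @opentelemetry/sdk-trace-node 2.x,
// addSpanProcessor() was removed: pass { spanProcessors: [...] } to the constructor.
const provider = new NodeTracerProvider();
provider.addSpanProcessor(new BatchSpanProcessor(
  new OTLPTraceExporter({ url: 'https://my-otel-collector:4318/v1/traces' }),
));
// Installs this provider as the process-wide global TracerProvider.
provider.register();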
```

All SDK spans then flow into your pipeline automatically. If the agent is also connected to a Svantic mesh, only one global provider can be active in that process: the mesh calls `maybeSetOtelProviders`, which is first-write-wins. A provider you register before the mesh initialises is therefore preserved; if the mesh registers first, its provider wins.
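First-write-wins registration can be modelled with a tiny holder that accepts a value once and ignores later attempts. The class below only illustrates those semantics; `GlobalSlot` and `set_if_empty` are hypothetical names, and nothing here reflects how the mesh or OpenTelemetry actually store the provider.

```typescript
// Hypothetical model of a first-write-wins global: the first value sticks,
// later writes are ignored instead of replacing it.
class GlobalSlot<T> {
  private current: T | undefined = undefined;

  // Returns true when the value was installed, false when a value already existed.
  set_if_empty(value: T): boolean {
    if (this.current !== undefined) return false;
    this.current = value;
    return true;
  }

  get(): T | undefined {
    return this.current;
  }
}

// Registration order decides the outcome.
const provider_slot = new GlobalSlot<string>();
const user_installed = provider_slot.set_if_empty('user-otlp-provider');
const mesh_installed = provider_slot.set_if_empty('mesh-provider');
```

Here the first call installs its value and the second is ignored, which is why registering your own provider at the very top of the entry point, before any `Agent` is created, matters.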

## See also

* [Telemetry guide](../guides/telemetry) — reading traces in the dashboard.
* [Trace propagation](./trace-propagation) — W3C `traceparent` / `baggage` headers across service boundaries.
