# WebSocket API: `GET /agents/connect`

Persistent, agent-initiated WebSocket used by connected-mode agents to
receive dispatches from the Svantic mesh. This is the only transport
for connected-mode agents; see
[Agent Connectivity](../../concepts/agent-connectivity) for the
conceptual background.

OpenAPI does not support WebSocket endpoints, so this reference lives
as a Markdown page. The wire format is fully versioned under the
subprotocol identifier `svantic.v1` and is governed by the internal
spec ([`platform/docs/specs/ws_transport.md`](https://github.com/measureone/svantic/blob/main/platform/docs/specs/ws_transport.md),
engineer-facing).

## Endpoint

```
wss://api.svantic.com/agents/connect?instance_id=<instance_id>
```

* **Scheme**: `wss://` only. `ws://` is rejected.
* **Query parameters**:
  * `instance_id` (required) — the instance that registered with
    `deployment_mode: connected`. Registration happens first over
    HTTPS; only after `POST /agents/register` returns a `connect_url`
    does the WebSocket upgrade succeed.
* **Subprotocol**: client must offer `svantic.v1` in
  `Sec-WebSocket-Protocol`. The server echoes it back in the `101`
  response. Clients that offer no compatible subprotocol are rejected
  with `400 UNSUPPORTED_SUBPROTOCOL`.

## Authentication

Upgrade requests carry a tenant-scoped JWT in the standard bearer
header:

```
Authorization: Bearer <jwt>
```

This is the same JWT used for `POST /agents/register`. Cookie
authentication is not supported. The JWT's `tenant_id` must match the
tenant that owns the `instance_id`.
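
As a rough illustration of the endpoint and authentication rules above, the sketch below assembles the URL and headers for the upgrade request using only the Python standard library. The helper name `build_upgrade_request` is hypothetical, and the actual WebSocket dial is left to whichever client library you use.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode

BASE_URL = "wss://api.svantic.com/agents/connect"
SUBPROTOCOL = "svantic.v1"


def build_upgrade_request(instance_id: str, jwt: str) -> Tuple[str, Dict[str, str]]:
    """Return the (url, headers) pair for the WebSocket upgrade.

    Hypothetical helper: it only assembles values described on this page.
    """
    if not instance_id:
        # The server would answer 400 MISSING_INSTANCE_ID; fail early instead.
        raise ValueError("instance_id is required")
    if not jwt:
        # The server would answer 401 UNAUTHORIZED without a token.
        raise ValueError("a tenant-scoped JWT is required")
    url = f"{BASE_URL}?{urlencode({'instance_id': instance_id})}"
    headers = {
        "Authorization": f"Bearer {jwt}",
        "Sec-WebSocket-Protocol": SUBPROTOCOL,
    }
    return url, headers
```

Most WebSocket client libraries accept the subprotocol through a dedicated argument rather than a raw header, so treat the `Sec-WebSocket-Protocol` entry as illustrative.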

## Handshake errors

The server validates the upgrade request, in order, before allocating
a socket. Each failure returns a plain HTTP response (**no** `101`
upgrade) with an `application/json` body. The table lists the six
possible codes.

| HTTP status | Code                       | When                                                                                               |
| ----------- | -------------------------- | -------------------------------------------------------------------------------------------------- |
| `400`       | `MISSING_INSTANCE_ID`      | Query parameter `instance_id` is absent or empty.                                                  |
| `400`       | `UNSUPPORTED_SUBPROTOCOL`  | Client did not offer `svantic.v1`.                                                                 |
| `401`       | `UNAUTHORIZED`             | JWT is missing, malformed, signature-invalid, or expired.                                          |
| `403`       | `TENANT_MISMATCH`          | JWT's tenant does not own the requested `instance_id`.                                             |
| `404`       | `INSTANCE_NOT_FOUND`       | No registered instance with that `instance_id`.                                                    |
| `409`       | `DEPLOYMENT_MODE_MISMATCH` | Instance exists but was registered as `hosted`. Re-register as `connected` on a new `instance_id`. |

Example failure body:

```json theme={null}
{
  "error": "DEPLOYMENT_MODE_MISMATCH",
  "message": "Instance navigator-prod-01 is registered as 'hosted'; WebSocket upgrade requires 'connected'.",
  "status": 409
}
```
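
A custom client might decode the failure body along these lines. This is a minimal sketch: the `HandshakeError` type and both function names are hypothetical, and the rule that only `UNAUTHORIZED` is worth retrying with a new token is an assumption, not something this page specifies.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class HandshakeError:
    status: int
    code: str
    message: str


def parse_handshake_error(body: str) -> HandshakeError:
    """Decode the JSON body returned when the upgrade is refused."""
    doc = json.loads(body)
    return HandshakeError(
        status=int(doc["status"]),
        code=str(doc["error"]),
        message=str(doc.get("message", "")),
    )


def needs_fresh_token(err: HandshakeError) -> bool:
    # Assumption: an expired or invalid JWT is the only failure a client can
    # fix by itself; the other codes point at registration or configuration.
    return err.code == "UNAUTHORIZED"
```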

## Lifecycle

```mermaid theme={null}
stateDiagram-v2
    [*] --> agent
    agent --> authenticated : upgrade (101)
    authenticated --> ready : hello / welcome
    ready --> streaming : dispatch / chunk
    streaming --> ready : dispatch_result
```

After `101 Switching Protocols` the client **must** send a `hello`
frame as its first text frame. The server replies with `welcome`;
the connection is "live" only after the client observes `welcome`.
The mesh will not push `dispatch` frames before the client reaches
the `ready` state.
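
The lifecycle can be modelled as a small transition table. The sketch below reuses the state names from the diagram; the event labels and the `advance` helper are hypothetical, and it tracks a single dispatch at a time for simplicity.

```python
from enum import Enum


class State(Enum):
    AGENT = "agent"                  # initial state in the diagram
    AUTHENTICATED = "authenticated"  # upgrade returned 101
    READY = "ready"                  # welcome observed
    STREAMING = "streaming"          # a dispatch is in progress


# (current state, event) -> next state. Event names are hypothetical labels.
TRANSITIONS = {
    (State.AGENT, "upgrade_101"): State.AUTHENTICATED,
    (State.AUTHENTICATED, "welcome"): State.READY,
    (State.READY, "dispatch"): State.STREAMING,
    (State.STREAMING, "dispatch_result"): State.READY,
}


def advance(state: State, event: str) -> State:
    """Return the next state, or raise if the event is not valid right now."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise RuntimeError(
            f"{event!r} is not valid in state {state.value!r}"
        ) from None
```

Rejecting out-of-order events makes it easy to notice a `dispatch` that arrives before `welcome`, which the rules above say should not happen.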

## Frame envelope

Every frame is a UTF-8 JSON text frame. Binary frames are reserved
for future use; if the server receives one it closes the socket with
code `1003 Unsupported Data`.

```jsonc theme={null}
{
  "v": 1,
  "type": "dispatch",
  "id": "0199e3b5-7d8c-7a10-9a1c-ff65e2b3c0de",
  "ts": "2026-04-17T13:41:22.814Z",
  "in_reply_to": null,
  "trace_id": null,
  "parent_span_id": null,
  "payload": { }
}
```

* `v` — protocol version literal. Always `1` for `svantic.v1`. A
  breaking change ships under a new subprotocol identifier.
* `type` — see the frame catalog below.
* `id` — sender-assigned UUID v7. Used for correlation; response
  frames set `in_reply_to` to the request's `id`.
* `ts` — sender wall-clock in ISO 8601 / RFC 3339.
* `in_reply_to` — required on response frames (`dispatch_result`,
  `dispatch_chunk`, `dispatch_ack`, `tool_result`, `pong`, `welcome`); otherwise
  omitted or `null`.
* `trace_id`, `parent_span_id` — optional envelope-level W3C trace
  context for per-frame routing telemetry. Note that trace context
  for your *business logic* arrives inside the `dispatch` payload, at
  `payload.session_context.propagation_headers` (W3C `traceparent` +
  `baggage`).
* `payload` — type-specific, documented per frame below.

When a frame fails schema validation, the server replies with an
`error` frame (`code: BAD_FRAME`) and then closes the socket with code
`1002 Protocol Error`.
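
For clients written without the SDK, the envelope can be built with the standard library alone. The sketch below is one possible approach: `uuid7` is a hand-rolled UUID v7 generator (included because older Python versions do not ship one), and `make_frame` is a hypothetical helper name.

```python
import os
import time
import uuid
from datetime import datetime, timezone
from typing import Optional

# Response frame types an agent may send; each must carry in_reply_to.
RESPONSE_TYPES = frozenset(
    {"dispatch_result", "dispatch_chunk", "dispatch_ack", "tool_result", "pong"}
)


def uuid7() -> uuid.UUID:
    """Build a UUID v7: 48-bit Unix-ms timestamp, version, variant, random bits."""
    ts_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = (ts_ms & 0xFFFFFFFFFFFF) << 80        # bits 127..80: timestamp
    value |= 0x7 << 76                            # bits 79..76: version 7
    value |= ((rand >> 62) & 0xFFF) << 64         # bits 75..64: rand_a
    value |= 0b10 << 62                           # bits 63..62: RFC variant
    value |= rand & ((1 << 62) - 1)               # bits 61..0: rand_b
    return uuid.UUID(int=value)


def make_frame(frame_type: str, payload: dict,
               in_reply_to: Optional[str] = None) -> dict:
    """Wrap a payload in the envelope described above."""
    if frame_type in RESPONSE_TYPES and in_reply_to is None:
        raise ValueError(f"{frame_type} frames must set in_reply_to")
    now = datetime.now(timezone.utc)
    return {
        "v": 1,
        "type": frame_type,
        "id": str(uuid7()),
        # RFC 3339 timestamp with millisecond precision and a Z suffix.
        "ts": now.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
        "in_reply_to": in_reply_to,
        "payload": payload,
    }
```

Serialize the result with `json.dumps` and send it as a text frame, since binary frames are rejected.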

## Frame catalog

### `hello` — agent → mesh

First frame after upgrade. Announces the agent and optionally asks
to resume.

```json theme={null}
{
  "v": 1,
  "type": "hello",
  "id": "…",
  "ts": "…",
  "payload": {
    "instance_id": "navigator-prod-01",
    "agent_type": "navigator",
    "agent_version": "1.4.2",
    "sdk_version": "@svantic/sdk@0.12.0",
    "agent_card": { },
    "resume_token": null
  }
}
```

### `welcome` — mesh → agent

Acknowledges `hello`. Transitions the client to `ready`.

```json theme={null}
{
  "v": 1,
  "type": "welcome",
  "id": "…",
  "ts": "…",
  "in_reply_to": "<hello.id>",
  "payload": {
    "resumed": false,
    "server_time": "2026-04-17T13:41:23.001Z",
    "replayed_dispatches": []
  }
}
```

When `resumed: true`, `replayed_dispatches` lists the `dispatch.id`
values the server is re-delivering on this reconnect.
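
A client can check the `welcome` against the `hello` it sent before treating the connection as live. This is a sketch under the field names shown in the examples above; `replayed_after_welcome` is a hypothetical helper.

```python
from typing import List


def replayed_after_welcome(hello: dict, welcome: dict) -> List[str]:
    """Validate a welcome frame and return the dispatch IDs being re-delivered."""
    if welcome.get("type") != "welcome":
        raise ValueError("expected a welcome frame")
    if welcome.get("in_reply_to") != hello["id"]:
        raise ValueError("welcome does not answer this hello")
    payload = welcome.get("payload") or {}
    if payload.get("resumed"):
        return list(payload.get("replayed_dispatches", []))
    # Not resumed: nothing is replayed and the session starts clean.
    return []
```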

### `dispatch` — mesh → agent

A single A2A task to execute. `payload` is **byte-identical** to the
body the mesh would POST to a hosted agent for the same operation —
so the same handler code works on both transports.

```jsonc theme={null}
{
  "type": "dispatch",
  "payload": {
    "skill_id": "lookup_ticket",
    "args": { "ticket_id": 42 },
    "session_context": {
      "session_id": "sess-abc",
      "tenant_id": "tenant-1",
      "propagation_headers": {
        "traceparent": "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01",
        "baggage": "svantic.session_id=sess-abc,svantic.tenant_id=tenant-1,svantic.invocation_id=deadbeefcafef00d"
      }
    },
    "deadline_ms": 1713361323000
  }
}
```
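
Clients that do not use the SDK need to read the trace context out of `propagation_headers` themselves. The sketch below parses the two W3C headers with the standard library. It handles `traceparent` version `00` only and drops baggage properties (anything after `;`); the function names are hypothetical.

```python
from typing import Dict
from urllib.parse import unquote


def parse_traceparent(value: str) -> Dict[str, object]:
    """Split a version-00 traceparent into its four hex fields."""
    parts = value.strip().split("-")
    if len(parts) != 4:
        raise ValueError("traceparent must have four dash-separated fields")
    version, trace_id, parent_id, flags = parts
    if (len(version), len(trace_id), len(parent_id), len(flags)) != (2, 32, 16, 2):
        raise ValueError("traceparent field has the wrong length")
    for field in parts:
        int(field, 16)  # raises ValueError on non-hex input
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


def parse_baggage(value: str) -> Dict[str, str]:
    """Turn a baggage header into a plain key/value mapping."""
    items: Dict[str, str] = {}
    for member in value.split(","):
        key_value = member.split(";", 1)[0].strip()  # drop properties
        key, sep, val = key_value.partition("=")
        if sep and key.strip():
            items[key.strip()] = unquote(val.strip())
    return items


def trace_context(dispatch_payload: dict) -> Dict[str, object]:
    """Read both headers from payload.session_context.propagation_headers."""
    headers = dispatch_payload["session_context"]["propagation_headers"]
    context = parse_traceparent(headers["traceparent"])
    context["baggage"] = parse_baggage(headers.get("baggage", ""))
    return context
```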

### `dispatch_ack` — agent → mesh *(optional)*

Optional "received and working" signal. If the server does not
receive a terminal response (result or error) before
`payload.deadline_ms`, the dispatch times out regardless of whether
an ack was sent.
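
A handler may want to know how much time is left before the dispatch times out. The sketch below assumes `deadline_ms` is a Unix epoch timestamp in milliseconds, which the example value suggests but this page does not state; `remaining_ms` is a hypothetical helper.

```python
import time
from typing import Optional


def remaining_ms(deadline_ms: int, now_ms: Optional[int] = None) -> int:
    """Milliseconds left before the deadline, never negative.

    Assumption: deadline_ms is Unix epoch time in milliseconds.
    """
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    return max(0, deadline_ms - now_ms)
```

A result of `0` means the deadline has already passed, so sending a `dispatch_ack` at that point would not prevent the timeout.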

### `dispatch_chunk` — agent → mesh

A streaming output chunk. Multiple chunks, in order, may precede a
`dispatch_result`. Receivers must preserve chunk ordering per
`in_reply_to` group.

```jsonc theme={null}
{
  "type": "dispatch_chunk",
  "in_reply_to": "<dispatch.id>",
  "payload": { "delta": "…" }
}
```

### `dispatch_result` — agent → mesh

Terminal success for a dispatch.

```jsonc theme={null}
{
  "type": "dispatch_result",
  "in_reply_to": "<dispatch.id>",
  "payload": { "result": { } }
}
```
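
The chunk-then-result sequence can be expressed as a generator. This sketch emits abbreviated frames in the same style as the examples above (envelope fields such as `v`, `id`, and `ts` are omitted); `stream_reply` is a hypothetical name.

```python
from typing import Iterable, Iterator


def stream_reply(dispatch_id: str, deltas: Iterable[str],
                 result: dict) -> Iterator[dict]:
    """Yield ordered dispatch_chunk frames, then one terminal dispatch_result."""
    for delta in deltas:
        yield {
            "type": "dispatch_chunk",
            "in_reply_to": dispatch_id,
            "payload": {"delta": delta},
        }
    # Exactly one terminal frame closes out the dispatch.
    yield {
        "type": "dispatch_result",
        "in_reply_to": dispatch_id,
        "payload": {"result": result},
    }
```

Because every frame carries the same `in_reply_to`, a receiver can reassemble the chunks for each dispatch in arrival order.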

### `tool_call` / `tool_result`

When one agent invokes another agent's tool, the request is routed
through the mesh. `tool_call` has the same payload shape as a hosted
tool invocation; `tool_result` is the reply.

### `heartbeat` — agent → mesh

Self-reported presence, load, and health. Carries the same
`HeartbeatPayload` as the HTTP `POST /agents/heartbeat` endpoint —
hosted agents heartbeat over HTTP, connected agents heartbeat over
this frame, but the payload bytes are identical and the gateway
stores them in the same row.

```jsonc theme={null}
{
  "type": "heartbeat",
  "payload": {
    "status": "available",
    "current_sessions": 3,
    "max_concurrent_sessions": 16,
    "consecutive_failures": 0
  }
}
```

Cadence: every 30 s (the shared `HEARTBEAT_INTERVAL_MS` constant),
or immediately on any status change.
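
The cadence rule can be sketched as a small scheduler that fires on the interval or on a status change. The class name and the choice to send the first heartbeat immediately are assumptions; only the 30 s interval and the payload fields come from this page.

```python
from typing import Optional

HEARTBEAT_INTERVAL_MS = 30_000  # 30 s, matching the cadence stated above


class HeartbeatScheduler:
    """Decides when the next heartbeat frame is due."""

    def __init__(self) -> None:
        self._last_sent_ms: Optional[int] = None
        self._last_status: Optional[str] = None

    def should_send(self, status: str, now_ms: int) -> bool:
        if self._last_sent_ms is None:
            return True  # assumption: report presence right away
        if status != self._last_status:
            return True  # status changed: send immediately
        return now_ms - self._last_sent_ms >= HEARTBEAT_INTERVAL_MS

    def mark_sent(self, status: str, now_ms: int) -> None:
        self._last_sent_ms = now_ms
        self._last_status = status


def heartbeat_payload(status: str, current_sessions: int,
                      max_concurrent_sessions: int,
                      consecutive_failures: int) -> dict:
    """Build the payload with the four fields shown in the example."""
    return {
        "status": status,
        "current_sessions": current_sessions,
        "max_concurrent_sessions": max_concurrent_sessions,
        "consecutive_failures": consecutive_failures,
    }
```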

### `ping` / `pong`

Application-level keepalive. Independent of the WebSocket protocol's
own ping/pong — both run in parallel to catch different failure
modes.

### `error`

Out-of-band error. Carries a stable machine-readable code the client
can branch on. An `error` frame does **not** close the socket by
itself; when the error is fatal, the server closes the socket
separately.

```jsonc theme={null}
{
  "type": "error",
  "in_reply_to": null,
  "payload": {
    "code": "RATE_LIMITED",
    "message": "Frame rate exceeded.",
    "detail": { "retry_after_ms": 1000 }
  }
}
```

### `close_request`

Graceful shutdown request. The sender promises no new `dispatch`
frames; in-flight dispatches continue up to `payload.grace_seconds`
(default 30 s). After the grace window, the sender closes the socket
with code `1000 Normal Closure`.

## Error frame catalog

`payload.code` values are stable. New codes may be added; existing
codes never change meaning.

| Code                 | Direction       | Meaning                                                                                            |
| -------------------- | --------------- | -------------------------------------------------------------------------------------------------- |
| `BAD_FRAME`          | server → client | Frame failed schema validation. The server closes the connection with `1002` immediately after.    |
| `AUTH_EXPIRED`       | server → client | JWT expired mid-connection. Client must reconnect with a fresh token.                              |
| `AGENT_DISCONNECTED` | server → client | The dispatch targeted an instance whose socket is gone. The caller receives the same error.        |
| `RATE_LIMITED`       | server → client | Frame rate or concurrent dispatches exceeded. `detail.retry_after_ms` is an advisory backoff hint. |
| `INTERNAL`           | either          | Unrecoverable internal error. Should be rare; always safe to reconnect.                            |
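
Since the codes are stable, a client can branch on them with a lookup table. In the sketch below the action names are illustrative labels rather than part of the protocol, and unknown codes fall through to a default because new codes may be added.

```python
from typing import Tuple

# Assumption: these action names are illustrative, not part of the protocol.
ACTIONS = {
    "BAD_FRAME": "expect_close",  # the server closes with 1002 right after
    "AUTH_EXPIRED": "reconnect_with_fresh_token",
    "AGENT_DISCONNECTED": "fail_dispatch",
    "RATE_LIMITED": "back_off",
    "INTERNAL": "reconnect",
}


def plan_for_error(frame: dict) -> Tuple[str, int]:
    """Return (action, delay_ms) for an incoming error frame."""
    payload = frame.get("payload") or {}
    code = payload.get("code", "")
    # New codes may appear over time, so unknown ones are logged, not fatal.
    action = ACTIONS.get(code, "log_and_continue")
    delay_ms = 0
    if code == "RATE_LIMITED":
        detail = payload.get("detail") or {}
        # retry_after_ms is advisory; treat a missing value as no delay.
        delay_ms = int(detail.get("retry_after_ms", 0))
    return action, delay_ms
```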

## Heartbeats

| Mechanism                    | Cadence           | Initiator | Purpose                                   |
| ---------------------------- | ----------------- | --------- | ----------------------------------------- |
| WebSocket protocol ping/pong | 30 s              | server    | TCP-level keepalive.                      |
| App-level `ping` / `pong`    | 30 s, 15 s offset | server    | Detects wedged agent event loops.         |
| `heartbeat`                  | 30 s or on change | agent     | Self-reported presence, load, and health. |

**Dead-peer timeout.** Three missed app-level pings (about 90 seconds
at the 30 s cadence) cause the server to close the socket with code
`1001 Going Away`. Clients should reconnect; the SDK handles this
automatically.
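
The dead-peer rule amounts to a counter that resets on every `pong`. The sketch below models it; the class is hypothetical, and it assumes the three misses must be consecutive, which this page does not spell out.

```python
class MissedPingCounter:
    """Tracks unanswered app-level pings against a fixed limit."""

    def __init__(self, limit: int = 3) -> None:
        self.limit = limit
        self.missed = 0

    def on_pong(self) -> None:
        # Any pong proves the peer's event loop is alive again.
        self.missed = 0

    def on_ping_unanswered(self) -> bool:
        """Record a miss; return True once the peer should be considered dead."""
        self.missed += 1
        return self.missed >= self.limit
```

With a 30 s ping interval, the third consecutive miss lands roughly 90 seconds after the last `pong`.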

## Reconnect semantics

* Bounded exponential backoff: **1 s, 2 s, 4 s, 8 s, 16 s, 30 s (cap)**,
  with ±25% random jitter applied to each step.
* On every successful reconnect the backoff resets.
* If the client carries a `resume_token` in `hello`, the server
  attempts to re-bind pending dispatches:
  * **Match** → `welcome.resumed = true`,
    `welcome.replayed_dispatches` lists the re-delivered IDs.
  * **No match** → `welcome.resumed = false`. Any dispatches that
    were in flight at the time of the disconnect have already failed
    on the original owner pod with `AGENT_DISCONNECTED`, and their
    callers have received the error.
* Resume is **pod-local**. A reconnect that lands on a different pod
  (normal under scaling) cannot resume and starts clean.
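
The backoff schedule above can be sketched as follows. The function name and the zero-based `attempt` counter are assumptions; the step values and the ±25% jitter come from the list above.

```python
import random
from typing import Optional

BACKOFF_STEPS_S = (1, 2, 4, 8, 16, 30)  # the last entry is the cap
JITTER = 0.25                           # plus or minus 25% per step


def backoff_delay(attempt: int, rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before reconnect attempt number `attempt` (0-based)."""
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    source = rng if rng is not None else random
    base = BACKOFF_STEPS_S[min(attempt, len(BACKOFF_STEPS_S) - 1)]
    # Jitter spreads out reconnects so clients do not retry in lockstep.
    return base * (1.0 + source.uniform(-JITTER, JITTER))
```

Reset `attempt` to `0` after every successful reconnect. Because jitter is applied to each step, a delay at the cap can land anywhere between 22.5 s and 37.5 s.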

## Compatibility

* `v: 1` is fixed for the life of `svantic.v1`. Breaking changes
  ship as a new subprotocol (`svantic.v2`); both will be supported
  during the transition.
* New **optional** fields may be added to existing payloads without
  a version bump. Receivers must ignore unknown fields.
* New **frame types** may be added without a version bump. A receiver
  that does not recognize a type replies with `error` / `BAD_FRAME` but
  must not close the connection; the sender then downgrades.
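
A tolerant receiver following these rules might look like the sketch below. The set of known types mirrors the frame catalog on this page; the function names are hypothetical, and echoing the offending frame's `id` in `in_reply_to` is an assumption.

```python
from typing import Iterable, Optional, Tuple

KNOWN_TYPES = frozenset({
    "hello", "welcome", "dispatch", "dispatch_ack", "dispatch_chunk",
    "dispatch_result", "tool_call", "tool_result", "heartbeat",
    "ping", "pong", "error", "close_request",
})


def reply_for(frame: dict) -> Tuple[Optional[dict], bool]:
    """Return (reply_frame, close_socket) for an incoming frame."""
    frame_type = frame.get("type")
    if frame_type in KNOWN_TYPES:
        return None, False
    reply = {
        "type": "error",
        # Assumption: point back at the frame that was not understood.
        "in_reply_to": frame.get("id"),
        "payload": {
            "code": "BAD_FRAME",
            "message": f"Unknown frame type: {frame_type!r}",
        },
    }
    # Unknown types must not close the connection.
    return reply, False


def pick_known_fields(payload: dict, known: Iterable[str]) -> dict:
    """Keep only the fields this client understands; ignore the rest."""
    wanted = set(known)
    return {key: value for key, value in payload.items() if key in wanted}
```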

## SDK support

If you're writing your agent with `@svantic/sdk`, none of the above
is your concern day-to-day — set `deployment_mode: 'connected'` at
registration and the SDK dials, authenticates, reconnects, and
resumes for you. Your capability handlers receive the same
`CapabilitySessionContext` they would on a hosted deployment, with
`parent_trace_id`, `parent_span_id`, and `baggage` already parsed from
the incoming `traceparent` and `baggage` headers.

This reference exists for:

* Teams writing their own client (e.g. a non-TypeScript agent).
* Debugging: reading `ws` logs from the SDK and matching them to
  protocol states.
* Compliance reviews that need the wire format documented
  externally.
