TokenTelemetry
TokenTelemetry Docs
Features

Summarization

One-click LLM summaries of any session — a structured brief or a narrative, with caching by content hash and classified error cards.

Summarization lets you generate a human-readable summary of any session trace without reading through every event manually. You pick a backend (Claude, Codex, Gemini, Ollama, etc.), click Generate, and TokenTelemetry sends the session text to the model and displays the result.

Video walkthrough

Coming soon.

Two summary styles

Structured brief (deterministic)

The structured brief uses a fixed four-section template:

SectionWhat it contains
WhatA one-sentence description of what the session accomplished
ToolsWhich tools were called and how many times
WhyThe inferred goal or task context
NextSuggested follow-up steps or open questions

The brief is deterministic for a given session — regenerating it with the same backend and model produces the same output. It's fast and cheap because the prompt is compact.

LLM narrative

The narrative asks the model to write a free-form prose summary of the session. It's more readable and captures nuance the brief template can't, but it costs more tokens and may vary between runs.

Generate and Regenerate

  • Generate — sends the session to the configured backend and displays the result. The first call may take several seconds depending on the model and backend.
  • Regenerate — discards the cached summary and generates a fresh one. Use this if you switched models or the first result wasn't useful.

Caching by content hash

Summaries are cached by a hash of the session content. If you open the same session again, the cached summary is displayed instantly with no model call. The cache is stored in ~/.tokentelemetry/summaries.db.

Because the cache key is the content hash (not the session ID), summaries survive session record moves and renames. A session that hasn't changed will always get its cached summary back immediately, even after a restart.

Configuring a backend

Summarization requires a backend. Configure one in Settings → Summarizer (or see Configure Summarizer for the full options). Backends include:

  • Claude — uses the claude CLI; requires Claude Code to be installed and authenticated.
  • Codex — uses the codex CLI; requires Codex to be installed with an OpenAI API key or Pro subscription.
  • Gemini — uses the gemini CLI; requires Gemini CLI to be installed and authenticated.
  • Qwen — uses the qwen CLI with a DashScope API key.
  • Ollama — calls a locally-running Ollama instance; requires ollama serve to be running.
  • Antigravity — uses the Antigravity agent backend.
  • OpenAI-compatible — calls any server that speaks the /v1/chat/completions API: llama.cpp, vLLM, LM Studio, LocalAI, Groq, OpenRouter, and more.

Error cards

If a summary fails, TokenTelemetry shows a structured error card — never a raw stack trace or provider JSON blob. Each card contains:

FieldDescription
TitleShort category label (e.g. "API key invalid")
MessagePlain-English explanation
HintActionable step (e.g. "Run claude login")
Show raw errorDisclosure triangle revealing the truncated raw output for bug reports

Error categories and their hints:

CategoryTriggerHint
authHTTP 401, invalid keyBackend-specific login command or env var
too_largeHTTP 413, context overflowUse a model with a larger context window
quotaHTTP 429, rate limitWait and retry, or pick a cheaper model
modelModel not foundPick a model your account can access
timeoutRequest timed outUse a faster model or increase TT_<BACKEND>_TIMEOUT
networkConnection refusedCheck the backend is running (e.g. ollama serve)
no_outputEmpty responseTry a different model or regenerate
unknownAnything elseShows the provider's message, with the raw error available

Tips

  • Start with the structured brief — it's fast, cheap, and gives you the essentials. Switch to the narrative only when you want more prose.
  • If you get a too_large error on a long Claude Code session, try the Ollama backend with a model that has a large context window (e.g. llama3.2 with 128k context), or break the session into shorter runs.
  • The summary cache persists across restarts. If you want to force a fresh summary with the same model (e.g. after updating the model version), use Regenerate.
  • For Ollama, the default timeout is 360 seconds (6 minutes) because local inference can be slow. Override with TT_OLLAMA_TIMEOUT=<seconds>.
  • Configure Summarizer — pick a backend and set model options
  • Traces — the full session view that summaries are generated from

On this page