Configure the Summarizer

Pick a summarization backend (Claude, Codex, Gemini, Qwen, Ollama, Antigravity, or OpenAI-compatible) and configure model settings.

The Summarizer generates readable summaries of session traces. It's optional — you only need to configure it if you want one-click summaries on the Summarization tab. Token counting, cost tracking, and traces all work without a summarizer.

Open Settings → Summarizer in the TokenTelemetry UI to configure your backend.

Choosing a backend

| Backend | Requirement | Best for |
|---|---|---|
| Claude | claude CLI installed and authenticated | Best narrative quality; uses your Claude subscription |
| Codex | codex CLI with API key or Pro subscription | Good quality; uses your OpenAI account |
| Gemini | gemini CLI authenticated | Good quality; uses your Google account |
| Qwen | qwen CLI with DashScope API key | Alternative to the above |
| Ollama | ollama serve running locally | Fully local, no API costs |
| Antigravity | Antigravity agent installed | Useful if you're already running Antigravity |
| OpenAI-compatible | Any server speaking /v1/chat/completions | Maximum flexibility: llama.cpp, vLLM, LM Studio, Groq, OpenRouter, and more |

All backends are invoked locally. For CLI-based backends (Claude, Codex, Gemini, Qwen, Antigravity), TokenTelemetry shells out to the CLI binary you already have installed. The OpenAI-compatible backend uses Python's standard-library urllib, so it adds no new Python dependencies.

Model pickers

For each backend, a model picker lets you choose which model to use for summarization. The picker shows the models available to the backend's CLI. You can also type a model name directly.

The selected model is stored in ~/.tokentelemetry/summarizer.json and applied to all future summarization requests.

OpenAI-compatible backend settings

The OpenAI-compatible backend has additional settings because it supports a wide variety of server configurations:

| Setting | Default | Description |
|---|---|---|
| Endpoint | http://localhost:8080/v1 | The base URL of the server (must be http or https) |
| API key | (empty) | Bearer token for authenticated gateways; also readable from OPENAI_COMPAT_API_KEY env var |
| Max tokens | 512 | Maximum tokens in the summary response |
| Temperature | 0.7 | Sampling temperature |
| Top-p | 0.95 | Nucleus sampling probability |
| Top-k | 20 | Top-k sampling (ignored by strict OpenAI gateways) |
| Min-p | 0.0 | Min-p sampling threshold |
| Presence penalty | 1.5 | Repetition reduction |
| Repetition penalty | 1.0 | Alternative repetition reduction (llama.cpp style) |
| Enable thinking | Off | Send chat_template_kwargs: {enable_thinking: true} for Qwen3 thinking mode via vLLM |

Non-OpenAI parameters (top_k, min_p, repetition_penalty, enable_thinking) are sent first. If the server rejects them with HTTP 400, TokenTelemetry automatically retries with a clean OpenAI-only payload so the summary still goes through. You don't need to toggle this manually.
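The retry behavior described above can be pictured with a short Python sketch. This is an illustration, not TokenTelemetry's actual code: the function name post_with_fallback and the send callable are hypothetical, and the payload key names are assumed from the parameter names listed in this section.

```python
import urllib.error

# Payload keys assumed to carry the non-OpenAI extensions named above.
# enable_thinking travels inside chat_template_kwargs, so that key is dropped.
EXTENDED_KEYS = ("top_k", "min_p", "repetition_penalty", "chat_template_kwargs")


def post_with_fallback(payload, send):
    """Send the full payload; on HTTP 400, retry once without extension keys.

    `send` is a hypothetical callable that POSTs a dict and returns the parsed
    response, raising urllib.error.HTTPError on a non-2xx status.
    """
    try:
        return send(payload)
    except urllib.error.HTTPError as err:
        if err.code != 400:
            raise  # other errors (auth, server failures) are not retried here
        clean = {k: v for k, v in payload.items() if k not in EXTENDED_KEYS}
        return send(clean)
```

Passing the transport in as a callable keeps the fallback logic separate from the HTTP details, which makes it easy to exercise with a fake sender.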

The API key can also be set via the OPENAI_COMPAT_API_KEY environment variable. The env var takes precedence over the stored value, so secrets don't have to live in summarizer.json.
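As a rough sketch of that precedence rule, the lookup might resemble the following. The helper name resolve_api_key is hypothetical, and treating an empty environment variable as unset is an assumption, not documented behavior.

```python
import os


def resolve_api_key(stored_key, environ=os.environ):
    """Prefer OPENAI_COMPAT_API_KEY from the environment over the stored key.

    Assumption: an empty env var counts as unset and falls back to the
    value stored in summarizer.json.
    """
    env_key = environ.get("OPENAI_COMPAT_API_KEY", "")
    return env_key or stored_key
```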

Timeout environment variables

Long traces and slow models can exceed the backend's default timeout. Override per backend with:

| Backend | Env var | Default (seconds) |
|---|---|---|
| Ollama | TT_OLLAMA_TIMEOUT | 360 (6 minutes) |
| Codex | TT_CODEX_TIMEOUT | 300 (5 minutes) |
| OpenAI-compatible | TT_OPENAI_COMPAT_TIMEOUT | 120 (2 minutes) |

Set them before launching TokenTelemetry:

TT_OLLAMA_TIMEOUT=600 tokentelemetry

Or export them in your shell profile.
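To check which timeout would apply in your current environment, a small helper along these lines may be useful. The defaults come from the table above; the function name effective_timeout is hypothetical, and the fallback to the default for unparseable or non-positive values is an assumption, since the real parsing rules are not documented here.

```python
import os

# Defaults in seconds, taken from the timeout table.
DEFAULT_TIMEOUTS = {
    "TT_OLLAMA_TIMEOUT": 360,
    "TT_CODEX_TIMEOUT": 300,
    "TT_OPENAI_COMPAT_TIMEOUT": 120,
}


def effective_timeout(var_name, environ=os.environ):
    """Return the override from the environment, or the documented default."""
    default = DEFAULT_TIMEOUTS[var_name]
    raw = environ.get(var_name)
    if raw is None:
        return default
    try:
        value = float(raw)
    except ValueError:
        return default  # assumption: unparseable values fall back to the default
    return value if value > 0 else default
```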

Testing the connection

After configuring a backend, use the Test connection button in Settings → Summarizer. It sends a short probe request to the backend and shows either a success tick or a classified error card. Fix any auth or network issues before trying to summarize a real session.

Where settings are stored

Summarizer configuration is stored in ~/.tokentelemetry/summarizer.json. The file is plain JSON, so you can edit it directly if needed. It is created on first save.
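If you want to inspect the file from a script, a minimal sketch could look like this. The helper name load_summarizer_config is hypothetical, and no particular key names are assumed because the file's schema is not described on this page.

```python
import json
from pathlib import Path

# Location given in the text above.
CONFIG_PATH = Path.home() / ".tokentelemetry" / "summarizer.json"


def load_summarizer_config(path=CONFIG_PATH):
    """Return the parsed config, or an empty dict if the file does not exist yet."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}  # the file is only created on first save
```

Calling sorted(load_summarizer_config()) lists the top-level keys without printing values, which avoids echoing a stored API key to the terminal.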
