Question 1

What is Token Telemetry?

Accepted Answer

Token Telemetry (also written TokenTelemetry, sometimes misspelled as 'token telementry' or 'tokentelementry') is a free, open-source, 100% local observability dashboard for AI coding agents like Claude Code, Codex, Gemini CLI, Cursor, and GitHub Copilot. It tracks tokens, cost, tool calls, and reasoning by reading the log files those agents already write — no SDK, no signup, no cloud.

Question 2

How do I track Claude Code token usage?

Accepted Answer

Install TokenTelemetry, run Claude Code normally, and open http://localhost:3000. TokenTelemetry auto-detects Claude Code sessions from ~/.claude/ logs — no instrumentation, no SDK, no config.

Question 3

How do I monitor Google Antigravity, Codex, and Gemini CLI costs?

Accepted Answer

TokenTelemetry auto-reads logs from Google Antigravity (Google's agentic coding CLI), OpenAI Codex CLI, Gemini CLI, Cursor, GitHub Copilot, Qwen CLI, OpenCode, Vibe, and Grok Build (xAI). Token counts and dollar costs appear in the local dashboard automatically.

Question 4

Is there a free tool to monitor AI coding agent token usage?

Accepted Answer

Yes — TokenTelemetry is free, open-source (MIT), and runs 100% locally. No account, no signup, no cloud.

Question 5

Does TokenTelemetry send my data to the cloud?

Accepted Answer

Your logs, sessions, prompts, tokens, and costs never leave your computer — the dashboard reads local files and serves a UI on localhost. The app does send anonymous, content-free usage stats (which pages and features you use — never your code, prompts, paths, or costs) so we know what to improve; it's on by default and you can see the exact payload and turn it off in Settings → Usage & privacy, or with DO_NOT_TRACK=1. There's also an optional GitHub update check (no usage data); disable with TT_NO_UPDATE_CHECK=1.

Question 6

How does TokenTelemetry compare to Langfuse or Helicone?

Accepted Answer

TokenTelemetry is purpose-built for AI coding agents and is zero-config — no SDK instrumentation. Langfuse and Helicone are general LLM-app observability platforms that require code changes and (typically) a cloud account.

Question 7

Which agents does it support?

Accepted Answer

Ten coding agents (Claude Code, OpenAI Codex, Gemini CLI, Cursor, GitHub Copilot, Qwen CLI, OpenCode, Vibe, Antigravity, Grok Build) plus Hermes Agent — Nous Research's autonomous agent, which gets its own dedicated dashboard at /hermes with gateway health, scheduled-job monitoring, skills + memory observability, and 38 source platforms (CLI / Telegram / Discord / Feishu / DingTalk / cron / webhook / …).

Question 8

Why does Hermes Agent get its own page?

Accepted Answer

Hermes is structurally different from coding agents — it runs across messaging platforms (Telegram / Discord / Slack / WhatsApp / Signal / Matrix / Feishu / DingTalk / WeChat), supports persistent skills and memory, delegates to subagents, and runs scheduled cron jobs. Forcing it into the same UI as Claude Code would hide most of what it does, so it gets a dedicated surface that respects its shape.

Question 9

Can I use TokenTelemetry from inside Hermes Dashboard?

Accepted Answer

Yes — there's a Hermes Dashboard plugin that registers a 'TokenTelemetry' tab inside Hermes's web UI at port 9119. It's a thin launcher: deep-link cards open the relevant TokenTelemetry page (Hermes Overview, Skills, Memory, Analytics, Projects) in a new browser tab, so you don't have to remember a second port. Install with `./scripts/install-hermes-plugin.sh` from the TokenTelemetry repo, then run `hermes dashboard`.

Backend	Requirement	Best for
Claude	`claude` CLI installed and authenticated	Best narrative quality; uses your Claude subscription
Codex	`codex` CLI with API key or Pro subscription	Good quality; uses your OpenAI account
Gemini	`gemini` CLI authenticated	Good quality; uses your Google account
Qwen	`qwen` CLI with DashScope API key	Alternative to the above
Ollama	`ollama serve` running locally	Fully local, no API costs
Antigravity	Antigravity agent installed	Useful if you're already running Antigravity
OpenAI-compatible	Any server speaking `/v1/chat/completions`	Maximum flexibility: llama.cpp, vLLM, LM Studio, Groq, OpenRouter, and more

Setting	Default	Description
Endpoint	`http://localhost:8080/v1`	The base URL of the server (must be http or https)
API key	(empty)	Bearer token for authenticated gateways; also readable from `OPENAI_COMPAT_API_KEY` env var
Max tokens	`512`	Maximum tokens in the summary response
Temperature	`0.7`	Sampling temperature
Top-p	`0.95`	Nucleus sampling probability
Top-k	`20`	Top-k sampling (ignored by strict OpenAI gateways)
Min-p	`0.0`	Min-p sampling threshold
Presence penalty	`1.5`	Repetition reduction
Repetition penalty	`1.0`	Alternative repetition reduction (llama.cpp style)
Enable thinking	Off	Send `chat_template_kwargs: {enable_thinking: true}` for Qwen3 thinking mode via vLLM

Backend	Env var	Default
Ollama	`TT_OLLAMA_TIMEOUT`	`360` (6 minutes)
Codex	`TT_CODEX_TIMEOUT`	`300` (5 minutes)
OpenAI-compatible	`TT_OPENAI_COMPAT_TIMEOUT`	`120` (2 minutes)

Configure the Summarizer

Choosing a backend

Model pickers

OpenAI-compatible backend settings

Timeout environment variables

Testing the connection

Where settings are stored

On this page