Unified spend tracking, smart routing, and budget alerts across every LLM provider you use.
If you are building with LLMs in 2026, you are probably using more than one provider. Anthropic for complex reasoning. OpenAI for broad coverage. Gemini for long context. Groq or DeepSeek for speed and cost. Maybe Ollama for local inference.
Each provider has its own billing dashboard. Each charges differently — per input token, per output token, per cached token, per batch. Some have tiered pricing. Some have free tiers with invisible limits. The pricing pages change without notice.
The result: you have no single view of what you are actually spending on LLM APIs. You find out at the end of the month when the invoices arrive. If a rogue agent burns through your budget on a Saturday, you discover it Monday morning.
This is not a hypothetical problem. If you use AI from your terminal — running Claude Code, Codex, or Gemini CLI — your spend accumulates silently across sessions. If you use multiple AI providers from yaw's built-in AI assistant, each call goes to a different provider with different per-token rates. There is no unified meter.
We built one.
Token Meter is a single service that tracks LLM API costs across all your providers in real time. It gives you four things that no individual provider dashboard offers:
Spend tracking. Every API call is logged with its token count, model, and cost. You see a unified dashboard across Anthropic, OpenAI, Google, Groq, DeepSeek, Mistral, Cohere, Ollama, Azure OpenAI, and AWS Bedrock. No more switching between ten billing pages.
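The arithmetic behind per-call cost logging is simple: tokens times the model's per-token rate, summed for input and output. A minimal sketch, with hypothetical rates that are not Token Meter's actual registry:

```python
# Per-call cost arithmetic behind unified spend tracking.
# Rates below are illustrative, not real provider pricing.
PRICING_PER_MTOK = {
    # model: (input $/M tokens, output $/M tokens)
    "claude-haiku": (0.80, 4.00),
    "gpt-mini": (0.15, 0.60),
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one API call at the registered rates."""
    in_rate, out_rate = PRICING_PER_MTOK[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A 2,000-in / 500-out call at the hypothetical haiku rates:
# 2000 * 0.80/1e6 + 500 * 4.00/1e6 = 0.0016 + 0.0020 = 0.0036
print(f"${call_cost('claude-haiku', 2000, 500):.4f}")  # $0.0036
```

Every provider's billing reduces to this shape; the hard part is keeping the rate table current across providers, which is what the registry does.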
Smart routing. Configure failover chains so that when one provider hits a rate limit or goes down, traffic automatically shifts to another. Set up cost-based routing to prefer cheaper models for simple tasks. Load balance across providers. Define latency thresholds.
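A failover chain is just an ordered list of providers tried until one succeeds. A minimal sketch of the control flow, with stand-in callables rather than real provider clients:

```python
# Sketch of a failover chain: try providers in order, fall through
# on rate limits. Provider callables are stand-ins for real clients.
class RateLimited(Exception):
    pass

def with_failover(chain, prompt):
    """Try each provider in the configured chain until one succeeds."""
    errors = []
    for name, provider in chain:
        try:
            return name, provider(prompt)
        except RateLimited as e:
            errors.append((name, str(e)))  # log, then shift to next provider
    raise RuntimeError(f"all providers failed: {errors}")

# Hypothetical chain: the primary is rate-limited, the fallback answers.
def primary(prompt):
    raise RateLimited("429")

def fallback(prompt):
    return f"ok: {prompt}"

used, reply = with_failover([("openai", primary), ("groq", fallback)], "hi")
print(used, reply)  # groq ok: hi
```

Cost-based routing and load balancing reorder the same chain using price and latency data instead of a fixed priority.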
Budget alerts. Set a daily, weekly, or monthly budget. Get notified when you hit 80%, 90%, or 100%. No more surprise invoices.
Anomaly detection. Token Meter learns your baseline usage patterns and flags outliers — a sudden spike in output tokens, an unexpected model being called, a session that is 10x more expensive than normal. You find out in minutes, not at the end of the billing cycle.
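Baseline-and-outlier flagging can be sketched as a standard-deviation test over recent per-session costs. The threshold here is an illustrative choice, not Token Meter's actual detector:

```python
# Sketch of baseline learning: compute mean/stddev of past session costs,
# flag a session that sits far above that baseline.
from statistics import mean, stdev

def is_anomalous(history, current, threshold=3.0):
    """Flag `current` if it is more than `threshold` stddevs above baseline."""
    mu, sigma = mean(history), stdev(history)
    z = (current - mu) / sigma if sigma else float("inf")
    return z > threshold

baseline = [0.42, 0.38, 0.45, 0.40, 0.44, 0.39]  # typical session costs ($)
print(is_anomalous(baseline, 0.43))  # False: within normal range
print(is_anomalous(baseline, 4.20))  # True: ~10x normal, flagged immediately
```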
Token Meter tracks pricing for 25+ models across 10 providers, including:
| Provider | Models |
|---|---|
| Anthropic | Claude Opus 4.6, Sonnet 4.6, Sonnet 4, Haiku 4.5 |
| OpenAI | GPT-5.4, GPT-5.4 mini, GPT-5.4 nano, GPT-4o, GPT-4o mini, o3, o4-mini |
| Google | Gemini 3.1 Pro, Gemini 3.1 Flash Lite, Gemini 2.5 Pro, Gemini 2.5 Flash |
| Groq | Llama 4 Scout 17B, Llama 3.3 70B, Llama 3.1 8B, Mixtral 8x7B |
| DeepSeek | Chat V3, Coder V3, Reasoner R1 |
| Mistral | Mistral Large 3, Mistral Small 4, Codestral |
| Cohere | Command R, Command R+ |
| Ollama | Local models (tracked at $0 for unified logging) |
| Azure OpenAI | Same models as OpenAI, mapped to your contract pricing |
| AWS Bedrock | Same models as Anthropic, mapped to your contract pricing |
The model registry is updated continuously. When a provider launches a new model or changes pricing, Token Meter picks it up.
Add Token Meter to any MCP-compatible client — Claude Code, Cursor, or anything that speaks the Model Context Protocol. No install required. Your AI assistant gets tools to check spend, look up model pricing, and query cost history.
```json
{
  "mcpServers": {
    "tokenmeter": {
      "url": "https://mcp.tokenmeter.sh",
      "headers": {
        "Authorization": "Bearer tm_your_api_key"
      }
    }
  }
}
```

Or install the local MCP client:
```shell
claude mcp add tokenmeter -e TOKENMETER_API_KEY=tm_your_key -- npx @yawlabs/tokenmeter-mcp
```

Soon this will also be available as a hosted MCP server on mcp.hosting, with zero-config deployment.
Point your provider base URL at Token Meter's gateway. Every request is proxied, logged, and routed according to your rules. No code changes beyond swapping the base URL.
```shell
OPENAI_BASE_URL=https://gateway.tokenmeter.sh/v1
```

The gateway handles failover, load balancing, rate limit management, and cost-based routing. If OpenAI returns a 429, your request automatically retries on your configured fallback provider. If a model is cheaper on one provider for your use case, the gateway can route there.
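Because the gateway speaks the same API shape as the upstream provider, a proxied request differs from a direct one only in its host. A sketch with a hypothetical request-builder helper (not Token Meter code):

```python
# Demonstrates the drop-in swap: an OpenAI-style chat request is identical
# through the gateway except for the base URL. build_chat_request is a
# hypothetical helper for illustration.
import json

def build_chat_request(base_url, api_key, model, messages):
    """Assemble the URL, headers, and body for a chat completion call."""
    return {
        "url": f"{base_url}/chat/completions",
        "headers": {"Authorization": f"Bearer {api_key}"},
        "body": json.dumps({"model": model, "messages": messages}),
    }

msgs = [{"role": "user", "content": "hi"}]
direct = build_chat_request("https://api.openai.com/v1", "sk-...", "gpt-4o", msgs)
proxied = build_chat_request("https://gateway.tokenmeter.sh/v1", "sk-...", "gpt-4o", msgs)

# Same headers and body; only the host changes.
print(proxied["url"])  # https://gateway.tokenmeter.sh/v1/chat/completions
```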
The web dashboard at tokenmeter.sh shows your spend across all providers in one place. Filter by provider, model, time range, or session. See cost trends, per-session breakdowns, and anomaly alerts. Export data for accounting.
| Tier | Price | What You Get |
|---|---|---|
| Free | $0 | Spend summary, session cost, model pricing lookup, 7-day data retention |
| Pro | $19/mo | Analytics, budget alerts, cost trends, anomaly detection, 90-day retention |
| Gateway | $49/mo | Smart routing, automatic failover, rate limit management, fallback chains, latency reporting |
| Team | $99/mo/seat | Multi-user dashboards, per-member tracking, org-wide budgets, SSO |
The free tier is not a trial. It does not expire. If all you need is a unified spend view across providers, you can use it indefinitely.
We run LLM workloads across multiple providers for our own products. We had the same problem everyone else has: no single pane of glass for cost. We built internal tooling, realized it was a general problem, and turned it into a product.
Token Meter is not an observability platform. It is not trying to trace every prompt. It tracks one thing — what your LLM APIs cost — and does it well.
Sign up at tokenmeter.sh.
Published by Yaw Labs.
Stay up to date on LLM pricing, API changes, and developer tooling. Token Limit News is our weekly newsletter.