Unified spend tracking, smart routing, and budget alerts across every LLM provider you use.
If you are building with LLMs in 2026, you are probably using more than one provider. Anthropic for complex reasoning. OpenAI for broad coverage. Gemini for long context. Groq or DeepSeek for speed and cost. Maybe Ollama for local inference.
Each provider has its own billing dashboard. Each charges differently — per input token, per output token, per cached token, per batch. Some have tiered pricing. Some have free tiers with invisible limits. The pricing pages change without notice.
The result: you have no single view of what you are actually spending on LLM APIs. You find out at the end of the month when the invoices arrive. If a rogue agent burns through your budget on a Saturday, you discover it Monday morning.
This is not a hypothetical problem. If you use AI from your terminal — running Claude Code, Codex, or Gemini CLI — your spend accumulates silently across sessions. If you use multiple AI providers from yaw's built-in AI assistant, each call goes to a different provider with different per-token rates. There is no unified meter.
We built one.
Token Meter is a single service that tracks LLM API costs across all your providers in real time. It gives you four things that no individual provider dashboard offers:
Spend tracking. Every API call is logged with its token count, model, and cost. You see a unified dashboard across Anthropic, OpenAI, Google, Groq, DeepSeek, Mistral, Cohere, Ollama, Azure OpenAI, and AWS Bedrock. No more switching between ten billing pages.
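The arithmetic behind per-call cost logging is simple: tokens times the model's per-token rate, summed for input and output. A minimal sketch, with hypothetical rates that are not Token Meter's actual registry:

```python
# Per-call cost arithmetic behind unified spend tracking.
# Rates below are illustrative, not real provider pricing.
PRICING_PER_MTOK = {
    # model: (input $/M tokens, output $/M tokens)
    "claude-haiku": (0.80, 4.00),
    "gpt-mini": (0.15, 0.60),
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one API call at the registered rates."""
    in_rate, out_rate = PRICING_PER_MTOK[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A 2,000-in / 500-out call at the hypothetical haiku rates:
# 2000 * 0.80/1e6 + 500 * 4.00/1e6 = 0.0016 + 0.0020 = 0.0036
print(f"${call_cost('claude-haiku', 2000, 500):.4f}")  # $0.0036
```

Every provider's billing reduces to this shape; the hard part is keeping the rate table current across providers, which is what the registry does.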
Smart routing. Configure failover chains so that when one provider hits a rate limit or goes down, traffic automatically shifts to another. Set up cost-based routing to prefer cheaper models for simple tasks. Load balance across providers. Define latency thresholds.
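A failover chain is just an ordered list of providers tried until one succeeds. A minimal sketch of the control flow, with stand-in callables rather than real provider clients:

```python
# Sketch of a failover chain: try providers in order, fall through
# on rate limits. Provider callables are stand-ins for real clients.
class RateLimited(Exception):
    pass

def with_failover(chain, prompt):
    """Try each provider in the configured chain until one succeeds."""
    errors = []
    for name, provider in chain:
        try:
            return name, provider(prompt)
        except RateLimited as e:
            errors.append((name, str(e)))  # log, then shift to next provider
    raise RuntimeError(f"all providers failed: {errors}")

# Hypothetical chain: the primary is rate-limited, the fallback answers.
def primary(prompt):
    raise RateLimited("429")

def fallback(prompt):
    return f"ok: {prompt}"

used, reply = with_failover([("openai", primary), ("groq", fallback)], "hi")
print(used, reply)  # groq ok: hi
```

Cost-based routing and load balancing reorder the same chain using price and latency data instead of a fixed priority.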
Budget alerts. Set a daily, weekly, or monthly budget. Get notified when you hit 80%, 90%, or 100%. No more surprise invoices.
Anomaly detection. Token Meter learns your baseline usage patterns and flags outliers — a sudden spike in output tokens, an unexpected model being called, a session that is 10x more expensive than normal. You find out in minutes, not at the end of the billing cycle.
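Baseline-and-outlier flagging can be sketched as a standard-deviation test over recent per-session costs. The threshold here is an illustrative choice, not Token Meter's actual detector:

```python
# Sketch of baseline learning: compute mean/stddev of past session costs,
# flag a session that sits far above that baseline.
from statistics import mean, stdev

def is_anomalous(history, current, threshold=3.0):
    """Flag `current` if it is more than `threshold` stddevs above baseline."""
    mu, sigma = mean(history), stdev(history)
    z = (current - mu) / sigma if sigma else float("inf")
    return z > threshold

baseline = [0.42, 0.38, 0.45, 0.40, 0.44, 0.39]  # typical session costs ($)
print(is_anomalous(baseline, 0.43))  # False: within normal range
print(is_anomalous(baseline, 4.20))  # True: ~10x normal, flagged immediately
```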
Token Meter tracks pricing for 25+ models across 10 providers, including:
| Provider | Models |
|---|---|
| Anthropic | Claude Opus 4.6, Sonnet 4.6, Sonnet 4, Haiku 4.5 |
| OpenAI | GPT-5.4, GPT-5.4 mini, GPT-5.4 nano, GPT-4o, GPT-4o mini, o3, o4-mini |
| Google | Gemini 3.1 Pro, Gemini 3.1 Flash Lite, Gemini 2.5 Pro, Gemini 2.5 Flash |
| Groq | Llama 4 Scout 17B, Llama 3.3 70B, Llama 3.1 8B, Mixtral 8x7B |
| DeepSeek | Chat V3, Coder V3, Reasoner R1 |
| Mistral | Mistral Large 3, Mistral Small 4, Codestral |
| Cohere | Command R, Command R+ |
| Ollama | Local models (tracked at $0 for unified logging) |
| Azure OpenAI | Same models as OpenAI, mapped to your contract pricing |
| AWS Bedrock | Same models as Anthropic, mapped to your contract pricing |
The model registry is updated continuously. When a provider launches a new model or changes pricing, Token Meter picks it up.
Add Token Meter to any MCP-compatible client — Claude Code, Cursor, or anything that speaks the Model Context Protocol. No install required. Your AI assistant gets tools to check spend, look up model pricing, and query cost history.
```json
{
  "mcpServers": {
    "tokenmeter": {
      "url": "https://mcp.tokenmeter.sh",
      "headers": {
        "Authorization": "Bearer tm_your_api_key"
      }
    }
  }
}
```

Or install the local MCP client:
```shell
claude mcp add tokenmeter -e TOKENMETER_API_KEY=tm_your_key -- npx @yawlabs/tokenmeter-mcp
```

Soon this will also be available as a hosted MCP server on mcp.hosting, with zero-config deployment.
Point your provider base URL at Token Meter's gateway. Every request is proxied, logged, and routed according to your rules. No code changes beyond swapping the base URL.
```shell
OPENAI_BASE_URL=https://gateway.tokenmeter.sh/v1
```

The gateway handles failover, load balancing, rate limit management, and cost-based routing. If OpenAI returns a 429, your request automatically retries on your configured fallback provider. If a model is cheaper on one provider for your use case, the gateway can route there.
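Because the gateway speaks the same API shape as the upstream provider, a proxied request differs from a direct one only in its host. A sketch with a hypothetical request-builder helper (not Token Meter code):

```python
# Demonstrates the drop-in swap: an OpenAI-style chat request is identical
# through the gateway except for the base URL. build_chat_request is a
# hypothetical helper for illustration.
import json

def build_chat_request(base_url, api_key, model, messages):
    """Assemble the URL, headers, and body for a chat completion call."""
    return {
        "url": f"{base_url}/chat/completions",
        "headers": {"Authorization": f"Bearer {api_key}"},
        "body": json.dumps({"model": model, "messages": messages}),
    }

msgs = [{"role": "user", "content": "hi"}]
direct = build_chat_request("https://api.openai.com/v1", "sk-...", "gpt-4o", msgs)
proxied = build_chat_request("https://gateway.tokenmeter.sh/v1", "sk-...", "gpt-4o", msgs)

# Same headers and body; only the host changes.
print(proxied["url"])  # https://gateway.tokenmeter.sh/v1/chat/completions
```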
The web dashboard at tokenmeter.sh shows your spend across all providers in one place. Filter by provider, model, time range, or session. See cost trends, per-session breakdowns, and anomaly alerts. Export data for accounting.
| Tier | Price | What You Get |
|---|---|---|
| Free | $0 | Spend summary, session cost, model pricing lookup, 7-day data retention |
| Pro | $19/mo | Analytics, budget alerts, cost trends, anomaly detection, 90-day retention |
| Gateway | $49/mo | Smart routing, automatic failover, rate limit management, fallback chains, latency reporting |
| Team | $99/mo/seat | Multi-user dashboards, per-member tracking, org-wide budgets, SSO |
The free tier is not a trial. It does not expire. If all you need is a unified spend view across providers, you can use it indefinitely.
We run LLM workloads across multiple providers for our own products. We had the same problem everyone else has: no single pane of glass for cost. We built internal tooling, realized it was a general problem, and turned it into a product.
Token Meter is not an observability platform. It is not trying to trace every prompt. It tracks one thing — what your LLM APIs cost — and does it well.
Sign up at tokenmeter.sh.
Published by Yaw Labs.
Stay up to date on LLM pricing, API changes, and developer tooling. Token Limit News is our weekly newsletter.