Token Counter

Count tokens and estimate costs for GPT-4, Claude, Gemini, Llama, and Mistral. Context window detection and overflow alerts.

POST /api/count-tokens
AI & LLM
<50ms avg latency
API Key auth
15+ models

What It Does

The Universal LLM Token Counter provides estimated token counting for text inputs across major language model providers. Whether you're working with OpenAI's GPT models, Anthropic's Claude, Google's Gemini, or other supported models, this API delivers token count estimates with real-time cost estimation.

Built on character-ratio estimation with pricing tables, the API provides consistent token estimates with sub-50ms response times. Perfect for applications that need to manage API costs, monitor context window usage, or prevent token overflow before sending requests to LLM providers.

Key Features

  • Multi-Provider Support — OpenAI, Anthropic, Google, Mistral, and Meta
  • Real-time Cost Estimation — Get USD pricing based on current provider rates
  • Context Window Monitoring — Track usage percentage and overflow risk
  • Deterministic Results — The same input always produces the same token count
  • Lightning Fast — Sub-50ms response times for all operations

Code Examples

curl -X POST https://api.atomicapis.dev/api/count-tokens \
  -H "X-RapidAPI-Proxy-Secret: YOUR_SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "The quick brown fox jumps over the lazy dog. This is a sample text for token counting across multiple LLM providers.",
    "model": "gpt-4o"
  }'
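For JavaScript clients, the same call can be wrapped in a small helper. This is a hypothetical sketch, not an official SDK; the Use Case snippets later on this page assume a similar countTokens(text, model) function.

```javascript
// Hypothetical helper around the endpoint (not an official SDK).
// YOUR_SECRET is a placeholder for your RapidAPI proxy secret.
async function countTokens(text, model = "gpt-4o") {
  const res = await fetch("https://api.atomicapis.dev/api/count-tokens", {
    method: "POST",
    headers: {
      "X-RapidAPI-Proxy-Secret": "YOUR_SECRET",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ text, model }),
  });
  if (!res.ok) throw new Error(`Token count failed: HTTP ${res.status}`);
  return res.json(); // resolves to the response shape documented below
}
```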

Request Parameters

Name    Typeatts  Required  Description
text    string    Yes       The text content to count tokens for. Maximum 1 MB in size.
model   string    No        The model identifier (e.g., "gpt-4o", "claude-opus-4", "gemini-2.0-flash"). Defaults to "gpt-4o" if omitted.

Supported Models

gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo, claude-sonnet-4-5, claude-opus-4, claude-haiku-3.5, gemini-2.0-flash, gemini-1.5-pro, llama-3.1-70b, llama-3.1-8b, mistral-large, mistral-small, o1, o3

Unrecognized model names are accepted but will use default estimation parameters.

Response Format

200 OK - application/json
{
  "estimatedTokens": 23,
  "model": "gpt-4o",
  "tokenizerFamily": "o200k_base",
  "estimatedInputCost": 0.0000575,
  "estimatedOutputCost": 0.00023,
  "contextWindowSize": 128000,
  "contextUsagePercent": 0.02,
  "exceedsContext": false,
  "isExactCount": false,
  "method": "character_ratio_estimation"
}
Field                Type     Description
estimatedTokens      integer  Estimated number of tokens in the input text
model                string   The model used for token estimation
tokenizerFamily      string   The tokenizer family used (e.g., o200k_base, cl100k_base, claude)
estimatedInputCost   number   Estimated input cost in USD
estimatedOutputCost  number   Estimated output cost in USD
contextWindowSize    integer  Maximum context window size for the specified model
contextUsagePercent  number   Percentage of the model's context window used
exceedsContext       boolean  True if the input exceeds the model's context window
isExactCount         boolean  Whether the token count is exact (always false for estimation-based counting)
method               string   Method used for counting (returns "character_ratio_estimation")

Use Cases

Cost Estimation

Preview API costs before sending requests to LLM providers. Perfect for budgeting and client billing, especially when processing large volumes of text or offering usage-based pricing to your customers.

// Show estimated cost to user before processing
const estimate = await countTokens(text, model);
console.log(`Estimated cost: $${estimate.estimatedInputCost}`);

Context Window Management

Monitor context window usage in real-time to prevent overflow errors. Automatically truncate or split content when approaching limits, ensuring reliable LLM interactions without unexpected failures.

// Prevent context overflow before sending
if (result.exceedsContext || result.contextUsagePercent > 90) {
  // Split or truncate content
}
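A rough client-side guard can trim oversized input before it is ever sent. In this sketch, truncateToContext, the 90% safety margin, and the default chars-per-token ratio are illustrative choices, not part of the API.

```javascript
// Illustrative guard (not part of the API): trim text so it fits within
// ~90% of a model's context window, using an assumed chars-per-token ratio.
function truncateToContext(text, contextWindowSize, charsPerToken = 3.9) {
  const maxChars = Math.floor(contextWindowSize * 0.9 * charsPerToken);
  return text.length <= maxChars ? text : text.slice(0, maxChars);
}
```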

API Budgeting

Implement spending limits and usage quotas in your applications. Track cumulative costs across multiple providers and enforce budget constraints before expensive operations.

// Enforce daily spending limits
const dailyBudget = 100.00;
if (dailySpend + estimate.estimatedInputCost > dailyBudget) {
  throw new Error('Daily budget exceeded');
}
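The per-request check extends naturally to tracking cumulative spend across calls. A minimal sketch with illustrative names (BudgetTracker is not provided by the API):

```javascript
// Minimal cumulative budget tracker (illustrative; not provided by the API).
class BudgetTracker {
  constructor(dailyLimitUsd) {
    this.limit = dailyLimitUsd;
    this.spent = 0;
  }
  canAfford(costUsd) {
    return this.spent + costUsd <= this.limit;
  }
  record(costUsd) {
    if (!this.canAfford(costUsd)) throw new Error("Daily budget exceeded");
    this.spent += costUsd;
  }
}
```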

Build Constraints

Character-Ratio Estimation

Uses calibrated characters-per-token ratios for each tokenizer family (e.g., 3.9 for o200k_base, 3.7 for cl100k_base, 3.5 for Claude). Averages character-based and word-based estimates for improved accuracy with sub-50ms response times.
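The blending of character- and word-based estimates might look like the following sketch. The chars-per-token ratios come from this section; the tokens-per-word factor (1.33) and the fallback ratio for unknown families are assumptions, not documented values.

```javascript
// Sketch of character-ratio estimation (not the API's actual implementation).
// Chars-per-token ratios are the ones quoted in this section.
const CHARS_PER_TOKEN = {
  o200k_base: 3.9,
  cl100k_base: 3.7,
  claude: 3.5,
};
const TOKENS_PER_WORD = 1.33; // assumed factor, not documented

function estimateTokens(text, family = "o200k_base") {
  const ratio = CHARS_PER_TOKEN[family] ?? 4.0; // assumed fallback for unknown families
  const charEstimate = text.length / ratio;
  const wordEstimate = text.split(/\s+/).filter(Boolean).length * TOKENS_PER_WORD;
  // Average the two estimates, as described above
  return Math.round((charEstimate + wordEstimate) / 2);
}
```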

Deterministic Counting

Token estimation is deterministic — the same input always produces the same token estimate. All operations complete in sub-50ms, including ratio calculation and cost estimation.

Pricing Tables

The API maintains per-provider pricing tables. Prices vary by model and are quoted per 1M tokens. Tables must be kept in sync with provider pricing changes to ensure accurate cost estimation.
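Per-1M-token rates convert to request costs as in this sketch. The figures shown ($2.50 input / $10.00 output per 1M tokens for gpt-4o) are back-calculated from the sample response above; they are illustrative, not a live pricing table.

```javascript
// Illustrative pricing table (USD per 1M tokens); not the API's live data.
const PRICING_USD_PER_1M = {
  "gpt-4o": { input: 2.5, output: 10.0 },
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
};

// Convert a token count into estimated input/output costs for a model.
function estimateCost(model, tokens) {
  const p = PRICING_USD_PER_1M[model];
  if (!p) return null; // unknown model: no pricing available
  return {
    input: (tokens / 1_000_000) * p.input,
    output: (tokens / 1_000_000) * p.output,
  };
}
```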

MCP Integration

What is MCP?

Model Context Protocol (MCP) allows AI assistants like Claude to call this API as a native tool during conversation. Instead of writing HTTP requests, the AI invokes the tool directly — no API keys or boilerplate needed on the client side.

Tool Details

Tool Class
TokenCounterTools
Method
CountTokens()

Description

Counts tokens and estimates costs for multiple LLM models