Pro Reasoning
Frontier chain-of-thought models — DeepSeek R1-0528, Qwen3 235B, Llama 3.3 70B. 128K context, transparent per-token billing.
Token Factory is the inference layer for modern AI — 60+ open-source and frontier models through one OpenAI- and Anthropic-compatible API, with transparent per-token pricing and zero infrastructure to manage.
Frontier chain-of-thought models — DeepSeek R1-0528, Qwen3 235B, Llama 3.3 70B. 128K context, transparent per-token billing.
Latency-critical inference on Language Processing Unit silicon. Up to 1,000 TPS for real-time voice, agents and interactive tools.
Your million tokens from $3, same plan.
Text, reasoning, code, vision, speech, embeddings and image generation — all through one OpenAI- and Anthropic-compatible interface. Swap providers by changing one URL.
Lyrics, stems, transcription and voice conversion in one API.
1M-token test plan, monthly windows, or pure pay-as-you-go.
On GPT OSS models — automatic prompt caching.
Works with every OpenAI and Anthropic SDK out of the box.
From your first prototype to multi-region production scale, the same API, the same billing model, the same observability stack.
Performance
Up to 1,000 tokens per second on Language Processing Unit silicon. Sub-100ms time-to-first-token, even at peak load.
Compatibility
Works with every OpenAI SDK, every Anthropic SDK, the Vercel AI SDK, LangChain, LlamaIndex, the OpenAI CLI, and Claude Code. Change one base URL — keep the rest.
Pricing
No idle GPU cost, no minimums, no surprise overage. Stripe billing, Odoo finance sync, line-item invoices for every request.
Catalog
Reasoning, code, multilingual, speech, vision, embeddings, image generation. Frontier and open-source, all under one key.
Reliability
Automatic failover across three regions, real-time health monitoring, and a public status page. Your apps stay up.
Developer experience
Python, TypeScript, Go, and Rust SDKs. Type hints, streaming, retries, and full observability baked in. Five lines to your first request.
128K context, $0.55 / $2.19 per M tokens. Now serving in production with stable latency.
8 UltraFast models now live on Language Processing Unit silicon. Sub-100ms TTFT under load.
Try every model with no card. Predictable monthly windows from $25. Pure PAYG for scale.
Sign up, get an API key, and start calling frontier models in under 60 seconds. No card required.
Run one command. Paste your API key. Token Factory auto-configures Claude Code, Codex, and every model in your workspace based on the plan you’re on — no env files, no boilerplate.
.env.local with OPENAI_BASE_URL, ANTHROPIC_BASE_URL, and TF_API_KEY in one shot.Works on macOS, Linux, and Windows. Node 18+ required.
# Paste your API key when prompted
? Enter your Token Factory API key: tf-prod-****-****
# Pick your IDE
? Detected: Claude Code · Cursor · Codex · Continue
› All of the above
# Models are routed by your plan
✓ Plan: Token Max
✓ Default model: claude-3-5-sonnet-ultra (UltraFast)
✓ Context window: 128K
✓ Wrote .env.local · .claude/settings.json · .codex/config.toml
✓ You’re ready. Run claude or codex to start.
Copy