Now live — DeepSeek R1-0528 reasoning · UltraFast LPU inference available in all plans Explore → ×

Token Factory · Frontier Inference Platform

Frontier inference,
one endpoint.

Token Factory is the inference layer for modern AI — 60+ open-source and frontier models through one OpenAI- and Anthropic-compatible API, with transparent per-token pricing and zero infrastructure to manage.

$npxtoken-factory init

OpenAI & Anthropic SDK compatible
Claude Code & Codex ready
99.9% uptime SLA

/ The Model Matrix

Four flagship surfaces.
One API underneath.

Browse all 60+ models →

Flagship · Reasoning

Pro Reasoning

Frontier chain-of-thought models — DeepSeek R1-0528, Qwen3 235B, Llama 3.3 70B. 128K context, transparent per-token billing.

11 models
From $0.06 / M tokens

Flagship · Speed

UltraFast LPU

Latency-critical inference on Language Processing Unit silicon. Up to 1,000 TPS for real-time voice, agents and interactive tools.

8 models
<100ms TTFT

Audio

Music 2.6

4 models
Generation + transcription

Speech

Speech 2.8

6 models
Real-time TTS + STT

Image

Image Studio

3 models
SDXL · Flux · Vision

Embeddings

Vector Engine

5 models
1024–3072 dims

1,000,000Free tokens on signup
$3Starter plan / 24 hours
0Credit card required
60+Models unlocked

Get your Opus Million tokens today.

Your million tokens from $3, same plan.

Serving models from

DeepSeek
Mera
Meta Llama
Qwen
Mistral
OpenAI
Anthropic
OpenAI OSS
Black Forest
Moonshot
Stability

DeepSeek
Mera
Meta Llama
Qwen
Mistral
OpenAI
Anthropic
OpenAI OSS
Black Forest
Moonshot
Stability

/ The Platform

Built for production
from the first request.

Full architecture →

Unified API

One endpoint. Every modality.

Text, reasoning, code, vision, speech, embeddings and image generation — all through one OpenAI- and Anthropic-compatible interface. Swap providers by changing one URL.

# OpenAI & Anthropic compatible, drop-in
from openai import OpenAI
client = OpenAI(
  base_url=“https://api.tokenfactory.io/v1/”,
  api_key=“tf-•••••”)

Read the quickstart →

Throughput

1,000TPS
Peak tokens per second on GPT OSS 20B UltraFast

Latency

<100ms
Time-to-first-token under load

UltraFast Models

Tokens per second

GPT OSS 20B1,000
Llama 3.1 8B840
Qwen3 32B662
Llama 4 Scout594

See benchmark details →

Audio

Music 2.6 — generative audio at studio quality

Lyrics, stems, transcription and voice conversion in one API.

Explore audio models →

Pricing

From $3. No idle GPU cost.

1M-token test plan, monthly windows, or pure pay-as-you-go.

View plans →

SLA

99.9%
Uptime backed by multi-region failover

Caching

50% off cached input

On GPT OSS models — automatic prompt caching.

Compat

Drop-in OpenAI & Anthropic

Works with every OpenAI and Anthropic SDK out of the box.

/ Why Token Factory

One platform. Every workload. Zero lock-in.

From your first prototype to multi-region production scale, the same API, the same billing model, the same observability stack.

Performance

UltraFast LPU inference

Up to 1,000 tokens per second on Language Processing Unit silicon. Sub-100ms time-to-first-token, even at peak load.

Compatibility

OpenAI & Anthropic compatible, drop-in

Works with every OpenAI SDK, every Anthropic SDK, the Vercel AI SDK, LangChain, LlamaIndex, the OpenAI CLI, and Claude Code. Change one base URL — keep the rest.

Pricing

Transparent per-token billing

No idle GPU cost, no minimums, no surprise overage. Stripe billing, Odoo finance sync, line-item invoices for every request.

Catalog

60+ models, one API

Reasoning, code, multilingual, speech, vision, embeddings, image generation. Frontier and open-source, all under one key.

Reliability

99.9% SLA, multi-region

Automatic failover across three regions, real-time health monitoring, and a public status page. Your apps stay up.

Developer experience

First-class SDKs

Python, TypeScript, Go, and Rust SDKs. Type hints, streaming, retries, and full observability baked in. Five lines to your first request.

/ By the numbers

Built for teams that ship.

12B+
Tokens served daily

60+
Models in catalog

99.97%
Uptime last 90 days

14K+
Active developers

/ Latest Updates

Shipped this month.

View changelog →

Model · Reasoning
NEW

DeepSeek R1-0528 — chain-of-thought at frontier parity

128K context, $0.55 / $2.19 per M tokens. Now serving in production with stable latency.

Jun 2026DeepSeekReasoning
Infra · UltraFast
NEW

LPU silicon — 1,000 TPS on GPT OSS 20B

8 UltraFast models now live on Language Processing Unit silicon. Sub-100ms TTFT under load.

May 2026GroqThroughput
Pricing
UPDATE

Token Plan — $3 test plan with 1M tokens

Try every model with no card. Predictable monthly windows from $25. Pure PAYG for scale.

May 2026StripeOdoo

Get $5 in free credits.
Ship your first agent today.

Sign up, get an API key, and start calling frontier models in under 60 seconds. No card required.

/ One-Line Setup

Set up your IDE in 30 seconds.

Run one command. Paste your API key. Token Factory auto-configures Claude Code, Codex, and every model in your workspace based on the plan you’re on — no env files, no boilerplate.

Auto-detects your IDE — Claude Code, Cursor, Codex CLI, Continue, Cline, or any OpenAI/Anthropic-compatible client.

Plan-aware model routing — picks the right default model, context window, and rate limit for your tier (Test, Starter, Max, Enterprise).

Writes the env file for you .env.local with OPENAI_BASE_URL, ANTHROPIC_BASE_URL, and TF_API_KEY in one shot.

Safe to re-run — upgrades are idempotent. Existing keys, custom models, and overrides are preserved.

CLI · v1.2.0

Install with npx

Works on macOS, Linux, and Windows. Node 18+ required.


terminal
# Install the CLI & set up your workspace
$ npx token-factory init

# Paste your API key when prompted
? Enter your Token Factory API key: tf-prod-****-****

# Pick your IDE
? Detected: Claude Code · Cursor · Codex · Continue
All of the above

# Models are routed by your plan
Plan: Token Max
Default model: claude-3-5-sonnet-ultra (UltraFast)
Context window: 128K
Wrote .env.local · .claude/settings.json · .codex/config.toml
You’re ready. Run claude or codex to start.

~ 1.4 MB · MIT licensed

Copy

Test
Starter
Max
Enterprise