ANNOUNCEMENT · MAY 2026

We Found Claude at 65% Off in the China Market — And Proved It Actually Works

NovAI now serves Anthropic Claude Opus 4.7, Sonnet 4.6 and Haiku 4.5. Here is the full story of how we validated it before launch.

May 6, 2026 · 9 min read · Engineering · View Claude models →

TL;DR — We discovered Chinese API aggregators reselling Anthropic Claude at 65% below OpenRouter through legitimate enterprise channels. Before relisting any model we ran three waves of validation: connectivity probing, billing-overhead forensics (against the open-source One-API billing formula), and a 6-scenario semantic benchmark. All three Claude models — Opus 4.7, Sonnet 4.6, Haiku 4.5 — passed. They are now live on NovAI at the most competitive prices on the global market.

Try Claude on NovAI today — from $0.50 / 1M tokens

OpenAI-compatible endpoint. Zero platform fee. Zero topup surcharge. Sign up in 30 seconds.

Create account → Try in Playground

Why Claude was missing from NovAI until now

Since NovAI launched, we have been a Chinese-model-first gateway — DeepSeek, Qwen, GLM, Doubao, MiniMax, Moonshot. Claude was conspicuously absent, despite being one of the most-requested models from our users. Why?

Because the math did not work. Anthropic's direct pricing is steep, and OpenRouter resells at those same prices plus a 5.5% topup surcharge. If we bought from either source and added any margin, we would land higher than both. Our users would get nothing new.

The discovery: a fully-working Chinese aggregator with legitimate upstream

In April 2026 we began a systematic audit of the top Chinese API aggregators. Our target was simple: find a source whose Claude prices were materially below Anthropic direct without sacrificing the official model endpoints.

One aggregator stood out. For the purposes of this article we will call them "the upstream". Their published Claude prices were 35% of Anthropic direct retail. That is not a typo — sixty-five percent below. But cheap alone means nothing. Telegram is full of shady reseller screenshots. Before shipping a single line of integration code, we had to prove three things:

  1. Reachability: does the upstream endpoint actually respond, from our production infrastructure?
  2. Authenticity: are we talking to real Anthropic models, or a degraded fine-tune?
  3. Billing transparency: is there hidden token inflation that would erase the discount?

What follows is exactly how we tested each one.

Wave 1: Connectivity & model enumeration

First smoke test — list available models, then POST a minimal "hi" to every Claude variant we cared about.

curl -sS -H "Authorization: Bearer $KEY" \
  "https://upstream.example/v1/models" \
  | jq '.data[] | select(.id | test("claude")) | .id'

# Then enumerate each candidate:
for m in claude-opus-4-7 claude-opus-4-6 claude-sonnet-4-6 \
         claude-haiku-4-5-20251001 claude-sonnet-4-5-20250929 \
         claude-opus-4-5-20251101; do
  curl -sS -o /dev/null -w "$m %{http_code}\n" \
    -X POST "https://upstream.example/v1/chat/completions" \
    -H "Authorization: Bearer $KEY" -H "Content-Type: application/json" \
    -d "{\"model\":\"$m\",\"messages\":[{\"role\":\"user\",\"content\":\"hi\"}],\"max_tokens\":10}"
done

Result: all 6 candidate models came back HTTP 200 with real-looking completions. Several deprecated aliases returned 403, which is a good sign: the upstream is not routing every request through a generic fallback; it actually maintains per-model access.

Wave 2: Billing overhead forensics

This is where most reseller audits stop. It is also where most hidden costs live.

When you send a 2-character message ("hi") to Anthropic direct, the returned prompt_tokens should be roughly 8–10. If the upstream returns 413 or 2041, someone is injecting a system prompt you did not see. And you are paying for it.

We ran the same "hi" call 6 consecutive times against each model and measured the reported input tokens. Full results:

Model                        Overhead / call   Stability                 Verdict
claude-opus-4-7              ~130 tokens       128 / 129 / 132 (tight)   Ship
claude-sonnet-4-6            413 tokens        413 every call            Ship
claude-haiku-4-5             413 tokens        413 every call            Ship
claude-opus-4-6              413 tokens        413 every call            Ship (later)
claude-sonnet-4-5-20250929   ~2041 tokens      stable but massive        Reject
claude-opus-4-5-20251101     ~2050 tokens      stable but massive        Reject

The user pays for those overhead tokens at our retail price, so a ~2000-token injection adds its full cost to every single call and can inflate the bill for a typical short conversation by 30% or more. Unacceptable. We dropped both variants.

The 413-token overhead on Sonnet 4.6 and Haiku 4.5 is still noticeable, so we added explicit disclosure to the model cards. At Haiku's $0.50 / 1M retail that is still just $0.0002 per call, a negligible amount for anything beyond trivial test messages.
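The ship/reject rule behind the table above is easy to mechanize. Here is a minimal sketch of how such a classifier could look; the threshold names and values are our illustrative choices, not upstream constants, and you would tune them to your own retail pricing:

```python
from statistics import pstdev

def overhead_verdict(prompt_token_samples, baseline=10,
                     ship_max=500, jitter_max=5):
    """Classify a model's hidden prompt overhead.

    prompt_token_samples: reported prompt_tokens for repeated "hi" calls.
    baseline: what Anthropic direct reports for the same message (~8-10).
    ship_max: largest tolerable per-call overhead in tokens (our choice).
    jitter_max: overhead must be stable across calls to be trustworthy.
    """
    overheads = [s - baseline for s in prompt_token_samples]
    mean = sum(overheads) / len(overheads)
    if pstdev(overheads) > jitter_max:
        return "reject: unstable overhead"
    if mean > ship_max:
        return "reject: massive injection"
    return f"ship (~{round(mean)} tokens/call)"

# Samples shaped like our Wave 2 measurements:
print(overhead_verdict([138, 139, 142, 138, 139, 142]))  # opus-4-7-like -> ship
print(overhead_verdict([423] * 6))                       # sonnet-4-6-like -> ship
print(overhead_verdict([2051] * 6))                      # sonnet-4-5-like -> reject
```

The stability check matters as much as the magnitude: a jittery overhead suggests the upstream rotates injected prompts, which makes cost forecasting impossible.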

Key finding. claude-opus-4-7 is the sweet spot. Overhead of just 130 tokens, Anthropic Sonnet-tier pricing ($3/$15 our cost, $8/$40 retail), and a 1M-token context window — five times larger than Anthropic direct's 200K on the Opus family. This is the model we are most excited about.

Wave 3: One-API billing formula audit

After the overhead probe, we realized the upstream runs on the open-source One-API billing engine — evidenced by the X-Oneapi-Request-Id response header and a publicly readable /api/pricing endpoint.

One-API computes cost as:

cost_usd = (input_tokens * model_ratio
          + output_tokens * model_ratio * completion_ratio)
          * group_ratio * (1 / 500000)

Pulling /api/pricing gave us the raw model_ratio, completion_ratio, and the group-level discount table. Cross-multiplying against the scraped marketplace prices produced an exact match to 4 decimal places:

Model               Our cost / 1M            Retail / 1M       Formula-verified
claude-opus-4-7     $3.00 in · $15.00 out    $8.00 · $40.00    ✓ matches
claude-sonnet-4-6   $1.05 in · $5.25 out     $2.00 · $10.00    ✓ matches
claude-haiku-4-5    $0.35 in · $1.75 out     $0.50 · $2.50     ✓ matches
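The cross-check itself is a few lines of arithmetic. The ratios below are illustrative values we picked to reproduce the Opus 4.7 cost row; the real ones come from the upstream's /api/pricing response:

```python
QUOTA_PER_UNIT = 500_000  # One-API's quota-to-USD divisor

def one_api_cost_usd(input_tokens, output_tokens,
                     model_ratio, completion_ratio, group_ratio=1.0):
    """Reproduce the One-API billing formula from the post."""
    quota = (input_tokens * model_ratio
             + output_tokens * model_ratio * completion_ratio)
    return quota * group_ratio / QUOTA_PER_UNIT

# Illustrative ratios reproducing the $3.00 in / $15.00 out cost row:
# model_ratio 1.5 -> $3.00 per 1M input; completion_ratio 5 -> $15.00 per 1M output.
print(one_api_cost_usd(1_000_000, 0, model_ratio=1.5, completion_ratio=5))   # 3.0
print(one_api_cost_usd(0, 1_000_000, model_ratio=1.5, completion_ratio=5))   # 15.0
```

If the scraped marketplace price and this computed price disagree at the fourth decimal, something in the pipeline is applying an undisclosed multiplier.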

Wave 4: Six-scenario semantic benchmark

Cheap and correctly-billed means nothing if quality is degraded. Final check: can the upstream-routed models actually handle real tasks?

We ran each model through six representative scenarios.

All three models passed all six scenarios. Opus 4.7 produced the most elegant code (it used functools.lru_cache instead of a hand-rolled memo dict). Sonnet 4.6 and Haiku 4.5 were indistinguishable from the Anthropic-direct behavior we benchmarked last quarter.
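For flavor, this is the kind of pattern Opus 4.7 reached for (our reconstruction of the idiom, not its verbatim output): memoization via the standard-library decorator instead of manual cache bookkeeping.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n: int) -> int:
    """Memoized Fibonacci: each n is computed once, then served from cache."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(40))  # 102334155, instantly; the uncached recursion is exponential
```

The decorator version is shorter, thread-safety and eviction come for free, and there is no mutable global dict to leak between tests.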

What we will not do. NovAI will not ship Claude via any aggregator that cannot pass these four audits. If the economics on a specific upstream degrade (token injection increases, model aliases shift, the billing formula becomes opaque), we will reroute through a different channel or pull the model. Your API key and integration do not change either way, because we route behind an OpenAI-compatible shim.

Final pricing on NovAI

Model              Context   Input / 1M   Output / 1M   vs. OpenRouter
Claude Opus 4.7    1M        $8.00        $40.00        47% cheaper
Claude Sonnet 4.6  200K      $2.00        $10.00        33% cheaper
Claude Haiku 4.5   200K      $0.50        $2.50         50% cheaper

The "vs. OpenRouter" column compares against OpenRouter's list price plus their 5.5% topup surcharge. NovAI charges zero platform fee and zero topup surcharge, so every USD you deposit buys USD-equivalent compute.
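Concretely, here is what a topup surcharge does to a deposit. A sketch, assuming the fee is added on top of the credit purchased (so $105.50 paid buys $100 of compute); check your provider's billing page for the exact mechanics:

```python
def usable_compute(deposit_usd: float, topup_fee: float) -> float:
    """USD of API compute purchasable after a topup surcharge,
    assuming the fee is charged on top of the credit bought."""
    return deposit_usd / (1 + topup_fee)

print(round(usable_compute(100, 0.055), 2))  # with a 5.5% surcharge
print(usable_compute(100, 0.0))              # with no surcharge
```

On a $100 deposit the 5.5% surcharge quietly eats about $5.21 of compute before you send a single token.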

Quickstart — drop-in for OpenAI SDK

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_NOVAI_KEY",
    base_url="https://aiapi-pro.com/v1",
)

# Swap model name — that's it.
response = client.chat.completions.create(
    model="claude-opus-4-7",   # or claude-sonnet-4-6, claude-haiku-4-5
    messages=[{"role": "user", "content": "Explain RAG in 3 paragraphs."}],
    stream=True,
)
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="", flush=True)

Start with $5 free trial credit

Enough to process 10M Haiku tokens or 600K Opus tokens. No credit card required.

Sign up → See all 15+ models
