Build Cost-Effective AI Agent Pipelines with Multiple Chinese LLMs

Published March 10, 2026 · 10 min read

The smartest AI teams don't use one model for everything. They use cheap, fast models for simple tasks and reserve expensive, powerful models for complex reasoning. This "model routing" approach can cut your AI costs by 70-90% while maintaining quality.

With NovAI, you get access to 8+ Chinese AI models through a single OpenAI-compatible API endpoint, so switching models is just a change to the model parameter. That makes it easy to build multi-model pipelines where each step uses the best model for the job.

The Architecture: Right Model, Right Task

User Input
   ↓
[Qwen-Turbo, $0.05/1M] → classify intent & extract entities
   ├─ Simple query? → [Qwen-Turbo] → direct response (80% of requests)
   ├─ Complex reasoning? → [DeepSeek-v3.2, $0.20/1M] → deep analysis
   └─ Long document? → [Moonshot-128K, $0.60/1M] → full-context analysis
   ↓
Final Response

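In this architecture, about 80% of requests are simple enough for the cheapest model (Qwen-Turbo at $0.05/1M tokens), and only the complex 20% escalate to more expensive models. If that 20% all goes to DeepSeek, the blended price comes to about $0.08 per million tokens (0.8 × $0.05 + 0.2 × $0.20), not counting the classification call each request also makes. The result: strong-model answers on the hard queries, at well below the price of running everything on DeepSeek.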
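The $0.08 figure is a traffic-weighted average of the per-token prices. Here is a minimal sketch of that arithmetic, assuming the prices quoted above and a single rate for input and output tokens (many providers price them separately). The function name blended_rate and its classifier_token_share parameter are illustrative, not part of any NovAI API; classifier_token_share is the fraction of tokens that also pass through the Qwen-Turbo classifier.

```python
# Per-1M-token prices quoted in this article (input and output treated alike).
PRICES = {
    "qwen-turbo": 0.05,
    "deepseek-v3.2": 0.20,
    "moonshot-v1-128k": 0.60,
}


def blended_rate(traffic_share: dict[str, float],
                 classifier_token_share: float = 0.0) -> float:
    """Average $ per 1M tokens for a given routing split.

    traffic_share maps model name -> fraction of tokens it handles (must sum to 1).
    classifier_token_share is the fraction of tokens that also go through the
    Qwen-Turbo classifier; 0.0 reproduces the article's figure, which ignores it.
    """
    if abs(sum(traffic_share.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    routed = sum(PRICES[model] * share for model, share in traffic_share.items())
    return routed + PRICES["qwen-turbo"] * classifier_token_share


# The article's 80/20 split, with every escalation going to DeepSeek:
print(round(blended_rate({"qwen-turbo": 0.8, "deepseek-v3.2": 0.2}), 4))  # 0.08
```

Sending half of the escalated traffic to Moonshot instead (an 80/10/10 split) raises the blended rate to $0.12/1M, so the traffic mix matters as much as the list prices.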

Implementation: The Model Router

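from openai import OpenAI

client = OpenAI(
    api_key="nvai-your-api-key",  # replace with your NovAI API key
    base_url="https://aiapi-pro.com/v1"
)

CATEGORIES = {"SIMPLE", "REASONING", "LONG_DOC"}

def classify_query(query: str) -> str:
    """Use cheap Qwen-Turbo to classify the query."""
    r = client.chat.completions.create(
        model="qwen-turbo",
        messages=[
            {"role": "system", "content": "Classify the user query into: SIMPLE, REASONING, or LONG_DOC. Reply with just the category."},
            {"role": "user", "content": query}
        ],
        max_tokens=10,
        temperature=0  # keep the label as stable as possible
    )
    # The model may reply "simple." or " REASONING\n"; normalize before matching
    label = (r.choices[0].message.content or "").strip().strip(".").upper()
    # Fall back to the cheap path if the reply is not a known category
    return label if label in CATEGORIES else "SIMPLE"

def route_and_respond(query: str, context: str = "") -> str:
    """Route to the optimal model based on query type."""
    # The classifier never sees `context`, so send very long context straight
    # to the 128K model (the 20,000-character cutoff is a rough heuristic)
    if len(context) > 20_000:
        category = "LONG_DOC"
    else:
        category = classify_query(query)

    model_map = {
        "SIMPLE": "qwen-turbo",          # $0.05/1M: fast & cheap
        "REASONING": "deepseek-v3.2",    # $0.20/1M: strong reasoning
        "LONG_DOC": "moonshot-v1-128k",  # $0.60/1M: 128K context
    }

    model = model_map.get(category, "qwen-turbo")
    messages = [{"role": "user", "content": query}]

    if context:
        messages.insert(0, {"role": "system", "content": context})

    r = client.chat.completions.create(model=model, messages=messages)
    return r.choices[0].message.content

# Example usage (actual routing depends on the classifier's reply)
print(route_and_respond("What's 2+2?"))                    # likely → qwen-turbo
print(route_and_respond("Prove the Riemann hypothesis"))   # likely → deepseek-v3.2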
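The cost figures in this article hold only if the classifier really labels about 80% of your traffic SIMPLE. A minimal sketch for checking that split, assuming you record each category inside route_and_respond right after classification; RouteStats is a hypothetical helper, not part of NovAI or the OpenAI SDK.

```python
from collections import Counter


class RouteStats:
    """Hypothetical helper: tally router decisions to check the assumed split."""

    def __init__(self) -> None:
        self.counts: Counter[str] = Counter()

    def record(self, category: str) -> None:
        self.counts[category] += 1

    def shares(self) -> dict[str, float]:
        """Fraction of requests per category (empty dict if nothing recorded)."""
        total = sum(self.counts.values())
        if total == 0:
            return {}
        return {category: n / total for category, n in self.counts.items()}


stats = RouteStats()
for label in ["SIMPLE", "SIMPLE", "SIMPLE", "REASONING"]:
    stats.record(label)
print(stats.shares())  # {'SIMPLE': 0.75, 'REASONING': 0.25}
```

Calling stats.record(category) right after classify_query is enough; comparing stats.shares() with the assumed 80/20 split tells you whether the savings estimate applies to your own traffic.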

Cost Savings: Real Numbers

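Let's say your application handles 10,000 API calls per day with an average of 500 input tokens and 200 output tokens per call (7M tokens per day). The table prices each model at a single per-1M rate for input and output tokens alike and assumes a 30-day month:

Strategy | Daily cost | Monthly cost
GPT-4o for everything | $15.00 | $450
DeepSeek for everything | $1.40 | $42
Multi-model routing (NovAI) | $0.56 | $17

Multi-model routing on NovAI costs 96% less than GPT-4o and 60% less than using a single cheap model, because you only pay for the expensive models when a request actually needs them. Those figures leave out the classification call itself: sending each request's 500 input tokens through Qwen-Turbo at $0.05/1M adds about $0.25 per day, bringing routing to roughly $0.81/day (about $24/month). That is still about 42% less than DeepSeek alone and about 95% less than GPT-4o.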
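The DeepSeek and routing rows work out exactly at 10,000 calls per day (7M tokens), priced at one flat rate per million tokens over a 30-day month. A minimal sketch that reproduces them, plus the cost of the classification pass that the table leaves out (assuming each call's 500 input tokens are sent to Qwen-Turbo at $0.05/1M). The GPT-4o row is not recomputed because the article doesn't state which GPT-4o price it used; daily_cost is a hypothetical helper.

```python
def daily_cost(calls_per_day: int, tokens_per_call: int, price_per_million: float) -> float:
    """Dollar cost per day at one flat price per 1M tokens (input and output alike)."""
    return calls_per_day * tokens_per_call * price_per_million / 1_000_000


CALLS_PER_DAY = 10_000
TOKENS_PER_CALL = 500 + 200  # input + output

for name, price in [("DeepSeek for everything", 0.20), ("Multi-model routing", 0.08)]:
    day = daily_cost(CALLS_PER_DAY, TOKENS_PER_CALL, price)
    print(f"{name}: ${day:.2f}/day, ${day * 30:.2f}/month")

# Classification pass: each call's 500 input tokens also go to Qwen-Turbo
# (ignoring the short system prompt and the few output tokens).
overhead = daily_cost(CALLS_PER_DAY, 500, 0.05)
print(f"Classifier overhead: ${overhead:.2f}/day")

# Prints:
# DeepSeek for everything: $1.40/day, $42.00/month
# Multi-model routing: $0.56/day, $16.80/month
# Classifier overhead: $0.25/day
```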

Other Pipeline Patterns

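Summarize-then-analyze: Use Moonshot-128K to summarize a long document, then pass the summary to DeepSeek for deeper analysis. Moonshot's 128K context window takes in the whole document, and DeepSeek only has to reason over the much shorter summary. Because Moonshot ($0.60/1M) costs more per token than DeepSeek ($0.20/1M), the saving comes from reusing one summary across several DeepSeek calls rather than re-sending the full document each time.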
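A minimal sketch of summarize-then-analyze, assuming an OpenAI-compatible client such as the NovAI client created in the router example, and the model names used there. The helper names (_ask, summarize, analyze) and the prompts are illustrative, not a NovAI API. Keeping the two steps in separate functions makes it easy to summarize once and ask many questions.

```python
from typing import Any


def _ask(client: Any, model: str, system: str, user: str) -> str:
    """Send one system + user exchange to an OpenAI-compatible client."""
    r = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return r.choices[0].message.content


def summarize(client: Any, document: str) -> str:
    # The 128K-context model reads the full document once.
    return _ask(client, "moonshot-v1-128k",
                "Summarize this document, keeping every fact needed for follow-up analysis.",
                document)


def analyze(client: Any, summary: str, question: str) -> str:
    # DeepSeek only sees the much shorter summary, never the full document.
    return _ask(client, "deepseek-v3.2",
                f"Answer using this document summary:\n{summary}",
                question)
```

For example, summary = summarize(client, report_text) runs once, and each later analyze(client, summary, question) call pays only DeepSeek's rate on the short summary; report_text is a placeholder for your document.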

Generate-then-validate: Use Qwen-Turbo to generate a draft response, then pass it to DeepSeek to check for errors and improve quality. The validation step catches mistakes while keeping average costs low.
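A minimal sketch of generate-then-validate under the same assumptions (an OpenAI-compatible client and the model names above). The reviewer prompt and its reply-OK convention are assumptions of this sketch, chosen so that when the draft is already fine, DeepSeek's output is a single word.

```python
from typing import Any

DRAFT_MODEL = "qwen-turbo"      # cheap first pass
REVIEW_MODEL = "deepseek-v3.2"  # stronger checker


def generate_then_validate(client: Any, prompt: str) -> str:
    """Draft with the cheap model, then have the stronger model review it."""
    draft = client.chat.completions.create(
        model=DRAFT_MODEL,
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content

    review = client.chat.completions.create(
        model=REVIEW_MODEL,
        messages=[
            {"role": "system", "content": (
                "You review draft answers. If the draft is correct, reply with "
                "exactly OK. Otherwise reply with a corrected answer only.")},
            {"role": "user", "content": f"Question:\n{prompt}\n\nDraft answer:\n{draft}"},
        ],
    ).choices[0].message.content

    # Keep the cheap draft when the reviewer approves it; otherwise use the fix.
    return draft if review.strip() == "OK" else review
```

Every request still pays for the DeepSeek review, so this pattern trades part of the routing savings for fewer uncorrected mistakes.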

Multilingual pipeline: Use Qwen-Max for Chinese content (Qwen models are a strong choice for Chinese-language tasks), DeepSeek for code and math, and route everything else to Qwen-Turbo. One API key handles all of it.
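The multilingual routing rule can be sketched with two cheap checks that run before any API call. The regular expressions are rough, hypothetical heuristics (CJK ideographs for Chinese, a few keywords and arithmetic patterns for code and math), and the model id qwen-max is an assumption; check NovAI's model list for the exact name.

```python
import re
from typing import Any

# Hypothetical heuristics for this sketch, not a production classifier.
CJK = re.compile(r"[\u4e00-\u9fff]")  # CJK Unified Ideographs (common Chinese characters)
CODE_OR_MATH = re.compile(r"`{3}|\b(?:def|class|import|return)\b|\d\s*[-+*/^=]\s*\d")

MODELS = {
    "chinese": "qwen-max",         # model id assumed; check NovAI's model list
    "code_math": "deepseek-v3.2",
    "general": "qwen-turbo",
}


def pick_model(text: str) -> str:
    """Pick a model for the multilingual pipeline (the Chinese check runs first)."""
    if CJK.search(text):
        return MODELS["chinese"]
    if CODE_OR_MATH.search(text):
        return MODELS["code_math"]
    return MODELS["general"]


def multilingual_respond(client: Any, text: str) -> str:
    """Send text to the chosen model through an OpenAI-compatible client."""
    r = client.chat.completions.create(
        model=pick_model(text),
        messages=[{"role": "user", "content": text}],
    )
    return r.choices[0].message.content
```

Because pick_model is a pure function, you can unit-test the routing rules without spending tokens; reusing the LLM classifier from the router example is an alternative when the heuristics misroute.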

DeepSeek from $0.20/1M tokens — 10x cheaper than GPT-4o

Build your multi-model pipeline today

One API key, 8+ models, automatic streaming. Start with $0.50 free credits.

