NovAI is the first and only API platform in the world that supports OpenAI's new Responses API (/v1/responses) for Chinese AI models. This means tools like Open Cowork, OpenAI Agents SDK, and any application built on the Responses API can now use GLM-5, DeepSeek-v3.2, Qwen-Max, and more — at a fraction of GPT-4o pricing.
In early 2025, OpenAI introduced the Responses API — a new unified endpoint (POST /v1/responses) designed to replace the legacy Chat Completions API for agent-based applications. Tools like Open Cowork, the OpenAI Agents SDK, and an increasing number of AI agent frameworks have adopted this new API format.
Here's the problem: no Chinese AI provider supports the Responses API. Zhipu (GLM-5), DeepSeek, Alibaba (Qwen), MiniMax, and Moonshot all exclusively offer the legacy /v1/chat/completions endpoint. If you try to point Open Cowork or any Responses API tool at these providers, you'll get a 404 Not Found error.
This creates a massive gap: the most cost-effective AI models in the world (Chinese models are 10-100x cheaper than GPT-4o) are completely inaccessible to the fastest-growing category of AI tools.
```
POST /v1/responses → Zhipu API → 404 Not Found
POST /v1/responses → NovAI → auto-translate → /chat/completions → GLM-5 / DeepSeek / Qwen → 200 OK
```
NovAI now provides a fully compliant /v1/responses endpoint that automatically translates between OpenAI's Responses API format and the Chat Completions format used by Chinese providers. The translation is seamless and invisible to the client application.
What NovAI handles behind the scenes:
| Responses API Feature | Translation | Status |
|---|---|---|
| `input` (string or array) | → `messages` array | Supported |
| `instructions` | → `system` message | Supported |
| Multimodal content (images) | → image_url format | Supported |
| Streaming (SSE events) | → Full 9-event sequence | Supported |
| Function/tool calling | → tools array | Supported |
| Reasoning content | → Thinking/CoT output | Supported |
| `developer` role | → `system` role | Supported |
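The mapping above can be sketched in a few lines of Python. This is a hypothetical reconstruction of the translation layer (NovAI's actual implementation is server-side and not public); the field names come straight from the table:

```python
def translate_to_chat_completions(req: dict) -> dict:
    """Sketch: convert a Responses API payload into Chat Completions format."""
    messages = []
    # `instructions` becomes a leading system message
    if "instructions" in req:
        messages.append({"role": "system", "content": req["instructions"]})
    inp = req.get("input", "")
    if isinstance(inp, str):
        # A bare string is a single user message
        messages.append({"role": "user", "content": inp})
    else:
        for msg in inp:
            # `developer` role maps to `system`
            role = "system" if msg["role"] == "developer" else msg["role"]
            messages.append({"role": role, "content": msg["content"]})
    out = {"model": req["model"], "messages": messages}
    if "max_output_tokens" in req:
        out["max_tokens"] = req["max_output_tokens"]
    if "stream" in req:
        out["stream"] = req["stream"]
    return out
```

Everything the client sends stays in Responses format; only the upstream call to the Chinese provider uses the translated shape.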
| Model | Provider | Input Price | Output Price | vs GPT-4o |
|---|---|---|---|---|
| `glm-5` | Zhipu AI | $0.004/1M | $0.004/1M | 1,250x cheaper |
| `deepseek-v3.2` | DeepSeek | $0.20/1M | $0.40/1M | 25x cheaper |
| `qwen-max` | Alibaba | $1.60/1M | $6.40/1M | 3x cheaper |
| `qwen-plus` | Alibaba | $0.40/1M | $1.20/1M | 12x cheaper |
| `qwen-turbo` | Alibaba | $0.05/1M | $0.20/1M | 100x cheaper |
| `minimax-text-01` | MiniMax | $0.20/1M | $1.60/1M | 10x cheaper |
| `glm-4.6v` | Zhipu AI | $0.40/1M | $1.20/1M | 12x cheaper |
| `glm-4.6v-flash` | Zhipu AI | Free | Free | Infinite savings |
GLM-5 deserves special attention: Zhipu's latest flagship model includes a built-in chain-of-thought reasoning engine with output quality comparable to GPT-4o, at $0.004 per million tokens. That's over 1,000x cheaper than OpenAI. Until now, international users had no way to use this model with modern agent tools.
Sign up at aiapi-pro.com — email only, no credit card. You'll get $0.50 in free credits and an API key starting with `nvai-`.
In any Responses API-compatible tool, set:
- Base URL: `https://aiapi-pro.com/v1`
- API Key: `nvai-your-key-here`
- Model: `glm-5`
That's it. No code changes, no middleware, no proxies. The tool will send Responses API requests and receive proper Responses API responses — NovAI handles all the translation invisibly.
Open Cowork is an open-source AI desktop assistant that uses the Responses API exclusively. Here's how to connect it to Chinese models through NovAI:
In Open Cowork's model settings:

- Provider: OpenAI
- Base URL: `https://aiapi-pro.com/v1`
- Model: `glm-5` (or `deepseek-v3.2`, `qwen-max`, etc.)
Previously: Configuring Open Cowork with Zhipu's API directly would return a 404 error because Zhipu doesn't support /v1/responses. NovAI solves this completely.
The OpenAI Agents SDK uses the Responses API internally. To route it through NovAI:
```python
from openai import AsyncOpenAI
from agents import Agent, Runner, set_default_openai_client

# Point the Agents SDK's default client at NovAI
client = AsyncOpenAI(
    api_key="nvai-your-key-here",
    base_url="https://aiapi-pro.com/v1",
)
set_default_openai_client(client)

agent = Agent(
    name="My Agent",
    instructions="You are a helpful coding assistant.",
    model="deepseek-v3.2",
)

result = Runner.run_sync(agent, "Write a Python function to sort a list")
print(result.final_output)
```
Every agent call will use DeepSeek at $0.20/1M tokens instead of GPT-4o at $5/1M — a 25x cost reduction with comparable code quality.
You can also call the endpoint directly over HTTP:

```http
POST https://aiapi-pro.com/v1/responses
Authorization: Bearer nvai-your-key
Content-Type: application/json

{
  "model": "glm-5",
  "input": "Explain quantum computing in simple terms.",
  "stream": false,
  "instructions": "You are a physics tutor for beginners.",
  "temperature": 0.7,
  "max_output_tokens": 2000
}
```
A non-streaming request (`"stream": false`) returns a single Responses object:

```json
{
  "id": "resp_abc123...",
  "object": "response",
  "created_at": 1773595519,
  "status": "completed",
  "model": "glm-5",
  "output": [
    {
      "id": "msg_xyz789...",
      "type": "message",
      "role": "assistant",
      "status": "completed",
      "content": [
        {
          "type": "output_text",
          "text": "Quantum computing uses quantum bits...",
          "annotations": []
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 15,
    "output_tokens": 200,
    "total_tokens": 215
  }
}
```
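The same request can be sent with nothing but the Python standard library. `build_request` mirrors the headers and body shown above; the key is a placeholder:

```python
import json
import urllib.request

API_KEY = "nvai-your-key-here"  # placeholder

def build_request(payload: dict) -> urllib.request.Request:
    """Construct the POST /v1/responses request shown above."""
    return urllib.request.Request(
        "https://aiapi-pro.com/v1/responses",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def send(payload: dict) -> dict:
    """Send the request and parse the Responses API JSON body."""
    with urllib.request.urlopen(build_request(payload)) as resp:
        return json.loads(resp.read())
```

With a valid key, `send({"model": "glm-5", "input": "Hello"})` returns the parsed response object as a dict.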
When "stream": true, NovAI returns the full OpenAI-compliant SSE event sequence:
```
event: response.created
event: response.in_progress
event: response.output_item.added
event: response.content_part.added
event: response.output_text.delta   (repeated for each token)
event: response.output_text.done
event: response.content_part.done
event: response.output_item.done
event: response.completed
```
This is byte-for-byte compatible with OpenAI's own streaming format. Any library or framework that parses OpenAI Responses API streams will work without modification.
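A stream like this is also easy to consume by hand. The sketch below parses raw SSE lines and concatenates the `response.output_text.delta` payloads; per OpenAI's streaming format, each delta event's `data:` line carries a JSON object with a `delta` field:

```python
import json

def collect_text(sse_lines) -> str:
    """Reassemble the final text from Responses API SSE lines by
    concatenating the payloads of response.output_text.delta events."""
    parts, event = [], None
    for line in sse_lines:
        if line.startswith("event: "):
            event = line[len("event: "):].strip()
        elif line.startswith("data: ") and event == "response.output_text.delta":
            parts.append(json.loads(line[len("data: "):])["delta"])
    return "".join(parts)
```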
The input field accepts the same formats as OpenAI's Responses API:
A plain string:

```json
{"model": "glm-5", "input": "Hello, world!"}
```

A structured message array (the `developer` role is translated to `system`):

```json
{"model": "glm-5", "input": [
  {"role": "developer", "content": "You are a code reviewer."},
  {"role": "user", "content": "Review this function: def add(a,b): return a+b"}
]}
```

Multimodal input for vision models:

```json
{"model": "glm-4.6v", "input": [
  {"role": "user", "content": [
    {"type": "input_text", "text": "What's in this image?"},
    {"type": "input_image", "image_url": "https://example.com/photo.jpg"}
  ]}
]}
```
Tool definitions use the Responses API's flattened function format (`name` and `parameters` at the top level, not nested under a `function` key):

```json
{
  "model": "deepseek-v3.2",
  "input": "What's the weather in Tokyo?",
  "tools": [
    {
      "type": "function",
      "name": "get_weather",
      "description": "Get current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {
          "city": {"type": "string"}
        },
        "required": ["city"]
      }
    }
  ]
}
```
| Platform / Tool | Direct Chinese API | Via NovAI |
|---|---|---|
| Open Cowork | 404 Error | Works |
| OpenAI Agents SDK | Not Compatible | Works |
| Custom Responses API Apps | Not Compatible | Works |
| OpenAI Python SDK (responses) | Not Compatible | Works |
| Cursor / Continue IDE | Works (chat/completions) | Works (both APIs) |
| LangChain / LlamaIndex | Works (chat/completions) | Works (both APIs) |
The AI industry is moving rapidly toward agent-based architectures. OpenAI's Responses API is becoming the standard interface for these systems. By bridging this gap, NovAI enables a new paradigm:
Build with the latest agent frameworks. Pay Chinese model prices.
An agent pipeline that would cost $50/day with GPT-4o can run for $0.04/day with GLM-5 through NovAI — without changing a single line of code in your agent framework.
NovAI supports both the legacy /v1/chat/completions endpoint and the new /v1/responses endpoint. You can use whichever your application needs, or both simultaneously. All models are available through both endpoints.
$0.50 free credits. No credit card required. Works in 30 seconds.
Get API Key Free →