EXCLUSIVE — World's First

OpenAI Responses API for Chinese AI Models: Use GLM-5, DeepSeek & Qwen in Any Agent Tool

Published March 16, 2026 · 12 min read

TL;DR

NovAI is the first and only API platform in the world that supports OpenAI's new Responses API (/v1/responses) for Chinese AI models. This means tools like Open Cowork, OpenAI Agents SDK, and any application built on the Responses API can now use GLM-5, DeepSeek-v3.2, Qwen-Max, and more — at a fraction of GPT-4o pricing.

The Problem: Chinese Models Are Locked Out of Modern Agent Tools

In early 2025, OpenAI introduced the Responses API — a new unified endpoint (POST /v1/responses) designed to replace the legacy Chat Completions API for agent-based applications. Tools like Open Cowork, the OpenAI Agents SDK, and an increasing number of AI agent frameworks have adopted this new API format.

Here's the problem: no Chinese AI provider supports the Responses API. Zhipu (GLM-5), DeepSeek, Alibaba (Qwen), MiniMax, and Moonshot all exclusively offer the legacy /v1/chat/completions endpoint. If you try to point Open Cowork or any Responses API tool at these providers, you'll get a 404 Not Found error.

This creates a massive gap: the most cost-effective AI models in the world (Chinese models run 10x to over 1,000x cheaper than GPT-4o, per the pricing table below) are completely inaccessible to the fastest-growing category of AI tools.

Before NovAI:
Open Cowork / Agents SDK → POST /v1/responses → Zhipu API → 404 Not Found

With NovAI:
Open Cowork / Agents SDK → POST /v1/responses → NovAI (auto-translate) → POST /v1/chat/completions → GLM-5 / DeepSeek / Qwen → 200 OK

The Solution: NovAI's Responses API Compatibility Layer

NovAI now provides a fully compliant /v1/responses endpoint that automatically translates between OpenAI's Responses API format and the Chat Completions format used by Chinese providers. The translation is seamless and invisible to the client application.

What NovAI handles behind the scenes:

| Responses API Feature | Translation | Status |
|---|---|---|
| `input` (string or array) | → `messages` array | Supported |
| `instructions` | → `system` message | Supported |
| Multimodal content (images) | → `image_url` format | Supported |
| Streaming (SSE events) | → full 9-event sequence | Supported |
| Function/tool calling | → `tools` array | Supported |
| Reasoning content | → thinking/CoT output | Supported |
| `developer` role | → `system` role | Supported |
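The core of this translation can be sketched in a few lines. This is an illustrative mapping only — the field names come from the two public API formats, while NovAI's actual gateway logic is internal and covers many more cases (streaming, tool calls, multimodal parts):

```python
def responses_to_chat_completions(req: dict) -> dict:
    """Sketch of the Responses API -> Chat Completions field mapping."""
    messages = []

    # `instructions` becomes a leading system message
    if "instructions" in req:
        messages.append({"role": "system", "content": req["instructions"]})

    # `input` may be a plain string or a list of messages
    inp = req["input"]
    if isinstance(inp, str):
        messages.append({"role": "user", "content": inp})
    else:
        for msg in inp:
            # Chinese providers have no `developer` role; downgrade to `system`
            role = "system" if msg["role"] == "developer" else msg["role"]
            messages.append({"role": role, "content": msg["content"]})

    out = {"model": req["model"], "messages": messages}
    if "max_output_tokens" in req:
        out["max_tokens"] = req["max_output_tokens"]
    for key in ("temperature", "stream", "tools"):
        if key in req:
            out[key] = req[key]
    return out
```

The reverse direction (wrapping the provider's `choices[0].message` back into a Responses `output` array) follows the same pattern.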

Which Models Are Available?

| Model | Provider | Input Price | Output Price | vs GPT-4o |
|---|---|---|---|---|
| glm-5 | Zhipu AI | $0.004/1M | $0.004/1M | 1,250x cheaper |
| deepseek-v3.2 | DeepSeek | $0.20/1M | $0.40/1M | 25x cheaper |
| qwen-max | Alibaba | $1.60/1M | $6.40/1M | 3x cheaper |
| qwen-plus | Alibaba | $0.40/1M | $1.20/1M | 12x cheaper |
| qwen-turbo | Alibaba | $0.05/1M | $0.20/1M | 100x cheaper |
| minimax-text-01 | MiniMax | $0.20/1M | $1.60/1M | 10x cheaper |
| glm-4.6v | Zhipu AI | $0.40/1M | $1.20/1M | 12x cheaper |
| glm-4.6v-flash | Zhipu AI | Free | Free | Infinite savings |

GLM-5 deserves special attention: Zhipu's latest flagship model includes a built-in chain-of-thought reasoning engine and quality comparable to GPT-4o, at $0.004 per million tokens. That's over 1,000x cheaper than OpenAI. Until now, international users had no way to use this model with modern agent tools.

Quick Start: 3 Steps

Step 1: Get Your NovAI API Key

Sign up at aiapi-pro.com — email only, no credit card. You'll get $0.50 free credits and a key starting with nvai-.

Step 2: Configure Your Tool

In any Responses API-compatible tool, set:

Base URL:  https://aiapi-pro.com/v1
API Key:   nvai-your-key-here
Model:     glm-5

Step 3: Start Using It

That's it. No code changes, no middleware, no proxies. The tool will send Responses API requests and receive proper Responses API responses — NovAI handles all the translation invisibly.

Setup Guide: Open Cowork

Open Cowork is an open-source AI desktop assistant that uses the Responses API exclusively. Here's how to connect it to Chinese models through NovAI:

  1. Open Open Cowork → Settings (gear icon)
  2. Set API Provider to OpenAI
  3. Set Base URL to https://aiapi-pro.com/v1
  4. Paste your NovAI API key in the API Key field
  5. Set Model to glm-5 (or deepseek-v3.2, qwen-max, etc.)
  6. Click Save and start chatting

Previously: Configuring Open Cowork with Zhipu's API directly would return a 404 error because Zhipu doesn't support /v1/responses. NovAI solves this completely.

Setup Guide: OpenAI Agents SDK (Python)

The OpenAI Agents SDK uses the Responses API internally. To route it through NovAI:

from openai import AsyncOpenAI
from agents import Agent, Runner, set_default_openai_client

# Point the Agents SDK's default client at NovAI
client = AsyncOpenAI(
    api_key="nvai-your-key-here",
    base_url="https://aiapi-pro.com/v1",
)
set_default_openai_client(client)

agent = Agent(
    name="My Agent",
    instructions="You are a helpful coding assistant.",
    model="deepseek-v3.2",
)

result = Runner.run_sync(agent, "Write a Python function to sort a list")
print(result.final_output)

Every agent call will use DeepSeek at $0.20/1M tokens instead of GPT-4o at $5/1M — a 25x cost reduction with comparable code quality.

API Reference: POST /v1/responses

Request

POST https://aiapi-pro.com/v1/responses
Authorization: Bearer nvai-your-key
Content-Type: application/json

{
  "model": "glm-5",
  "input": "Explain quantum computing in simple terms.",
  "stream": true,
  "instructions": "You are a physics tutor for beginners.",
  "temperature": 0.7,
  "max_output_tokens": 2000
}

Non-Streaming Response

{
  "id": "resp_abc123...",
  "object": "response",
  "created_at": 1773595519,
  "status": "completed",
  "model": "glm-5",
  "output": [
    {
      "id": "msg_xyz789...",
      "type": "message",
      "role": "assistant",
      "status": "completed",
      "content": [
        {
          "type": "output_text",
          "text": "Quantum computing uses quantum bits...",
          "annotations": []
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 15,
    "output_tokens": 200,
    "total_tokens": 215
  }
}
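If you parse this JSON by hand rather than through the OpenAI SDK (which exposes a convenience `output_text` property on the response object), the assistant text lives under `output[].content[].text`. A minimal extractor, assuming the payload shape shown above:

```python
def extract_output_text(response: dict) -> str:
    """Concatenate all output_text parts from a Responses API payload."""
    parts = []
    for item in response.get("output", []):
        # Skip non-message items such as function calls
        if item.get("type") != "message":
            continue
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part["text"])
    return "".join(parts)
```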

Streaming Events

When "stream": true, NovAI returns the full OpenAI-compliant SSE event sequence:

event: response.created
event: response.in_progress
event: response.output_item.added
event: response.content_part.added
event: response.output_text.delta    (repeated for each token)
event: response.output_text.done
event: response.content_part.done
event: response.output_item.done
event: response.completed

This is byte-for-byte compatible with OpenAI's own streaming format. Any library or framework that parses OpenAI Responses API streams will work without modification.
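Most clients will let an SDK handle the stream, but if you consume the raw SSE bytes yourself, the visible text arrives in the `delta` field of each `response.output_text.delta` event. A minimal line-oriented accumulator (illustrative only; a production parser should also handle multi-line `data:` fields and reconnects):

```python
import json

def collect_text(sse_stream: str) -> str:
    """Accumulate text from response.output_text.delta events in a raw SSE stream."""
    chunks, event = [], None
    for line in sse_stream.splitlines():
        if line.startswith("event: "):
            event = line[len("event: "):].strip()
        elif line.startswith("data: ") and event == "response.output_text.delta":
            payload = json.loads(line[len("data: "):])
            chunks.append(payload.get("delta", ""))
    return "".join(chunks)
```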

Supported Input Formats

The input field accepts the same formats as OpenAI's Responses API:

Simple String

{"model": "glm-5", "input": "Hello, world!"}

Conversation Array

{"model": "glm-5", "input": [
  {"role": "developer", "content": "You are a code reviewer."},
  {"role": "user", "content": "Review this function: def add(a,b): return a+b"}
]}

Multimodal (Vision)

{"model": "glm-4.6v", "input": [
  {"role": "user", "content": [
    {"type": "input_text", "text": "What's in this image?"},
    {"type": "input_image", "image_url": "https://example.com/photo.jpg"}
  ]}
]}

Tool/Function Calls

{
  "model": "deepseek-v3.2",
  "input": "What's the weather in Tokyo?",
  "tools": [
    {
      "type": "function",
      "name": "get_weather",
      "description": "Get current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {
          "city": {"type": "string"}
        },
        "required": ["city"]
      }
    }
  ]
}
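When the model decides to call the tool, the response's `output` array contains a `function_call` item (with `name`, a JSON-string `arguments` field, and a `call_id`) instead of a message. A sketch of the dispatch-and-reply step — the item shapes follow OpenAI's published Responses format, and `get_weather` here is a stand-in implementation:

```python
import json

def get_weather(city: str) -> str:
    # Stand-in for a real weather lookup
    return f"Sunny, 22°C in {city}"

TOOLS = {"get_weather": get_weather}

def handle_tool_calls(response_output: list) -> list:
    """Run each function_call item and build function_call_output replies."""
    replies = []
    for item in response_output:
        if item.get("type") != "function_call":
            continue
        args = json.loads(item["arguments"])
        result = TOOLS[item["name"]](**args)
        replies.append({
            "type": "function_call_output",
            "call_id": item["call_id"],
            "output": result,
        })
    return replies
```

The replies are then appended to `input` in a follow-up `/v1/responses` call so the model can produce its final answer from the tool results.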

Platform Compatibility

| Platform / Tool | Direct Chinese API | Via NovAI |
|---|---|---|
| Open Cowork | 404 Error | Works |
| OpenAI Agents SDK | Not compatible | Works |
| Custom Responses API apps | Not compatible | Works |
| OpenAI Python SDK (responses) | Not compatible | Works |
| Cursor / Continue IDE | Works (chat/completions) | Works (both APIs) |
| LangChain / LlamaIndex | Works (chat/completions) | Works (both APIs) |

Why This Matters

The AI industry is moving rapidly toward agent-based architectures. OpenAI's Responses API is becoming the standard interface for these systems. By bridging this gap, NovAI enables a new paradigm:

Build with the latest agent frameworks. Pay Chinese model prices.

An agent pipeline that would cost $50/day with GPT-4o can run for $0.04/day with GLM-5 through NovAI — without changing a single line of code in your agent framework.
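The arithmetic behind that claim, assuming roughly 10 million tokens per day priced at GPT-4o's $5/1M input rate versus GLM-5's $0.004/1M (a rough estimate; real pipelines mix input and output tokens at different rates):

```python
def daily_cost(tokens_millions: float, price_per_million: float) -> float:
    """Daily spend for a given token volume and per-million-token price."""
    return tokens_millions * price_per_million

gpt4o_per_day = daily_cost(10, 5.00)   # 50.0
glm5_per_day = daily_cost(10, 0.004)   # about 0.04
```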

NovAI supports both the legacy /v1/chat/completions endpoint and the new /v1/responses endpoint. You can use whichever your application needs, or both simultaneously. All models are available through both endpoints.

Start Using Chinese Models with Responses API

$0.50 free credits. No credit card required. Works in 30 seconds.

Get API Key Free →
NovAI — The only Responses API gateway for Chinese AI models