How to Build an AI Chatbot with DeepSeek API — Full Tutorial (2026)
You want to build an AI chatbot. Maybe for customer support, maybe for a coding assistant, maybe just to learn. But GPT-4o costs $2.50/1M input tokens, and you're not made of money.
DeepSeek V4 Pro scores 90.2% on HumanEval (the same tier as GPT-5.5) and costs $0.28/1M input, $0.40/1M output: roughly a ninth of GPT-4o's input price. This tutorial shows you how to build a production-ready chatbot with it, in both Python and Node.js.
Stack: DeepSeek V4 Pro + NovAI API (OpenAI-compatible) + streaming + conversation memory. Total cost for the tutorial: under $0.01.
Prerequisites
- API key: Get one free at aiapi-pro.com — $0.50 free credit, no credit card needed.
- Python 3.8+ or Node.js 18+
- The `openai` Python package or the `openai` npm package
```bash
# Python setup
pip install openai

# Node.js setup
npm install openai
```
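The examples below hardcode the API key for brevity. In anything you commit, read it from an environment variable instead. A minimal sketch (the `NOVAI_API_KEY` variable name is our own convention, not something the SDK requires):

```python
import os
from openai import OpenAI

# Read the key from the environment instead of hardcoding it.
# NOVAI_API_KEY is an arbitrary name of our choosing.
client = OpenAI(
    base_url="https://aiapi-pro.com/v1",
    api_key=os.environ["NOVAI_API_KEY"],
)
```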
Step 1: Basic Chat (1 API Call)
This is the simplest possible chatbot: one message in, one message out, in a dozen or so lines of code.
Python
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://aiapi-pro.com/v1",
    api_key="sk-your-novai-key-here"
)

# Simple Q&A
response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function to reverse a linked list"}
    ]
)

print(response.choices[0].message.content)
print(f"\nTokens used: {response.usage.total_tokens}")
# Rough estimate: bills every token at the $0.28/1M input rate
# (output tokens actually cost $0.40/1M, so real cost is slightly higher)
print(f"Cost: ${response.usage.total_tokens * 0.00000028:.6f}")
```
Node.js
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://aiapi-pro.com/v1',
  apiKey: 'sk-your-novai-key-here'
});

const response = await client.chat.completions.create({
  model: 'deepseek-v4-pro',
  messages: [
    { role: 'system', content: 'You are a helpful coding assistant.' },
    { role: 'user', content: 'Write a JavaScript function to reverse a linked list' }
  ]
});

console.log(response.choices[0].message.content);
console.log(`Tokens used: ${response.usage.total_tokens}`);
console.log(`Cost: $${(response.usage.total_tokens * 0.00000028).toFixed(6)}`);
```
If you've used the OpenAI SDK before, the only things that change are `base_url` and `api_key`. Everything else (message format, streaming, function calling) works identically.
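Function calling deserves a quick illustration since we don't use it elsewhere in this tutorial. A minimal sketch, assuming DeepSeek V4 Pro via NovAI accepts the standard OpenAI `tools` format (the `get_weather` tool is hypothetical):

```python
import json
from openai import OpenAI

client = OpenAI(base_url="https://aiapi-pro.com/v1", api_key="sk-your-novai-key-here")

# Describe a (hypothetical) tool in the standard OpenAI tools format
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
        }
    }
}]

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)

# If the model decided to call the tool, the arguments arrive as a JSON string
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
```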
Step 2: Add Conversation Memory
A real chatbot needs to remember what was said earlier. With OpenAI-compatible APIs, this is just appending to the messages array:
Python — Full Conversation Loop
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://aiapi-pro.com/v1",
    api_key="sk-your-novai-key-here"
)

messages = [
    {"role": "system", "content": "You are a helpful coding assistant. Be concise."}
]

print("Chatbot ready. Type 'quit' to exit.\n")

while True:
    user_input = input("You: ")
    if user_input.lower() == 'quit':
        break

    # Add user message to history
    messages.append({"role": "user", "content": user_input})

    # Get response
    response = client.chat.completions.create(
        model="deepseek-v4-pro",
        messages=messages,
        temperature=0.7,
        max_tokens=1024
    )

    assistant_reply = response.choices[0].message.content

    # Add assistant reply to history
    messages.append({"role": "assistant", "content": assistant_reply})

    print(f"Bot: {assistant_reply}")
    print(f"(tokens: {response.usage.total_tokens}, cost: ${response.usage.total_tokens * 0.00000028:.6f})\n")
```
Node.js — Full Conversation Loop
```javascript
import OpenAI from 'openai';
import readline from 'readline';

const client = new OpenAI({
  baseURL: 'https://aiapi-pro.com/v1',
  apiKey: 'sk-your-novai-key-here'
});

const messages = [
  { role: 'system', content: 'You are a helpful coding assistant. Be concise.' }
];

const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout
});

console.log('Chatbot ready. Type "quit" to exit.\n');

const askQuestion = () => {
  rl.question('You: ', async (userInput) => {
    if (userInput.toLowerCase() === 'quit') {
      rl.close();
      return;
    }

    messages.push({ role: 'user', content: userInput });

    const response = await client.chat.completions.create({
      model: 'deepseek-v4-pro',
      messages: messages,
      temperature: 0.7,
      max_tokens: 1024
    });

    const reply = response.choices[0].message.content;
    messages.push({ role: 'assistant', content: reply });

    console.log(`Bot: ${reply}`);
    console.log(`(tokens: ${response.usage.total_tokens})\n`);

    askQuestion();
  });
};

askQuestion();
```
Step 3: Add Streaming (Real-Time Responses)
Users hate waiting. Streaming shows the response token-by-token as it's generated — feels instant even on long replies. With NovAI's OpenAI-compatible API, you just add stream: true:
Python — Streaming
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://aiapi-pro.com/v1",
    api_key="sk-your-novai-key-here",
    timeout=60  # Longer timeout for streaming
)

messages = [{"role": "system", "content": "You are a helpful assistant."}]

while True:
    user_input = input("\nYou: ")
    if user_input.lower() == 'quit':
        break

    messages.append({"role": "user", "content": user_input})

    # Streaming response
    stream = client.chat.completions.create(
        model="deepseek-v4-pro",
        messages=messages,
        stream=True,
        stream_options={"include_usage": True}
    )

    print("\nBot: ", end="", flush=True)
    collected = []
    usage = None
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            text = chunk.choices[0].delta.content
            print(text, end="", flush=True)
            collected.append(text)
        # With include_usage, the final chunk has empty choices and carries usage
        if chunk.usage:
            usage = chunk.usage

    full_response = "".join(collected)
    messages.append({"role": "assistant", "content": full_response})
    print()  # New line after streaming completes
    if usage:
        print(f"(tokens: {usage.total_tokens})")
```
Step 4: Production-Ready Chatbot (Web Server)
Let's build a proper web chatbot with FastAPI (Python) that handles multiple users, token counting, and rate limiting:
Python — FastAPI Backend
```python
# server.py
from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from openai import OpenAI
import time
from collections import defaultdict

app = FastAPI()
app.add_middleware(CORSMiddleware, allow_origins=["*"], allow_methods=["*"], allow_headers=["*"])

client = OpenAI(
    base_url="https://aiapi-pro.com/v1",
    api_key="sk-your-novai-key-here"
)

# In production: use Redis/database instead
sessions = defaultdict(lambda: {
    "messages": [{"role": "system", "content": "You are a helpful assistant. Answer concisely."}],
    "last_request": 0,
    "total_tokens": 0
})

RATE_LIMIT_SECONDS = 1  # 1 request/second per session

class ChatRequest(BaseModel):
    session_id: str
    message: str

@app.post("/chat")
async def chat(req: ChatRequest):
    session = sessions[req.session_id]

    # Rate limiting
    now = time.time()
    if now - session["last_request"] < RATE_LIMIT_SECONDS:
        raise HTTPException(429, "Rate limited. Wait 1 second.")
    session["last_request"] = now

    # Add the user message to the session history
    session["messages"].append({"role": "user", "content": req.message})

    # Keep context window manageable (last 20 messages)
    if len(session["messages"]) > 21:  # system + 20 messages
        session["messages"] = [
            session["messages"][0],     # Keep system prompt
            *session["messages"][-20:]  # Keep last 20
        ]

    try:
        response = client.chat.completions.create(
            model="deepseek-v4-pro",
            messages=session["messages"],
            temperature=0.7,
            max_tokens=1024
        )
    except Exception as e:
        raise HTTPException(500, f"API error: {str(e)}")

    reply = response.choices[0].message.content
    tokens = response.usage.total_tokens
    session["total_tokens"] += tokens
    session["messages"].append({"role": "assistant", "content": reply})

    return {
        "reply": reply,
        "tokens_used": tokens,
        "total_tokens": session["total_tokens"],
        "estimated_cost": round(tokens * 0.00000028, 6)
    }

@app.get("/stats/{session_id}")
async def stats(session_id: str):
    s = sessions[session_id]
    return {
        "messages_count": len(s["messages"]),
        "total_tokens": s["total_tokens"],
        "estimated_cost": round(s["total_tokens"] * 0.00000028, 6)
    }

# Run: uvicorn server:app --reload
```
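Once the server is running, you can smoke-test it from another terminal. A minimal sketch using the `requests` library (`pip install requests`; the session id is arbitrary):

```python
import requests

# Quick test against the local server started above
resp = requests.post(
    "http://localhost:8000/chat",
    json={"session_id": "test-session", "message": "Hello! What can you do?"}
)
print(resp.json())

# Check token usage and estimated cost for the session
print(requests.get("http://localhost:8000/stats/test-session").json())
```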
HTML Frontend
```html
<!-- index.html -->
<!DOCTYPE html>
<html>
<head>
  <title>AI Chatbot</title>
  <style>
    * { margin: 0; padding: 0; box-sizing: border-box; }
    body { font-family: -apple-system, sans-serif; max-width: 800px; margin: 0 auto; padding: 20px; }
    #chat { height: 500px; overflow-y: auto; border: 1px solid #ddd; padding: 15px; margin-bottom: 15px; border-radius: 8px; }
    .user { background: #e3f2fd; padding: 8px 12px; margin: 5px 0; border-radius: 6px; text-align: right; }
    .bot { background: #f5f5f5; padding: 8px 12px; margin: 5px 0; border-radius: 6px; }
    .cost { font-size: 12px; color: #999; margin-top: 4px; }
    #input-area { display: flex; gap: 10px; }
    #message-input { flex: 1; padding: 12px; border: 1px solid #ddd; border-radius: 6px; font-size: 16px; }
    button { padding: 12px 24px; background: #2563eb; color: white; border: none; border-radius: 6px; cursor: pointer; font-size: 16px; }
    button:hover { background: #1d4ed8; }
  </style>
</head>
<body>
  <h1>AI Chatbot (DeepSeek V4 Pro)</h1>
  <p style="color:#666;margin:10px 0">$0.28/1M input | $0.40/1M output | <span id="total-cost">$0.00</span> total</p>
  <div id="chat"></div>
  <div id="input-area">
    <input id="message-input" placeholder="Ask anything..." onkeypress="if(event.key==='Enter')send()">
    <button onclick="send()">Send</button>
  </div>
  <script>
    const SESSION_ID = 'user-' + Math.random().toString(36).slice(2);
    let totalCost = 0;

    function addMessage(role, text, cost) {
      const div = document.createElement('div');
      div.className = role;
      // textContent (not innerHTML) so model/user text can't inject markup
      div.textContent = text;
      if (cost) {
        totalCost += cost;
        const costDiv = document.createElement('div');
        costDiv.className = 'cost';
        costDiv.textContent = '$' + cost.toFixed(6);
        div.appendChild(costDiv);
        document.getElementById('total-cost').textContent = '$' + totalCost.toFixed(4);
      }
      const chat = document.getElementById('chat');
      chat.appendChild(div);
      chat.scrollTop = chat.scrollHeight;
    }

    async function send() {
      const input = document.getElementById('message-input');
      const msg = input.value.trim();
      if (!msg) return;

      addMessage('user', msg);
      input.value = '';

      const res = await fetch('http://localhost:8000/chat', {
        method: 'POST',
        headers: {'Content-Type': 'application/json'},
        body: JSON.stringify({session_id: SESSION_ID, message: msg})
      });
      const data = await res.json();

      if (res.ok) {
        // Rough estimate: bills all tokens at the $0.28/1M input rate
        const cost = data.tokens_used * 0.00000028;
        addMessage('bot', data.reply, cost);
      } else {
        addMessage('bot', 'Error: ' + data.detail);
      }
    }
  </script>
</body>
</html>
```
Cost Breakdown: What Your Chatbot Actually Costs
Let's be precise about costs. The table below estimates DeepSeek V4 Pro via NovAI, billing every token at the $0.28/1M input rate for simplicity (the GPT-4o column uses its $2.50/1M input rate the same way):
| Usage Level | Messages/Month | Avg Tokens/Msg | NovAI Monthly Cost | GPT-4o Equivalent |
|---|---|---|---|---|
| Hobby project | 1,000 | 500 | $0.14 | $1.25 |
| Solo founder | 10,000 | 800 | $2.24 | $20.00 |
| Small startup | 100,000 | 1,000 | $28.00 | $250.00 |
| Production app | 1,000,000 | 1,200 | $336.00 | $3,000.00 |
At every scale, NovAI + DeepSeek is ~9× cheaper than GPT-4o. And that's without considering the 2 permanently free models (GLM-4.6V-Flash, Qwen-Turbo) you can use for prototyping and low-priority tasks.
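If you want to split the input and output rates out rather than use the blended figure above, a small helper makes the arithmetic explicit. A sketch under one stated assumption: the 70/30 input/output token split is our guess, not measured data.

```python
# Rates for DeepSeek V4 Pro via NovAI
INPUT_RATE = 0.28 / 1_000_000   # $ per input token
OUTPUT_RATE = 0.40 / 1_000_000  # $ per output token

def monthly_cost(messages: int, avg_tokens: int, input_share: float = 0.7) -> float:
    """Estimate monthly cost, assuming input_share of tokens are input."""
    total_tokens = messages * avg_tokens
    return total_tokens * (input_share * INPUT_RATE + (1 - input_share) * OUTPUT_RATE)

# "Solo founder" row from the table: 10,000 msgs x 800 tokens
print(f"${monthly_cost(10_000, 800):.2f}")  # ~$2.53 with the 70/30 split
```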
Production Tips
- Use system prompts well: A good system prompt (role, tone, constraints) is more impactful than switching models. Example: "You are a helpful coding assistant. Reply in English. Include code examples when relevant."
- Truncate long conversations: DeepSeek V4 Pro has 128K context, but sending 50+ messages gets expensive. Keep the last 20 messages; costs stay low, quality stays high.
- Cache common responses: For FAQ chatbots, cache responses for identical questions. Roughly 5 lines of Redis can cut costs by up to 90% (see the sketch after this list).
- Temperature tuning: 0.0 for factual Q&A, 0.5 for balanced, 0.8 for creative. Code generation works best at 0.2.
- Set max_tokens: Always cap response length. The default is the model maximum, but most chatbot responses don't need 4096 tokens; 1024 is plenty.
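Here's the caching idea as a minimal sketch with the `redis` Python client (the key scheme and 24-hour TTL are our assumptions; adjust for your FAQ's freshness needs):

```python
import hashlib
import redis
from openai import OpenAI

r = redis.Redis()
client = OpenAI(base_url="https://aiapi-pro.com/v1", api_key="sk-your-novai-key-here")

def cached_answer(question: str) -> str:
    # Hash the normalized question so identical FAQs hit the cache
    key = "faq:" + hashlib.sha256(question.strip().lower().encode()).hexdigest()
    cached = r.get(key)
    if cached:
        return cached.decode()

    response = client.chat.completions.create(
        model="deepseek-v4-pro",
        messages=[{"role": "user", "content": question}]
    )
    answer = response.choices[0].message.content
    r.set(key, answer, ex=86400)  # Cache for 24 hours
    return answer
```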
Start Building — Free
$0.50 free credit + 2 permanently free models. No credit card, no Chinese phone, no platform fee.