How to Build an AI Chatbot with DeepSeek API — Full Tutorial (2026)
You want to build an AI chatbot. Maybe for customer support, maybe for a coding assistant, maybe just to learn. But GPT-4o costs $2.50/1M input tokens, and you're not made of money.
DeepSeek V4 Pro scores 90.2% on HumanEval (the same tier as GPT-5.5) and costs $0.28/1M input, $0.40/1M output: roughly a ninth of GPT-4o's input price. This tutorial shows you how to build a production-ready chatbot with it, in both Python and Node.js.
Stack: DeepSeek V4 Pro + NovAI API (OpenAI-compatible) + streaming + conversation memory. Total cost for the tutorial: under $0.01.
Prerequisites
- API key: Get one free at aiapi-pro.com — $0.50 free credit, no credit card needed.
- Python 3.8+ or Node.js 18+
- The `openai` Python package or the `openai` npm package
```bash
# Python setup
pip install openai

# Node.js setup
npm install openai
```
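The examples below hardcode the API key for brevity. In anything you commit, read it from an environment variable instead. A minimal sketch (the `NOVAI_API_KEY` variable name is our own convention, not something the SDK requires):

```python
import os
from openai import OpenAI

# Read the key from the environment instead of hardcoding it.
# NOVAI_API_KEY is an arbitrary name of our choosing.
client = OpenAI(
    base_url="https://aiapi-pro.com/v1",
    api_key=os.environ["NOVAI_API_KEY"],
)
```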
Step 1: Basic Chat (1 API Call)
This is the simplest possible chatbot: one message in, one message out, in a dozen or so lines of code.
Python
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://aiapi-pro.com/v1",
    api_key="sk-your-novai-key-here"
)

# Simple Q&A
response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function to reverse a linked list"}
    ]
)

print(response.choices[0].message.content)
print(f"\nTokens used: {response.usage.total_tokens}")
# Rough estimate: bills every token at the $0.28/1M input rate
# (output tokens actually cost $0.40/1M, so real cost is slightly higher)
print(f"Cost: ${response.usage.total_tokens * 0.00000028:.6f}")
```
Node.js
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://aiapi-pro.com/v1',
  apiKey: 'sk-your-novai-key-here'
});

const response = await client.chat.completions.create({
  model: 'deepseek-v4-pro',
  messages: [
    { role: 'system', content: 'You are a helpful coding assistant.' },
    { role: 'user', content: 'Write a JavaScript function to reverse a linked list' }
  ]
});

console.log(response.choices[0].message.content);
console.log(`Tokens used: ${response.usage.total_tokens}`);
console.log(`Cost: $${(response.usage.total_tokens * 0.00000028).toFixed(6)}`);
```
If you've used the OpenAI SDK before, the only things that change are `base_url` and `api_key`. Everything else (message format, streaming, function calling) works identically.
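Function calling deserves a quick illustration since we don't use it elsewhere in this tutorial. A minimal sketch, assuming DeepSeek V4 Pro via NovAI accepts the standard OpenAI `tools` format (the `get_weather` tool is hypothetical):

```python
import json
from openai import OpenAI

client = OpenAI(base_url="https://aiapi-pro.com/v1", api_key="sk-your-novai-key-here")

# Describe a (hypothetical) tool in the standard OpenAI tools format
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"]
        }
    }
}]

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools
)

# If the model decided to call the tool, the arguments arrive as a JSON string
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
```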
Step 2: Add Conversation Memory
A real chatbot needs to remember what was said earlier. With OpenAI-compatible APIs, this is just appending to the messages array:
Python — Full Conversation Loop
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://aiapi-pro.com/v1",
    api_key="sk-your-novai-key-here"
)

messages = [
    {"role": "system", "content": "You are a helpful coding assistant. Be concise."}
]

print("Chatbot ready. Type 'quit' to exit.\n")

while True:
    user_input = input("You: ")
    if user_input.lower() == 'quit':
        break

    # Add user message to history
    messages.append({"role": "user", "content": user_input})

    # Get response
    response = client.chat.completions.create(
        model="deepseek-v4-pro",
        messages=messages,
        temperature=0.7,
        max_tokens=1024
    )

    assistant_reply = response.choices[0].message.content

    # Add assistant reply to history
    messages.append({"role": "assistant", "content": assistant_reply})

    print(f"Bot: {assistant_reply}")
    print(f"(tokens: {response.usage.total_tokens}, cost: ${response.usage.total_tokens * 0.00000028:.6f})\n")
```
Node.js — Full Conversation Loop
```javascript
import OpenAI from 'openai';
import readline from 'readline';

const client = new OpenAI({
  baseURL: 'https://aiapi-pro.com/v1',
  apiKey: 'sk-your-novai-key-here'
});

const messages = [
  { role: 'system', content: 'You are a helpful coding assistant. Be concise.' }
];

const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout
});

console.log('Chatbot ready. Type "quit" to exit.\n');

const askQuestion = () => {
  rl.question('You: ', async (userInput) => {
    if (userInput.toLowerCase() === 'quit') {
      rl.close();
      return;
    }

    messages.push({ role: 'user', content: userInput });

    const response = await client.chat.completions.create({
      model: 'deepseek-v4-pro',
      messages: messages,
      temperature: 0.7,
      max_tokens: 1024
    });

    const reply = response.choices[0].message.content;
    messages.push({ role: 'assistant', content: reply });

    console.log(`Bot: ${reply}`);
    console.log(`(tokens: ${response.usage.total_tokens})\n`);

    askQuestion();
  });
};

askQuestion();
```
Step 3: Add Streaming (Real-Time Responses)
Users hate waiting. Streaming shows the response token-by-token as it's generated — feels instant even on long replies. With NovAI's OpenAI-compatible API, you just add stream: true:
Python — Streaming
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://aiapi-pro.com/v1",
    api_key="sk-your-novai-key-here",
    timeout=60  # Longer timeout for streaming
)

messages = [{"role": "system", "content": "You are a helpful assistant."}]

while True:
    user_input = input("\nYou: ")
    if user_input.lower() == 'quit':
        break

    messages.append({"role": "user", "content": user_input})

    # Streaming response
    stream = client.chat.completions.create(
        model="deepseek-v4-pro",
        messages=messages,
        stream=True,
        stream_options={"include_usage": True}
    )

    print("\nBot: ", end="", flush=True)
    collected = []
    usage = None
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            text = chunk.choices[0].delta.content
            print(text, end="", flush=True)
            collected.append(text)
        # With include_usage, the final chunk has empty choices and carries usage
        if chunk.usage:
            usage = chunk.usage

    full_response = "".join(collected)
    messages.append({"role": "assistant", "content": full_response})
    print()  # New line after streaming completes
    if usage:
        print(f"(tokens: {usage.total_tokens})")
```
Step 4: Production-Ready Chatbot (Web Server)
Let's build a proper web chatbot with FastAPI (Python) that handles multiple users, token counting, and rate limiting:
Python — FastAPI Backend
```python
# server.py
from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from openai import OpenAI
import time
from collections import defaultdict

app = FastAPI()
app.add_middleware(CORSMiddleware, allow_origins=["*"], allow_methods=["*"], allow_headers=["*"])

client = OpenAI(
    base_url="https://aiapi-pro.com/v1",
    api_key="sk-your-novai-key-here"
)

# In production: use Redis/database instead
sessions = defaultdict(lambda: {
    "messages": [{"role": "system", "content": "You are a helpful assistant. Answer concisely."}],
    "last_request": 0,
    "total_tokens": 0
})

RATE_LIMIT_SECONDS = 1  # 1 request/second per session

class ChatRequest(BaseModel):
    session_id: str
    message: str

@app.post("/chat")
async def chat(req: ChatRequest):
    session = sessions[req.session_id]

    # Rate limiting
    now = time.time()
    if now - session["last_request"] < RATE_LIMIT_SECONDS:
        raise HTTPException(429, "Rate limited. Wait 1 second.")
    session["last_request"] = now

    # Add the user message to the session history
    session["messages"].append({"role": "user", "content": req.message})

    # Keep context window manageable (last 20 messages)
    if len(session["messages"]) > 21:  # system + 20 messages
        session["messages"] = [
            session["messages"][0],     # Keep system prompt
            *session["messages"][-20:]  # Keep last 20
        ]

    try:
        response = client.chat.completions.create(
            model="deepseek-v4-pro",
            messages=session["messages"],
            temperature=0.7,
            max_tokens=1024
        )
    except Exception as e:
        raise HTTPException(500, f"API error: {str(e)}")

    reply = response.choices[0].message.content
    tokens = response.usage.total_tokens
    session["total_tokens"] += tokens
    session["messages"].append({"role": "assistant", "content": reply})

    return {
        "reply": reply,
        "tokens_used": tokens,
        "total_tokens": session["total_tokens"],
        "estimated_cost": round(tokens * 0.00000028, 6)
    }

@app.get("/stats/{session_id}")
async def stats(session_id: str):
    s = sessions[session_id]
    return {
        "messages_count": len(s["messages"]),
        "total_tokens": s["total_tokens"],
        "estimated_cost": round(s["total_tokens"] * 0.00000028, 6)
    }

# Run: uvicorn server:app --reload
```
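Once the server is running, you can smoke-test it from another terminal. A minimal sketch using the `requests` library (`pip install requests`; the session id is arbitrary):

```python
import requests

# Quick test against the local server started above
resp = requests.post(
    "http://localhost:8000/chat",
    json={"session_id": "test-session", "message": "Hello! What can you do?"}
)
print(resp.json())

# Check token usage and estimated cost for the session
print(requests.get("http://localhost:8000/stats/test-session").json())
```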
HTML Frontend
```html
<!-- index.html -->
<!DOCTYPE html>
<html>
<head>
  <title>AI Chatbot</title>
  <style>
    * { margin: 0; padding: 0; box-sizing: border-box; }
    body { font-family: -apple-system, sans-serif; max-width: 800px; margin: 0 auto; padding: 20px; }
    #chat { height: 500px; overflow-y: auto; border: 1px solid #ddd; padding: 15px; margin-bottom: 15px; border-radius: 8px; }
    .user { background: #e3f2fd; padding: 8px 12px; margin: 5px 0; border-radius: 6px; text-align: right; }
    .bot { background: #f5f5f5; padding: 8px 12px; margin: 5px 0; border-radius: 6px; }
    .cost { font-size: 12px; color: #999; margin-top: 4px; }
    #input-area { display: flex; gap: 10px; }
    #message-input { flex: 1; padding: 12px; border: 1px solid #ddd; border-radius: 6px; font-size: 16px; }
    button { padding: 12px 24px; background: #2563eb; color: white; border: none; border-radius: 6px; cursor: pointer; font-size: 16px; }
    button:hover { background: #1d4ed8; }
  </style>
</head>
<body>
  <h1>AI Chatbot (DeepSeek V4 Pro)</h1>
  <p style="color:#666;margin:10px 0">$0.28/1M input | $0.40/1M output | <span id="total-cost">$0.00</span> total</p>
  <div id="chat"></div>
  <div id="input-area">
    <input id="message-input" placeholder="Ask anything..." onkeypress="if(event.key==='Enter')send()">
    <button onclick="send()">Send</button>
  </div>
  <script>
    const SESSION_ID = 'user-' + Math.random().toString(36).slice(2);
    let totalCost = 0;

    function addMessage(role, text, cost) {
      const div = document.createElement('div');
      div.className = role;
      // textContent (not innerHTML) so model/user text can't inject markup
      div.textContent = text;
      if (cost) {
        totalCost += cost;
        const costDiv = document.createElement('div');
        costDiv.className = 'cost';
        costDiv.textContent = '$' + cost.toFixed(6);
        div.appendChild(costDiv);
        document.getElementById('total-cost').textContent = '$' + totalCost.toFixed(4);
      }
      const chat = document.getElementById('chat');
      chat.appendChild(div);
      chat.scrollTop = chat.scrollHeight;
    }

    async function send() {
      const input = document.getElementById('message-input');
      const msg = input.value.trim();
      if (!msg) return;

      addMessage('user', msg);
      input.value = '';

      const res = await fetch('http://localhost:8000/chat', {
        method: 'POST',
        headers: {'Content-Type': 'application/json'},
        body: JSON.stringify({session_id: SESSION_ID, message: msg})
      });
      const data = await res.json();

      if (res.ok) {
        // Rough estimate: bills all tokens at the $0.28/1M input rate
        const cost = data.tokens_used * 0.00000028;
        addMessage('bot', data.reply, cost);
      } else {
        addMessage('bot', 'Error: ' + data.detail);
      }
    }
  </script>
</body>
</html>
```
Cost Breakdown: What Your Chatbot Actually Costs
Let's be precise about costs. The table below estimates DeepSeek V4 Pro via NovAI, billing every token at the $0.28/1M input rate for simplicity (the GPT-4o column uses its $2.50/1M input rate the same way):
| Usage Level | Messages/Month | Avg Tokens/Msg | NovAI Monthly Cost | GPT-4o Equivalent |
|---|---|---|---|---|
| Hobby project | 1,000 | 500 | $0.14 | $1.25 |
| Solo founder | 10,000 | 800 | $2.24 | $20.00 |
| Small startup | 100,000 | 1,000 | $28.00 | $250.00 |
| Production app | 1,000,000 | 1,200 | $336.00 | $3,000.00 |
At every scale, NovAI + DeepSeek is ~9× cheaper than GPT-4o. And that's without considering the 2 permanently free models (GLM-4.6V-Flash, Qwen-Turbo) you can use for prototyping and low-priority tasks.
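If you want to split the input and output rates out rather than use the blended figure above, a small helper makes the arithmetic explicit. A sketch under one stated assumption: the 70/30 input/output token split is our guess, not measured data.

```python
# Rates for DeepSeek V4 Pro via NovAI
INPUT_RATE = 0.28 / 1_000_000   # $ per input token
OUTPUT_RATE = 0.40 / 1_000_000  # $ per output token

def monthly_cost(messages: int, avg_tokens: int, input_share: float = 0.7) -> float:
    """Estimate monthly cost, assuming input_share of tokens are input."""
    total_tokens = messages * avg_tokens
    return total_tokens * (input_share * INPUT_RATE + (1 - input_share) * OUTPUT_RATE)

# "Solo founder" row from the table: 10,000 msgs x 800 tokens
print(f"${monthly_cost(10_000, 800):.2f}")  # ~$2.53 with the 70/30 split
```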
Production Tips
- Use system prompts well: A good system prompt (role, tone, constraints) is more impactful than switching models. Example: "You are a helpful coding assistant. Reply in English. Include code examples when relevant."
- Truncate long conversations: DeepSeek V4 Pro has 128K context, but sending 50+ messages gets expensive. Keep the last 20 messages; costs stay low, quality stays high.
- Cache common responses: For FAQ chatbots, cache responses for identical questions. Roughly 5 lines of Redis can cut costs by up to 90% (see the sketch after this list).
- Temperature tuning: 0.0 for factual Q&A, 0.5 for balanced, 0.8 for creative. Code generation works best at 0.2.
- Set max_tokens: Always cap response length. The default is the model maximum, but most chatbot responses don't need 4096 tokens; 1024 is plenty.
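Here's the caching idea as a minimal sketch with the `redis` Python client (the key scheme and 24-hour TTL are our assumptions; adjust for your FAQ's freshness needs):

```python
import hashlib
import redis
from openai import OpenAI

r = redis.Redis()
client = OpenAI(base_url="https://aiapi-pro.com/v1", api_key="sk-your-novai-key-here")

def cached_answer(question: str) -> str:
    # Hash the normalized question so identical FAQs hit the cache
    key = "faq:" + hashlib.sha256(question.strip().lower().encode()).hexdigest()
    cached = r.get(key)
    if cached:
        return cached.decode()

    response = client.chat.completions.create(
        model="deepseek-v4-pro",
        messages=[{"role": "user", "content": question}]
    )
    answer = response.choices[0].message.content
    r.set(key, answer, ex=86400)  # Cache for 24 hours
    return answer
```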
Start Building — Free
$0.50 free credit + 2 permanently free models. No credit card, no Chinese phone, no platform fee.