A free, open-source desktop app that transparently records every LLM API call: tokens, latency, cost, model, and full request/response. Supports 12+ providers including DeepSeek, Qwen, GLM, Doubao, and more.
Know exactly how every token is spent, in real time.
Watch every API call as it happens: model, tokens in/out, latency, status code. No more guessing where tokens go.
Double-click to install. Change one line (base_url) in your code. No Docker, no CLI, no proxy config files.
Works with OpenAI, Anthropic, Gemini, DeepSeek, Moonshot, Zhipu GLM, Doubao, Qwen, Yi, MiniMax, SiliconFlow, and more.
All data stays on your machine. No cloud, no telemetry, no accounts. Your API keys never leave localhost.
MIT licensed. Inspect the code, fork it, contribute. Built by NovAI for the developer community.
First-class support for 8 Chinese AI providers. One-click upstream switching: select DeepSeek, Qwen, or Doubao in settings.
One proxy, all your AI providers. Switch upstream with a dropdown.
Download the installer for your platform from GitHub Releases, or build from source:
```bash
git clone https://github.com/vvvvking/tokenscope.git
cd tokenscope/desktop
npm install
npm start          # Run in development mode
npm run build:win  # Build Windows installer
```
Double-click TokenScope in your Start Menu (or Applications on macOS). The app starts in the system tray and automatically launches the proxy on http://127.0.0.1:17666.
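If you want to confirm the proxy is up before pointing your code at it, a plain TCP port check is enough. A minimal sketch in Python (it only verifies the port is open, nothing more):

```python
import socket

# Probe TokenScope's default proxy port; connect_ex returns 0 on success.
with socket.socket() as s:
    s.settimeout(1)
    up = s.connect_ex(("127.0.0.1", 17666)) == 0
print("proxy is listening" if up else "proxy is not reachable")
```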
Change your base_url to http://127.0.0.1:17666/v1. That's it.
```python
# Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:17666/v1",  # ← only this line changes
    api_key="your-api-key",
)
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "hello"}],
)
print(resp.choices[0].message.content)
```
TokenScope supports 8 Chinese AI providers out of the box. To use them:
Click the tray icon → Open Main Window → Settings tab.
In the "Default Upstream (OpenAI Protocol)" dropdown, select your provider (e.g. DeepSeek, Moonshot, Zhipu GLM).
Click Save. The change takes effect immediately; no need to restart the proxy. Now all requests through 127.0.0.1:17666 will be forwarded to your selected provider.
```python
# Example: Using DeepSeek through TokenScope
# Settings → Default Upstream → DeepSeek
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:17666/v1",
    api_key="your-deepseek-key",
)
resp = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "hello"}],
)
print(resp.choices[0].message.content)
```
Supported Chinese providers and example models:
| Provider | Upstream Key | Example Model |
|---|---|---|
| DeepSeek | deepseek | deepseek-chat |
| Moonshot Kimi | moonshot | moonshot-v1-8k |
| Zhipu GLM | zhipu | glm-4-flash |
| Volcengine Doubao | doubao | doubao-seed-1-6 |
| Alibaba Qwen | qwen | qwen-plus |
| 01.AI Yi | yi | yi-lightning |
| MiniMax | minimax | abab6.5s-chat |
| SiliconFlow | siliconflow | Qwen/Qwen2.5-7B-Instruct |
You can also override the upstream per request with the X-Upstream header. This is useful if you want to route different calls to different providers without changing global settings, as in the sketch below.
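A minimal sketch using the OpenAI Python SDK's default_headers option to pin one client to DeepSeek while the global default upstream stays untouched (the upstream key "deepseek" comes from the table above; the model name is illustrative):

```python
from openai import OpenAI

# This client adds X-Upstream to every request, so the proxy routes it to
# DeepSeek regardless of the global default upstream in Settings.
deepseek = OpenAI(
    base_url="http://127.0.0.1:17666/v1",
    api_key="your-deepseek-key",
    default_headers={"X-Upstream": "deepseek"},  # upstream key from the table above
)

resp = deepseek.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "hello"}],
)
print(resp.choices[0].message.content)
```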
To monitor the Claude Code CLI through TokenScope, point its Anthropic endpoint at the proxy:

```powershell
# Windows PowerShell
$env:ANTHROPIC_BASE_URL = "http://127.0.0.1:17666"
$env:ANTHROPIC_API_KEY = "your-anthropic-key"
claude
```

```bash
# macOS / Linux
export ANTHROPIC_BASE_URL="http://127.0.0.1:17666"
export ANTHROPIC_API_KEY="your-anthropic-key"
claude
```
Go to Settings → Models → Override OpenAI Base URL and enter:
http://127.0.0.1:17666/v1
Select OpenAI Compatible as the provider, then set:
Base URL: http://127.0.0.1:17666/v1
API Key: your-api-key
Model ID: gpt-4o-mini (or any model)
```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://127.0.0.1:17666/v1',
  apiKey: 'your-api-key',
});

const resp = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'hello' }],
});
console.log(resp.choices[0].message.content);
```
```bash
curl http://127.0.0.1:17666/v1/chat/completions \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hi"}]}'
```
| Setting | Default | Description |
|---|---|---|
| Proxy Port | 17666 | The local HTTP port your code connects to |
| Control Port | 17667 | Internal WebSocket port for the dashboard |
| Max Records | 5000 | How many calls to keep in local history |
| Auto Start Proxy | On | Start proxy automatically when app launches |
| Launch at Login | Off | Start TokenScope when you log into your computer |
| Default Upstream | OpenAI | Where to forward OpenAI-protocol requests by default |
Settings are stored in %APPDATA%/tokenscope-desktop/settings.json (Windows) or ~/Library/Application Support/tokenscope-desktop/settings.json (macOS).
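Since the file is plain JSON, you can inspect it directly. A quick sketch using the paths above (it makes no assumptions about the keys inside):

```python
import json
import os
import platform

# Resolve the settings.json path documented above for the current OS.
if platform.system() == "Windows":
    path = os.path.join(os.environ["APPDATA"], "tokenscope-desktop", "settings.json")
else:  # macOS
    path = os.path.expanduser(
        "~/Library/Application Support/tokenscope-desktop/settings.json"
    )

with open(path, encoding="utf-8") as f:
    print(json.dumps(json.load(f), indent=2))
```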
TokenScope runs a transparent HTTP proxy on your machine. When your code sends an API call to 127.0.0.1:17666, the proxy:
1. Detects the protocol from the request path: OpenAI (/v1/chat/completions), Anthropic (/v1/messages), or Gemini (:generateContent).
2. Selects the upstream provider: your default from Settings, or a per-request X-Upstream header.
3. Rewrites the path for providers with non-standard endpoints (e.g. Zhipu's /api/paas/v4).
4. Forwards the request and records the model, tokens, latency, cost, and full request/response.

All records are stored locally in records.ndjson: one JSON object per line, easy to grep or import into other tools.
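Because records.ndjson is one JSON object per line, ad-hoc analysis takes a few lines of Python. A sketch (the field names "model" and "total_tokens" are assumptions; check a line of your own file for the actual schema):

```python
import json
from collections import Counter

calls, tokens = Counter(), Counter()
with open("records.ndjson", encoding="utf-8") as f:
    for line in f:
        rec = json.loads(line)
        model = rec.get("model", "unknown")          # assumed field name
        calls[model] += 1
        tokens[model] += rec.get("total_tokens", 0)  # assumed field name

for model, n in calls.most_common():
    print(f"{model}: {n} calls, {tokens[model]} tokens")
```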
The proxy does not modify your requests; it forwards everything transparently. The only header it removes is X-Upstream (if present), which it consumes to decide routing.
Both SSE streaming and non-streaming responses are fully supported and recorded.
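Streaming code needs no changes beyond the base_url. For instance, with the OpenAI Python SDK:

```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:17666/v1", api_key="your-api-key")

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "hello"}],
    stream=True,
)
for chunk in stream:
    # Each SSE chunk carries an incremental delta; some chunks may be empty.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```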
You can mix providers: set a default upstream in Settings, and override per request using the X-Upstream header. Different applications can hit different providers through the same proxy.
TokenScope handles non-standard API paths automatically for known providers. Zhipu GLM (/api/paas/v4) and Doubao (/api/v3) have built-in path rewriting, so your code still sends to /v1/chat/completions as usual.
Input and output text previews are capped at 2000 characters each. The "Max Records" setting (default 5000) controls how many calls are kept in history.
Free, open source, and takes less than 60 seconds to set up.