πŸ”“ Free & Open Source πŸ–₯ Windows / macOS / Linux πŸ‡¨πŸ‡³ Supports Chinese AI Models

Monitor Every AI Call with
TokenScope

A free, open-source desktop app that transparently records every LLM API call β€” tokens, latency, cost, model, and full request/response. Supports 12+ providers including DeepSeek, Qwen, GLM, Doubao, and more.

Why TokenScope?

Know exactly how every token is spent, in real time.

πŸ“Š

Real-Time Dashboard

Watch every API call as it happens β€” model, tokens in/out, latency, status code. No more guessing where tokens go.

πŸ”Œ

Zero Config

Double-click to install. Change one line (base_url) in your code. No Docker, no CLI, no proxy config files.

🌐

12+ Providers

Works with OpenAI, Anthropic, Gemini, DeepSeek, Moonshot, Zhipu GLM, Doubao, Qwen, Yi, MiniMax, SiliconFlow, and more.

πŸ›‘οΈ

100% Local

All data stays on your machine. No cloud, no telemetry, no accounts. Your API keys never leave localhost.

πŸ”“

Free & Open Source

MIT licensed. Inspect the code, fork it, contribute. Built by NovAI for the developer community.

πŸ‡¨πŸ‡³

Chinese Model Support

First-class support for 8 Chinese AI providers. One-click upstream switching β€” select DeepSeek, Qwen, or Doubao in settings.

Supported Providers

One proxy, all your AI providers. Switch upstream with a dropdown.

OpenAI
Anthropic
Google Gemini
DeepSeek
Moonshot Kimi
Zhipu GLM
Volcengine Doubao
Alibaba Qwen
01.AI Yi
MiniMax
SiliconFlow
Any OpenAI-compatible

πŸ“– Usage Documentation

πŸš€ Quick Start (3 Steps)

1. Install

Download the installer for your platform from GitHub Releases, or build from source:

git clone https://github.com/vvvvking/tokenscope.git
cd tokenscope/desktop
npm install
npm start          # Run in development mode
npm run build:win  # Build Windows installer

2. Launch

Double-click TokenScope from your Start Menu (or Applications on macOS). The app starts in the system tray and automatically launches the proxy on http://127.0.0.1:17666.

3. Point your code at it

Change your base_url to http://127.0.0.1:17666/v1. That's it.

# Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:17666/v1",   # ← only this line changes
    api_key="your-api-key"
)
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "hello"}]
)
print(resp.choices[0].message.content)
πŸ’‘ Every call you make will now appear in the TokenScope dashboard in real time β€” including tokens, latency, model, and status.

πŸ‡¨πŸ‡³ Using with Chinese AI Models

TokenScope supports 8 Chinese AI providers out of the box. To use them:

1. Open Settings

Click the tray icon β†’ Open Main Window β†’ Settings tab.

2. Select Default Upstream

In the "Default Upstream (OpenAI Protocol)" dropdown, select your provider (e.g. DeepSeek, Moonshot, Zhipu GLM).

3. Save & Use

Click Save. The change takes effect immediately β€” no need to restart the proxy. Now all requests through 127.0.0.1:17666 will be forwarded to your selected provider.

# Example: Using DeepSeek through TokenScope
# Settings β†’ Default Upstream β†’ DeepSeek

from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:17666/v1",
    api_key="your-deepseek-key"
)
resp = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "hello"}]
)
print(resp.choices[0].message.content)

Supported Chinese providers and example models:

Provider            Upstream Key   Example Model
DeepSeek            deepseek       deepseek-chat
Moonshot Kimi       moonshot       moonshot-v1-8k
Zhipu GLM           zhipu          glm-4-flash
Volcengine Doubao   doubao         doubao-seed-1-6
Alibaba Qwen        qwen           qwen-plus
01.AI Yi            yi             yi-lightning
MiniMax             minimax        abab6.5s-chat
SiliconFlow         siliconflow    Qwen/Qwen2.5-7B-Instruct
πŸ’‘ You can also override the upstream per-request by adding an X-Upstream header. This is useful if you want to route different calls to different providers without changing global settings.
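The per-request override can be sketched with nothing but the standard library. The upstream key value ("deepseek") comes from the table above; the request is only constructed here, so uncomment the last line once the proxy is running:

```python
import json
import urllib.request

# Route this one call to DeepSeek, regardless of the global default upstream.
payload = json.dumps({
    "model": "deepseek-chat",
    "messages": [{"role": "user", "content": "hello"}],
}).encode()

req = urllib.request.Request(
    "http://127.0.0.1:17666/v1/chat/completions",
    data=payload,
    headers={
        "Authorization": "Bearer your-deepseek-key",
        "Content-Type": "application/json",
        "X-Upstream": "deepseek",   # per-request upstream override
    },
)
# resp = urllib.request.urlopen(req)  # uncomment with the proxy running
```

The same header can be set in the OpenAI SDK via its default_headers or extra_headers options, so two applications sharing the proxy can each pin a different provider.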

πŸ›  Setup for Popular Tools

Claude Code (Anthropic CLI)

# Windows PowerShell
$env:ANTHROPIC_BASE_URL = "http://127.0.0.1:17666"
$env:ANTHROPIC_API_KEY  = "your-anthropic-key"
claude

# macOS / Linux
export ANTHROPIC_BASE_URL="http://127.0.0.1:17666"
export ANTHROPIC_API_KEY="your-anthropic-key"
claude

Cursor Editor

Go to Settings β†’ Models β†’ Override OpenAI Base URL, enter:

http://127.0.0.1:17666/v1

Cline / Continue (VSCode)

Select OpenAI Compatible as provider, then set:

Base URL:  http://127.0.0.1:17666/v1
API Key:   your-api-key
Model ID:  gpt-4o-mini  (or any model)

Node.js (OpenAI SDK)

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://127.0.0.1:17666/v1',
  apiKey: 'your-api-key'
});

const resp = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'hello' }]
});
console.log(resp.choices[0].message.content);

curl Quick Test

curl http://127.0.0.1:17666/v1/chat/completions \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"hi"}]}'
πŸ’‘ Even without a valid API key, the upstream will return 401 β€” but TokenScope will still capture the full call, proving the proxy is working.

βš™οΈ Settings Reference

Setting             Default   Description
Proxy Port          17666     The local HTTP port your code connects to
Control Port        17667     Internal WebSocket port for the dashboard
Max Records         5000      How many calls to keep in local history
Auto Start Proxy    On        Start the proxy automatically when the app launches
Launch at Login     Off       Start TokenScope when you log into your computer
Default Upstream    OpenAI    Where OpenAI-protocol requests are forwarded by default

Settings are stored in %APPDATA%/tokenscope-desktop/settings.json (Windows) or ~/Library/Application Support/tokenscope-desktop/settings.json (macOS).
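The file is plain JSON and maps one key to each setting in the table above. The key names below are an assumption about the schema, not a documented contract; check your own settings.json for the exact spelling:

```json
{
  "proxyPort": 17666,
  "controlPort": 17667,
  "maxRecords": 5000,
  "autoStartProxy": true,
  "launchAtLogin": false,
  "defaultUpstream": "openai"
}
```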

πŸ— How It Works

TokenScope runs a transparent HTTP proxy on your machine. When your code sends an API call to 127.0.0.1:17666, the proxy forwards the request unchanged to your selected upstream, streams the response back to your client, and records the call's metadata (model, tokens in/out, latency, status) along with request/response previews.

All records are stored locally in records.ndjson β€” one JSON object per line, easy to grep or import into other tools.

⚠️ TokenScope is a development/debugging tool. It stores request/response previews in plaintext on your local disk. Do not use it in production environments with sensitive data.

❓ FAQ

Does TokenScope modify my API requests?

No. It forwards everything transparently. The only header removed is X-Upstream (if present), which is consumed by the proxy to decide routing.

Does it work with streaming responses?

Yes. Both SSE streaming and non-streaming responses are fully supported and recorded.

Can I use it with multiple providers at the same time?

Yes. Set a default upstream in Settings, and override per-request using the X-Upstream header. Different applications can hit different providers through the same proxy.

What if my provider uses a non-standard path?

TokenScope handles this automatically for known providers. Zhipu GLM (/api/paas/v4) and Doubao (/api/v3) have built-in path rewriting β€” your code still sends to /v1/chat/completions as usual.

Is there a size limit for recorded data?

Input and output text previews are capped at 2000 characters each. The "Max Records" setting (default 5000) controls how many calls are kept in history.

Start Monitoring Your AI Usage Today

Free, open source, and takes less than 60 seconds to set up.

⬇ Download TokenScope Star on GitHub