Null Meter
A three-layer gauge for AI sessions: hallucination index, null-state drift, and context fill. Run a prompt, get a real model reply, and watch the V2 detector score it live.
The live chat below calls a real OpenAI model and scores each reply with the
V2 detector shipped in @alephonenull/eval. Drift is the real embedding distance
between each reply and your first message. Context fill is real token usage
measured against the model's context window.
Try it
- Hallucination index: confidence exceeding evidence, fabricated specificity, fluency over content.
- Null-state drift: semantic distance from the original system intent and first user request.
- Context fill: tokens used against the model's context window. Attention collapses past ~80%.
Step 1 of 5 (baseline)
Prompt: "In two sentences, explain what a hash function is, like I am a junior engineer."
A grounded, scoped question. All three layers should sit low. This is the calibration shot.
The demo runs five steps with auto-steer enabled and embedding-based drift. The same five prompts run for every visitor; only the model's responses differ. The meter updates after each scored reply.
Why three layers, not one
Each layer fails differently. The relationship between them is the actual diagnostic.
- Context fill rising alone: a context compaction is coming. Not yet a behavioral problem.
- Hallucination spiking with low context fill: the model is fabricating outright, not because it ran out of room. Investigate the prompt shape.
- Drift climbing with low hallucination: the model is coherent but has forgotten what you asked for. The user usually does not notice.
- All three rising together: stop. Start a new session. Do not ship the next turn.
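The four patterns above can be read as a small decision table. This is a minimal sketch, assuming each layer is already available as a 0–100 reading and using a hypothetical cut-off of 60 for "high"; the page does not state a threshold, and the names `MeterReading`, `Diagnosis`, and `diagnose` are illustrative, not part of @alephonenull/eval.

```typescript
// Each layer as a 0-100 reading (hypothetical shape).
interface MeterReading {
  hallucination: number;
  drift: number;
  contextFill: number;
}

type Diagnosis =
  | "nominal"     // everything low
  | "compaction"  // context fill rising alone
  | "fabricating" // hallucination high while context fill is low
  | "off-task"    // drift high while hallucination is low
  | "stop"        // all three high: start a new session
  | "mixed";      // a combination the page does not describe

// Assumed threshold for "rising" / "spiking"; not specified by the page.
const HIGH = 60;

function diagnose(r: MeterReading): Diagnosis {
  const h = r.hallucination >= HIGH;
  const d = r.drift >= HIGH;
  const c = r.contextFill >= HIGH;

  if (!h && !d && !c) return "nominal";
  if (h && d && c) return "stop";
  if (c && !h && !d) return "compaction";
  if (h && !c) return "fabricating";
  if (d && !h) return "off-task";
  return "mixed";
}
```

The order of checks matters: "stop" is tested before the single-layer cases so that three high layers never get reported as a milder pattern.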
What each layer measures
Hallucination Index
Scores the current assistant turn with the V2 detector on three signals: confidence exceeding evidence, fabricated specificity, and the fluency-over-content ratio. It is a pattern signal, not a token signal. The detector's score Q ∈ [0, 1] is displayed as 0–100%.
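A minimal sketch of the final display step, assuming the detector hands back a raw score Q and that the percentage is a straight linear scale with out-of-range values clamped. The detector's actual export and call signature are not shown on this page, so `hallucinationPercent` is a hypothetical helper.

```typescript
// Hypothetical helper: the V2 detector is assumed to produce a score q in [0, 1].
function hallucinationPercent(q: number): number {
  if (Number.isNaN(q)) throw new RangeError("score must be a number");
  // Clamp defensively, then scale linearly to a whole-number percentage.
  const clamped = Math.min(1, Math.max(0, q));
  return Math.round(clamped * 100);
}
```

For example, a score of 0.5 reads as 50%, and any score above 1 is shown as 100%.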
Null-State Drift
Embedding-space distance between the current assistant turn and the first user message in the session. Climbs when the model is coherent but no longer on task. Computed with text-embedding-3-small, normalized so that a cosine distance of 0.6 reads as 100%.
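The drift arithmetic can be sketched as follows, assuming the two embeddings (first user message and current reply) have already been fetched, for example from text-embedding-3-small, as plain number arrays. Cosine distance is 1 − cos θ; dividing by 0.6 makes a distance of 0.6 read as 100%. Clamping anything larger to 100% is an assumption, since the page does not say how bigger distances are displayed, and the helper names are hypothetical.

```typescript
// Cosine distance between two embedding vectors: 1 - cos(theta).
function cosineDistance(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("vectors must be non-empty and the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) throw new Error("zero-length vector");
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// A cosine distance of 0.6 reads as 100% (per the page).
const FULL_SCALE_DISTANCE = 0.6;

function driftPercent(firstUserEmbedding: number[], replyEmbedding: number[]): number {
  const d = cosineDistance(firstUserEmbedding, replyEmbedding);
  // Clamp to [0, 100]: guards tiny negative float error and distances past 0.6.
  return Math.min(100, Math.max(0, (d / FULL_SCALE_DISTANCE) * 100));
}
```

Computing the full cosine, rather than assuming unit-length inputs, keeps the helper usable with any embedding provider.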
Context Fill
Tokens used against the active model's context window, as reported by the chat completion's usage.total_tokens. This is the early warning for the other two layers: attention starts collapsing past ~80%.
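A sketch of the context-fill calculation, assuming the caller supplies the model's context window size in tokens (the page does not name the model, so no window size is given here). Because each chat completion request resends the conversation history, `total_tokens` for the latest call approximates how much of the window the session now occupies. The 0.8 warning threshold mirrors the ~80% guidance; `contextFill` is a hypothetical helper name.

```typescript
// Only the field this sketch reads from a chat completion's usage object.
interface Usage {
  total_tokens: number;
}

// Mirrors the page's "~80%" attention guidance.
const ATTENTION_WARNING = 0.8;

// Hypothetical helper: contextWindow is the model's window in tokens, supplied by the caller.
function contextFill(usage: Usage, contextWindow: number): { percent: number; warn: boolean } {
  if (!(contextWindow > 0)) throw new RangeError("contextWindow must be positive");
  const fraction = Math.min(1, usage.total_tokens / contextWindow); // cap at a full window
  return {
    percent: Math.round(fraction * 100),
    warn: fraction >= ATTENTION_WARNING,
  };
}
```

For a hypothetical 128,000-token window, `contextFill({ total_tokens: 64000 }, 128000)` returns `{ percent: 50, warn: false }`.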
Surfaces
One scoring engine. Four thin clients. The detector is the same V2 export already shipped in @alephonenull/eval.
- Library hook: a useNullMeter() hook that wraps useChat from the AI SDK. Returns the three layers as a single object. Ships first.
- VS Code extension: status-bar gauge plus webview panel for Copilot Chat and any registered model API.
- Browser extension: overlay on chatgpt.com, claude.ai, gemini.google.com, and x.com. Reads the DOM and scores each assistant turn locally.
- CLI / dev-server overlay: middleware for backend devs. Exposes a localhost dashboard for any model traffic running through the wrapper.
Server requirements
The live chat above POSTs to /api/null-meter/chat, which requires OPENAI_API_KEY to be set on the server. Without it, the route returns 503 with a clean error message; the page still loads and the meter stays at baseline.
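A minimal sketch of the missing-key guard, written against the Web-standard Request and Response types that route handlers in frameworks such as Next.js use. The handler name `handleNullMeterChat`, the explicit `env` parameter, and the JSON error shape are assumptions made for illustration; the actual chat, embedding, and scoring pipeline is stubbed out.

```typescript
// Hypothetical handler for POST /api/null-meter/chat. The environment is passed
// in explicitly so the missing-key guard is easy to exercise.
async function handleNullMeterChat(
  req: Request,
  env: Record<string, string | undefined>,
): Promise<Response> {
  const json = (body: unknown, status: number): Response =>
    new Response(JSON.stringify(body), {
      status,
      headers: { "content-type": "application/json" },
    });

  if (!env.OPENAI_API_KEY) {
    // No key: answer 503 with a readable error instead of throwing,
    // so the page keeps rendering and the meter stays at baseline.
    return json({ error: "OPENAI_API_KEY is not configured on the server" }, 503);
  }

  // Parse the request body (throws on invalid JSON).
  await req.json();
  // Stub: the real route would call the chat model, embed the reply for drift,
  // score it with the V2 detector, and return the three layers.
  return json({ ok: true }, 200);
}
```

Returning a structured 503 rather than crashing lets the client distinguish "not configured" from a transient model failure.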
The route uses a low-cost OpenAI chat model and an OpenAI embedding model for drift measurement, chosen to keep a public, always-on demo affordable. The specific chat model is intentionally not advertised: the V2 detector is model-agnostic, and the same scoring applies whichever model the wrapper points at. You can self-host it against any provider.