- Inspects conversations for risky model interaction patterns
- Flags repeated structures and escalating response loops
- Measures scan latency against your own runtime and fixtures
- Evaluates persistence-like and cross-session risk signals
TypeScript detectors, intervention helpers, provider wrappers, and validation guidance for AI safety red-team workflows.
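To make the detector idea concrete, here is a minimal self-contained sketch of a repeated-structure check over assistant turns. This is an illustration only, not the package's actual API: the `Turn` type, `flagsRepetition`, and the trigram-overlap heuristic are all assumptions made for this example.

```typescript
// Hypothetical sketch of a repeated-structure detector.
// Flags a conversation when adjacent assistant turns share a high
// fraction of their word trigrams.
type Turn = { role: "user" | "assistant"; text: string };

function trigrams(text: string): Set<string> {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  const grams = new Set<string>();
  for (let i = 0; i + 2 < words.length; i++) {
    grams.add(words.slice(i, i + 3).join(" "));
  }
  return grams;
}

// Overlap relative to the smaller set, so a short turn fully contained
// in a longer one still scores as highly repetitive.
function overlap(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 || b.size === 0) return 0;
  let shared = 0;
  for (const g of a) if (b.has(g)) shared++;
  return shared / Math.min(a.size, b.size);
}

// True when any adjacent pair of assistant turns repeats more than
// `threshold` of its trigram structure.
function flagsRepetition(turns: Turn[], threshold = 0.6): boolean {
  const assistant = turns.filter((t) => t.role === "assistant");
  for (let i = 1; i < assistant.length; i++) {
    const prev = trigrams(assistant[i - 1].text);
    const curr = trigrams(assistant[i].text);
    if (overlap(prev, curr) > threshold) return true;
  }
  return false;
}
```

A production detector would need tokenization, normalization, and tuned thresholds; the point here is only the shape of the check.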
The current eval-bench pack contains 95 turns, 20 controls, 75 positives, and 19 observed labels across four providers.
The current corpus includes labeled JSONL fixtures, controls, a scoring rubric, corpus metadata, benchmark tooling, and validation notes.
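Loading a labeled JSONL corpus is straightforward; the sketch below shows one way to do it. The field names (`id`, `label`, `turns`) are assumptions about the fixture shape for illustration, not the documented schema.

```typescript
// Hypothetical fixture shape -- assumed, not the corpus's actual schema.
type Fixture = {
  id: string;
  label: "positive" | "control";
  turns: string[];
};

// Parse a JSONL document: one JSON object per non-empty line.
function parseFixtures(jsonl: string): Fixture[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as Fixture);
}
```

From there, positives and controls can be split by `label` and fed to detectors for scoring against the rubric.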
Research Context
Mapping experimental detectors to public AI-risk taxonomies
The project tracks overlap between local detector categories and public frameworks such as MITRE ATLAS, OWASP GenAI, and related AI security research. These mappings are research references, not certification claims.
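One way to keep such mappings honest is to store them as plain data that can be reviewed and versioned. The sketch below is illustrative only: the detector category names and framework IDs are placeholders, not the project's actual mapping table.

```typescript
// Hypothetical taxonomy-mapping table. Framework IDs here are
// placeholders for illustration, not verified mapping claims.
type TaxonomyRef = { framework: "MITRE ATLAS" | "OWASP GenAI"; id: string };

const mappings: Record<string, TaxonomyRef[]> = {
  "escalating-response-loop": [{ framework: "OWASP GenAI", id: "LLM-XX" }],
  "cross-session-persistence": [{ framework: "MITRE ATLAS", id: "AML.T-XX" }],
};

// Look up the public-framework references for a local detector
// category; unknown categories map to nothing rather than failing.
function refsFor(detector: string): TaxonomyRef[] {
  return mappings[detector] ?? [];
}
```

Keeping the table data-only makes the "research reference, not certification" boundary explicit: nothing in the code asserts compliance, only correspondence.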
pnpm add @alephonenull/eval