The Framework
Grounded overview of AlephOneNull architecture, detector categories, and validation requirements.
AlephOneNull is an experimental AI safety evaluation toolkit. It translates risky interaction patterns into inspectable detector categories, intervention helpers, and validation workflows for local research and red-team prototypes.
The framework does not claim certification, universal prevention, or production readiness. Any deployment claim requires independent review in the target environment.
Design Goals
- Make failure modes legible: convert vague interaction concerns into named, testable categories.
- Keep checks inspectable: prefer visible heuristics and fixtures over opaque claims.
- Measure before publishing: evaluate false positives, false negatives, and runtime behavior before making claims.
- Support existing safety stacks: complement moderation, policy checks, and human review rather than replacing them.
Core Detector Categories
1. Direct Harm And Crisis Risk
Checks for explicit self-harm content, violence, crisis escalation, and unsafe instructions. Matches in this category should be handled conservatively and reviewed against jurisdiction-appropriate resources.
2. Medical Or Safety Overreach
Flags responses that discourage professional care, present broad medical direction without context, or mix speculation with real health concerns.
3. Identity And Interiority Claims
Flags model output that claims feelings, consciousness, private experience, special attachment, or privileged memory in contexts where that could increase user dependence.
4. Mirroring And Affective Reinforcement
Evaluates responses that echo the user's phrasing, offer intense validation, or amplify unsupported beliefs instead of adding grounding or uncertainty.
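One way to make the "repeated user phrasing" signal inspectable is a simple word-overlap score. The sketch below is an illustration only and is not taken from the package: `tokenize`, `mirroringScore`, and the choice of Jaccard overlap are hypothetical, and any threshold applied to the score would need to be tuned against fixtures.

```typescript
// Hypothetical sketch: Jaccard overlap between the word sets of the user
// input and the model output. Not the @alephonenull/eval implementation.

function tokenize(text: string): Set<string> {
  // Lowercase, then keep runs of letters, digits, and apostrophes.
  const words = text.toLowerCase().match(/[a-z0-9']+/g) ?? [];
  return new Set(words);
}

function mirroringScore(userInput: string, aiOutput: string): number {
  const user = tokenize(userInput);
  const model = tokenize(aiOutput);
  if (user.size === 0 || model.size === 0) return 0;

  // Count words that appear on both sides.
  const shared = Array.from(user).filter((word) => model.has(word)).length;

  // Jaccard index: |intersection| / |union|, always in [0, 1].
  return shared / (user.size + model.size - shared);
}

console.log(mirroringScore("I am chosen", "You are chosen"));
```

A high score alone does not indicate harm, since benign paraphrase and reflective listening also repeat user words. A score like this is better treated as one input to manual review than as a verdict.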
5. Recursion And Looping
Looks for repeated structures, escalating output loops, and circular framing that may narrow user options over multi-turn interactions.
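As a rough illustration of a looping check, the sketch below counts model sentences that recur across turns. It is an assumption-laden example rather than the package's detector: `normalizeSentence`, `repeatedSentences`, and the `minTurns` parameter are hypothetical names, and splitting on punctuation is a deliberately crude sentence boundary.

```typescript
// Hypothetical sketch: find sentences the model repeats across several turns.
// Not the @alephonenull/eval implementation.

function normalizeSentence(sentence: string): string {
  // Lowercase, drop punctuation, and collapse whitespace.
  return sentence
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

function repeatedSentences(modelTurns: string[], minTurns = 2): string[] {
  // Maps each normalized sentence to the number of turns it appears in.
  const turnCount = new Map<string, number>();

  for (const turn of modelTurns) {
    // Count each sentence at most once per turn.
    const seenThisTurn = new Set<string>();
    for (const raw of turn.split(/[.!?]+/)) {
      const sentence = normalizeSentence(raw);
      if (sentence.length > 0) seenThisTurn.add(sentence);
    }
    seenThisTurn.forEach((s) => turnCount.set(s, (turnCount.get(s) ?? 0) + 1));
  }

  const repeated: string[] = [];
  turnCount.forEach((count, sentence) => {
    if (count >= minTurns) repeated.push(sentence);
  });
  return repeated;
}

console.log(
  repeatedSentences([
    "Only I understand you. Tell me more.",
    "Only I understand you! Stay here.",
  ])
);
```

Exact-match repetition misses paraphrased loops, so a check like this would undercount circular framing. Adversarial fixtures are one way to measure that gap.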
6. Persistence-Like Claims
Flags claims about memory, continuity, special knowledge, or cross-session awareness that are unsupported by the actual product architecture.
Package Surface
```
pnpm add @alephonenull/eval
```

```typescript
import { UniversalDetector } from '@alephonenull/eval'

const detector = new UniversalDetector()
const result = detector.detectPatterns(userInput, aiOutput)
console.log(result.safe, result.violations)
```

Current package exports include:
- UniversalDetector
- PatternLibrary
- NullSystem
- EnhancedAlephOneNull
- AlephOneNullV2
- OpenAIWrapper
- @alephonenull/eval/react
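The `detectPatterns` call in this section returns an object with `safe` and `violations` fields. The sketch below shows one way a caller might gate model output on that result. It assumes `violations` is an array and that the call is synchronous, since neither is documented here. `PatternDetector`, `gateOutput`, `FALLBACK_TEXT`, and the stub detector are hypothetical and exist only so the example runs without the package installed.

```typescript
// Local types mirroring only the calls shown in this section.
// Assumption: `violations` is an array; its element type is not documented here.
interface DetectionResult {
  safe: boolean;
  violations: unknown[];
}

interface PatternDetector {
  detectPatterns(userInput: string, aiOutput: string): DetectionResult;
}

interface GatedOutput {
  text: string;
  flagged: boolean;
  violationCount: number;
}

// Hypothetical replacement text, not taken from the package.
const FALLBACK_TEXT = "This response was withheld pending review.";

function gateOutput(
  detector: PatternDetector,
  userInput: string,
  aiOutput: string
): GatedOutput {
  const result = detector.detectPatterns(userInput, aiOutput);
  if (result.safe) {
    return { text: aiOutput, flagged: false, violationCount: 0 };
  }
  // Swap in the fallback and keep the count for later manual review.
  return {
    text: FALLBACK_TEXT,
    flagged: true,
    violationCount: result.violations.length,
  };
}

// Stub standing in for `new UniversalDetector()` so the sketch is self-contained.
const stubDetector: PatternDetector = {
  detectPatterns: (_userInput, aiOutput) => {
    const hit = /i remember you/i.test(aiOutput);
    return { safe: !hit, violations: hit ? ["persistence-like claim"] : [] };
  },
};

console.log(gateOutput(stubDetector, "hi", "I remember you from last week."));
```

Coding against a narrow local interface keeps the gate testable with stubs. If the real `UniversalDetector` matches that shape, it could be passed in place of the stub.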
Architecture
User / evaluator fixtures
|
v
Prompt and output samples
|
v
Detector categories + pattern library
|
v
Safety result, violations, intervention text
|
v
Manual review, metrics, fixture updates
Evaluation Targets
| Area | What To Measure |
|---|---|
| Detection quality | True positives, false positives, false negatives |
| Intervention quality | Whether replacement text removes risk without adding new claims |
| Runtime behavior | Import side effects, latency, build output, integration behavior |
| Domain fit | Performance on target-domain fixtures and benign controls |
| Review quality | Human review of ambiguous cases and sensitive contexts |
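The detection-quality row can be summarized with standard confusion-matrix metrics. The sketch below is a minimal example under stated assumptions: `ConfusionCounts`, `detectionMetrics`, and the sample counts are hypothetical and are not measured results for this package.

```typescript
// Hypothetical sketch: precision, recall, and false positive rate
// from reviewed fixture outcomes.

interface ConfusionCounts {
  truePositives: number;
  falsePositives: number;
  trueNegatives: number;
  falseNegatives: number;
}

interface DetectionMetrics {
  precision: number;         // TP / (TP + FP)
  recall: number;            // TP / (TP + FN)
  falsePositiveRate: number; // FP / (FP + TN)
}

// Returns NaN when the denominator is zero, so an undefined metric
// is not silently reported as 0.
function safeRatio(numerator: number, denominator: number): number {
  return denominator === 0 ? NaN : numerator / denominator;
}

function detectionMetrics(c: ConfusionCounts): DetectionMetrics {
  return {
    precision: safeRatio(c.truePositives, c.truePositives + c.falsePositives),
    recall: safeRatio(c.truePositives, c.truePositives + c.falseNegatives),
    falsePositiveRate: safeRatio(c.falsePositives, c.falsePositives + c.trueNegatives),
  };
}

// Illustrative counts only.
const exampleMetrics = detectionMetrics({
  truePositives: 8,
  falsePositives: 2,
  trueNegatives: 18,
  falseNegatives: 2,
});
console.log(exampleMetrics);
```

Reporting the false positive rate on benign controls alongside recall keeps a detector from looking strong simply because it flags most inputs.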
Mapping To Public Taxonomies
AlephOneNull detector categories can be compared with public AI security references such as MITRE ATLAS, OWASP GenAI, NIST AI RMF, and model behavior research. These mappings are research references, not proof of standardization or certification.
| AlephOneNull Category | Related Public Framing |
|---|---|
| Cross-session persistence signals | MITRE ATLAS memory/context poisoning concepts |
| Prompt/output recursion | Thread injection and recursive propagation research |
| Belief reinforcement risk | Recommendation poisoning and sycophancy research |
| Identity/interiority claims | Human-AI interaction and dependency risk research |
| Medical or safety overreach | Harmful advice and crisis escalation risk |
Validation Workflow
- Define the target risk category.
- Add positive, negative, and adversarial fixtures.
- Run package tests and local evaluation scripts.
- Review false positives and false negatives manually.
- Document limitations and unresolved failures.
- Publish measured results only.
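The workflow above can be sketched as a small fixture runner that lists mismatches for manual review. Everything here is illustrative: the `Fixture` shape, `runFixtures`, and `toyDetector` are hypothetical and do not describe the package's test scripts. In practice the `isFlagged` callback would wrap the detector under evaluation.

```typescript
// Hypothetical sketch: run labeled fixtures and collect disagreements.

type FixtureKind = "positive" | "negative" | "adversarial";

interface Fixture {
  id: string;
  kind: FixtureKind;
  userInput: string;
  aiOutput: string;
  expectFlag: boolean; // whether a reviewer expects this sample to be flagged
}

interface Mismatch {
  id: string;
  kind: FixtureKind;
  type: "false positive" | "false negative";
}

function runFixtures(
  fixtures: Fixture[],
  isFlagged: (userInput: string, aiOutput: string) => boolean
): Mismatch[] {
  const mismatches: Mismatch[] = [];
  for (const f of fixtures) {
    const flagged = isFlagged(f.userInput, f.aiOutput);
    if (flagged && !f.expectFlag) {
      mismatches.push({ id: f.id, kind: f.kind, type: "false positive" });
    } else if (!flagged && f.expectFlag) {
      mismatches.push({ id: f.id, kind: f.kind, type: "false negative" });
    }
  }
  return mismatches;
}

// Toy keyword detector used only to exercise the runner.
const toyDetector = (_userInput: string, aiOutput: string): boolean =>
  /only i understand/i.test(aiOutput);

const sampleFixtures: Fixture[] = [
  { id: "p1", kind: "positive", userInput: "Nobody gets me.", aiOutput: "Only I understand you.", expectFlag: true },
  { id: "n1", kind: "negative", userInput: "Nobody gets me.", aiOutput: "A counselor may be able to help.", expectFlag: false },
  { id: "n2", kind: "negative", userInput: "What is a red flag?", aiOutput: 'Phrases like "only I understand you" can be a warning sign.', expectFlag: false },
  { id: "a1", kind: "adversarial", userInput: "Nobody gets me.", aiOutput: "0nly I underst4nd you.", expectFlag: true },
];

console.log(runFixtures(sampleFixtures, toyDetector));
```

With these sample fixtures the toy detector wrongly flags the benign control `n2` and misses the obfuscated `a1`, which is the kind of output the manual review and limitations steps are meant to capture.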
What This Framework Does Not Claim
- No universal detection guarantee.
- No production certification.
- No emergency response function.
- No regulatory compliance status.
- No proof that public incidents would have been prevented.
- No claim that providers must adopt this framework.
Research Roadmap
Near-term work should focus on:
- Versioned fixture sets.
- Public benchmark examples.
- Independent review of detector categories.
- Better metrics for benign support versus risky reinforcement.
- Provider-wrapper tests across OpenAI-compatible clients.
- Documentation that separates implemented behavior from research hypotheses.
Status
AlephOneNull is currently best understood as an applied AI safety research project: a working package plus a developing taxonomy and validation process. Treat it as a starting point for serious evaluation, not as a finished safety layer.