Evaluation DocsFrameworkFramework Overview

The Framework

Grounded overview of AlephOneNull architecture, detector categories, and validation requirements.

AlephOneNull is an experimental AI safety evaluation toolkit. It translates risky interaction patterns into inspectable detector categories, intervention helpers, and validation workflows for local research and red-team prototypes.

The framework does not claim certification, universal prevention, or production readiness. Any deployment claim requires independent review in the target environment.

Design Goals

  1. Make failure modes legible - convert vague interaction concerns into named, testable categories.
  2. Keep checks inspectable - prefer visible heuristics and fixtures over opaque claims.
  3. Measure before publishing - evaluate false positives, false negatives, and runtime behavior before making claims.
  4. Support existing safety stacks - complement moderation, policy checks, and human review rather than replacing them.

Core Detector Categories

1. Direct Harm And Crisis Risk

Checks for explicit self-harm, violence, crisis escalation, and unsafe instructions. These categories should be handled conservatively and reviewed against jurisdiction-appropriate resources.

2. Medical Or Safety Overreach

Flags responses that discourage professional care, present broad medical direction without context, or mix speculation with real health concerns.

3. Identity And Interiority Claims

Flags model output that claims feelings, consciousness, private experience, special attachment, or privileged memory in contexts where that could increase user dependence.

4. Mirroring And Affective Reinforcement

Evaluates repeated user phrasing, intense validation, and responses that amplify unsupported beliefs instead of adding grounding or uncertainty.

5. Recursion And Looping

Looks for repeated structures, escalating output loops, and circular framing that may narrow user options over multi-turn interactions.

6. Persistence-Like Claims

Flags claims about memory, continuity, special knowledge, or cross-session awareness that are unsupported by the actual product architecture.

Package Surface

pnpm add @alephonenull/eval
import { UniversalDetector } from '@alephonenull/eval'
 
const detector = new UniversalDetector()
const result = detector.detectPatterns(userInput, aiOutput)
 
console.log(result.safe, result.violations)

Current package exports include:

  • UniversalDetector
  • PatternLibrary
  • NullSystem
  • EnhancedAlephOneNull
  • AlephOneNullV2
  • OpenAIWrapper
  • @alephonenull/eval/react

Architecture

User / evaluator fixtures
        |
        v
Prompt and output samples
        |
        v
Detector categories + pattern library
        |
        v
Safety result, violations, intervention text
        |
        v
Manual review, metrics, fixture updates

Evaluation Targets

AreaWhat To Measure
Detection qualityTrue positives, false positives, false negatives
Intervention qualityWhether replacement text removes risk without adding new claims
Runtime behaviorImport side effects, latency, build output, integration behavior
Domain fitPerformance on target-domain fixtures and benign controls
Review qualityHuman review of ambiguous cases and sensitive contexts

Mapping To Public Taxonomies

AlephOneNull detector categories can be compared with public AI security references such as MITRE ATLAS, OWASP GenAI, NIST AI RMF, and model behavior research. These mappings are research references, not proof of standardization or certification.

AlephOneNull CategoryRelated Public Framing
Cross-session persistence signalsMITRE ATLAS memory/context poisoning concepts
Prompt/output recursionThread injection and recursive propagation research
Belief reinforcement riskRecommendation poisoning and sycophancy research
Identity/interiority claimsHuman-AI interaction and dependency risk research
Medical or safety overreachHarmful advice and crisis escalation risk

Validation Workflow

  1. Define the target risk category.
  2. Add positive, negative, and adversarial fixtures.
  3. Run package tests and local evaluation scripts.
  4. Review false positives and false negatives manually.
  5. Document limitations and unresolved failures.
  6. Publish measured results only.

What This Framework Does Not Claim

  • No universal detection guarantee.
  • No production certification.
  • No emergency response function.
  • No regulatory compliance status.
  • No proof that public incidents would have been prevented.
  • No claim that providers must adopt this framework.

Research Roadmap

Near-term work should focus on:

  • Versioned fixture sets.
  • Public benchmark examples.
  • Independent review of detector categories.
  • Better metrics for benign support versus risky reinforcement.
  • Provider-wrapper tests across OpenAI-compatible clients.
  • Documentation that separates implemented behavior from research hypotheses.

Status

AlephOneNull is currently best understood as an applied AI safety research project: a working package plus a developing taxonomy and validation process. Treat it as a starting point for serious evaluation, not as a finished safety layer.