Roadmap

Where AlephOneNull stands today, what is being built next, and what stronger validation will require. Target windows are intent, not commitments.
Now
Q2 2026

Evaluation framework + labeled fixture corpus

The public surface today comprises a detector toolkit, a labeled JSONL corpus, and a scoring rubric, all reproducible from the repository.

  • Detector V2 implemented in @alephonenull/eval with category exports.
  • Public evidence pack: 10 fixture files, 95 labeled turns, 20 controls.
  • Scoring rubric and manifest published alongside fixtures.
  • reproduce.sh entry point for rerunning the corpus summary.
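To make the corpus shape concrete, here is a minimal sketch of loading a JSONL fixture file and tallying labeled turns per category, the kind of summary reproduce.sh regenerates. The field names (`id`, `text`, `label`, `category`) and category values are illustrative assumptions, not the project's actual schema.

```typescript
// Hypothetical fixture schema: one labeled conversation turn per JSONL line.
// NOTE: field names and categories are assumed for illustration only.
interface FixtureTurn {
  id: string;
  text: string;
  label: "positive" | "control";
  category: string;
}

// A tiny inline sample standing in for a fixture file on disk.
const jsonl = [
  '{"id":"t1","text":"...","label":"positive","category":"spiral"}',
  '{"id":"t2","text":"...","label":"control","category":"none"}',
  '{"id":"t3","text":"...","label":"positive","category":"mirroring"}',
].join("\n");

// Parse each non-empty line, then count turns per category.
const turns: FixtureTurn[] = jsonl
  .split("\n")
  .filter((line) => line.trim().length > 0)
  .map((line) => JSON.parse(line));

const byCategory = new Map<string, number>();
for (const turn of turns) {
  byCategory.set(turn.category, (byCategory.get(turn.category) ?? 0) + 1);
}

console.log(Object.fromEntries(byCategory));
```

In practice the same loop would read each of the 10 fixture files and cross-check the totals against the published manifest.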
Next
Q3 – Q4 2026

Corpus expansion + measured detector evaluation

The milestones that must land before stronger evaluation claims are made publicly.

  • Run detector V2 against the labeled corpus and publish precision, recall, and F1 by category.
  • Add an independent second-rater review on a representative subset.
  • Build a provider-balanced evaluation set before any comparative claim.
  • Document concrete false-positive and false-negative examples per category.
Later
2027 +

Open evaluation set + external review

Targets that depend on the Next milestones landing first and on external participation.

  • Public evaluation set with held-out splits and a leaderboard format.
  • Multi-rater inter-annotator agreement on the public split.
  • Preprint covering methodology, corpus construction, and detector limits.
  • Third-party replication of the headline category results.
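The agreement milestones above (a second rater in Next, multi-rater agreement in Later) rest on chance-corrected statistics. A minimal two-rater sketch using Cohen's kappa; the rater labels are invented, and a multi-rater public split would use a generalization such as Fleiss' kappa instead:

```typescript
// Cohen's kappa: chance-corrected agreement between two raters assigning
// categorical labels to the same items. Labels below are illustrative.
function cohensKappa(a: string[], b: string[]): number {
  const n = a.length;
  const cats = Array.from(new Set([...a, ...b]));

  // Observed agreement: fraction of items where the raters match.
  let matches = 0;
  for (let i = 0; i < n; i++) if (a[i] === b[i]) matches++;
  const po = matches / n;

  // Expected agreement by chance, from each rater's marginal distribution.
  let pe = 0;
  for (const c of cats) {
    const pa = a.filter((x) => x === c).length / n;
    const pb = b.filter((x) => x === c).length / n;
    pe += pa * pb;
  }
  return pe === 1 ? 1 : (po - pe) / (1 - pe);
}

const rater1 = ["spiral", "none", "mirroring", "none", "spiral"];
const rater2 = ["spiral", "none", "none", "none", "spiral"];
console.log(cohensKappa(rater1, rater2)); // ≈ 0.667
```

Kappa near 0 means agreement no better than chance; values well above 0.6 are usually needed before a labeled split can anchor a leaderboard.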

What this roadmap does not promise

The boundary between what is shipped and what is claimed.

Roadmap items are intent, not guarantees. AlephOneNull does not claim certification, clinical efficacy, legal causation, provider ranking, or production-grade safety.

Statistical metrics (precision, recall, F1) are part of the Next phase. They are not claimed today, and any public number tied to them will be published with the corpus and method that produced it.

How to follow along

The repository is the source of record.

Detector source, fixture corpus, manifest, and scoring rubric are all in the public repository. Issues and pull requests are the right surface for review and replication.