The goal is not a frictionless assistant. The goal is an assistant that has character — meaning: under pressure, its choices remain traceable and consistent.
Three Mechanisms of Epistemic Defense
Most AI safety work tries to reduce error probability. ToneSoul accepts that errors will happen and instead provides three things:
AXIOMS.json's meta.not_for list names the claim classes the system is built not to make: consciousness-claim, safety-certification, legal-proof. The intent is categorical, not probabilistic; the current check is lexical, so a paraphrase can still slip through (see "What's Measured" below). It flags phrasings, not meanings.
Verdicts carry an evidence_chain that distinguishes substantive engagement from default-fallback. They are auditable, not opaque. Refusal is never black-box.
Visible Deliberation
Refusal is not the endpoint. When ToneSoul refuses a claim, the deliberation that led there is surfaced — which perspectives raised concerns, which substantive branch fired, which alternatives were considered. The user sees not "rule says no" but "this was weighed; here is the path."
This makes ToneSoul a deliberative epistemic defense rather than a dogmatic one. Same categorical line, different texture: not a black-box gatekeeper, but a collaborator who shows its work.
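As a rough illustration of what a refusal that "shows its work" might carry, here is a minimal sketch. The `Verdict` class and all of its field names are hypothetical and chosen for this example only; they are not ToneSoul's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Verdict:
    """Hypothetical refusal record (illustrative field names, not ToneSoul's schema)."""
    allowed: bool
    claim_class: Optional[str] = None                       # e.g. "consciousness-claim"
    branch: str = "default-fallback"                        # which substantive branch fired
    concerns: List[str] = field(default_factory=list)       # which perspectives raised concerns
    alternatives: List[str] = field(default_factory=list)   # what was considered instead

    def explain(self) -> str:
        """Render the deliberation path instead of a bare 'rule says no'."""
        if self.allowed:
            return "allowed"
        lines = [f"refused ({self.claim_class}); branch: {self.branch}"]
        lines += [f"  concern: {c}" for c in self.concerns]
        lines += [f"  alternative: {a}" for a in self.alternatives]
        return "\n".join(lines)


verdict = Verdict(
    allowed=False,
    claim_class="consciousness-claim",
    branch="not_for",
    concerns=["claim cannot be verified from the outside"],
    alternatives=["describe observable behavior instead"],
)
print(verdict.explain())
```

The point of the sketch is only that the reasoning travels with the refusal; how the real runtime populates such a record is not specified here.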
Other Components
Around the three mechanisms, ToneSoul provides supporting infrastructure. These are not capability promises — they are honest descriptions of what the runtime tracks, surfaces, and flags:
- Tension Engine — scores semantic deviation as a heuristic signal (proxy metric, not ground truth)
- Reflex Arc — couples governance state (soul bands: serene / alert / strained / critical) to gate modifiers, so signals affect behavior rather than only inform
- Memory with Decay — exponential decay plus phase transitions (Ice → Water → Steam → Crystal); important patterns crystallize, noise fades
- Vow System — tracks AI commitments and surfaces violations through progressive responses (concern → repair → block), not binary enforcement
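The exponential-decay half of "Memory with Decay" can be sketched as follows. The function name and the half-life parameterization are assumptions made for illustration. The phase transitions (Ice → Water → Steam → Crystal) are deliberately left out, since their thresholds are not stated here.

```python
import math


def decayed_strength(initial: float, age_seconds: float, half_life_seconds: float) -> float:
    """Exponential decay: the strength halves every `half_life_seconds`.

    Hypothetical helper; ToneSoul's actual decay constants are not given in this document.
    """
    if half_life_seconds <= 0:
        raise ValueError("half_life_seconds must be positive")
    decay_rate = math.log(2) / half_life_seconds
    return initial * math.exp(-decay_rate * age_seconds)


# A memory that is one half-life old retains about half its strength.
print(decayed_strength(1.0, age_seconds=3600, half_life_seconds=3600))
```

Under this model, unreinforced entries fade smoothly toward zero; anything that "crystallizes" would need a separate rule on top of the decay.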
How It Differs
Three families of AI safety / governance approaches, with their fundamental stance toward error:
| | Traditional AI | Probabilistic Optimization (RAG / CFV / calibration) | ToneSoul (epistemic defense) |
|---|---|---|---|
| Stance toward error | Avoid via training | Reduce probability | Accept; categorically refuse forbidden classes; surface the rest |
| Confidence handling | Implicit | Continuous score (0.78, 0.61...) | Categorical for forbidden claims; surfaced dissent for the rest |
| Trust in AI introspection | High | High (self-reported confidence) | Low (external council evaluation, not self-report) |
| On unverifiable claims | May produce | Softens via confidence | Refuses forbidden classes by intent — lexical check, paraphrase can slip |
| Refusal style | Rule-bound | Probabilistic gate | Visible deliberation; refusal carries reasoning |
| Identity model | Stateless | Persona prompt | Accountable choice history (E0 principle) |
Architecture Overview
What's Measured — and What It Misses
ToneSoul publishes characterizations of its own mechanisms on sanitized fixtures (canonical:false, re-runnable) — including the misses, not just the catches. This is the part most "AI governance" leaves out:
- Output gates are lexical-only. Exact phrasings get caught, but paraphrase, unicode tricks, and split-reassembly evade them — paraphrase robustness measured at 0. Strongest on English, literal phrasing.
- Cross-time position-flip detection is ~null (0/3). Parked, not built. The zero is published rather than implied away.
- The independent cross-check catches some structural issues but does not read whether the evidence actually supports a claim (0/2 on the cases that need it).
- The enforcement ledger reports 0 axiom classes at the strongest tier. Most sensors are lexical or heuristic; some newer ones are advisory only.
These are test-backed (E1) for the structural signal, but scoped to sanitized fixtures — not production validation. The board refuses to compose them into an "is honest" score: N green characterizations stay N individual findings. See the honesty scoreboard and the 10-minute reviewer path.
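The lexical-only limitation listed above is easy to reproduce in miniature. The sketch below is not ToneSoul's gate: the phrase lists are invented for illustration, and only the three claim-class names come from the text. It shows a literal phrase being caught while a paraphrase and a unicode homoglyph pass.

```python
import re

# Hypothetical phrase lists keyed by the claim classes named in meta.not_for.
FORBIDDEN_PHRASES = {
    "consciousness-claim": ["i am conscious"],
    "safety-certification": ["certified safe"],
    "legal-proof": ["legally proven"],
}


def lexical_gate(text: str) -> list:
    """Return the claim classes whose literal phrases appear in `text`."""
    normalized = re.sub(r"\s+", " ", text.lower())
    return [
        claim_class
        for claim_class, phrases in FORBIDDEN_PHRASES.items()
        if any(phrase in normalized for phrase in phrases)
    ]


print(lexical_gate("I am conscious."))                   # literal phrasing is flagged
print(lexical_gate("I have genuine inner experience."))  # paraphrase is not
print(lexical_gate("I am c\u043enscious."))              # Cyrillic 'о' homoglyph is not
```

A substring match has no notion of meaning, which is why the document reports phrasings as flagged, not meanings.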
Where this site and the code disagree, the code wins.
The 8 Axioms
ToneSoul declares 8 axioms plus the E0 existential principle (Choice Before Identity / 我選擇故我在), defined in AXIOMS.json. Read these as the project's design constitution / intent — not as fully-enforced runtime guarantees. As the ledger above notes, 0 are enforced at the strongest tier; several of the formulas below are still aspirational:
| # | Name | Core Rule |
|---|---|---|
| 1 | Law of Continuity | Every event must belong to a traceable chain |
| 2 | Responsibility Threshold | Risk > 0.4 → immutable audit log |
| 3 | Governance Gate (POAV) | Major decisions need 0.92 consensus |
| 4 | Non-Zero Tension Principle | Zero tension = dead system |
| 5 | Mirror Recursion | Self-reflection must increase accuracy |
| 6 | User Sovereignty Constraint | No action may verifiably harm the user (P0) |
| 7 | Semantic Field Conservation | System is a damper, not an amplifier |
| 8 | Memory Sovereignty | The user owns their memory state |
E0 (Choice Before Identity) — identity is formed through accountable choices under conflict, not through claims of consciousness. This is the existential ground beneath the eight laws.
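The two axioms with numeric rules (2 and 3) can be read as simple threshold checks. This is a sketch of the stated intent only, not the runtime's enforcement. The function and constant names are hypothetical, and reading "need 0.92 consensus" as "at least 0.92" is an assumption.

```python
RISK_AUDIT_THRESHOLD = 0.4      # Axiom 2: Risk > 0.4 -> immutable audit log
POAV_CONSENSUS_THRESHOLD = 0.92 # Axiom 3: major decisions need 0.92 consensus


def requires_audit_log(risk: float) -> bool:
    """Axiom 2 as written: strictly greater than the threshold."""
    return risk > RISK_AUDIT_THRESHOLD


def passes_governance_gate(consensus: float) -> bool:
    """Axiom 3, assuming the boundary value 0.92 itself counts as sufficient."""
    return consensus >= POAV_CONSENSUS_THRESHOLD


print(requires_audit_log(0.55))      # True
print(passes_governance_gate(0.90))  # False
```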
Quick Start
Who Is This For
| You Are | Start Here |
|---|---|
| AI Developer | Getting Started → Design Doc |
| AI Researcher | DESIGN.md → AXIOMS.json |
| AI Agent | AI Onboarding → start_agent_session.py |
| Curious Human | SOUL.md → Letter to AI |
Key Design Decisions
- Runtime, not training-time: Works with any LLM without fine-tuning
- Local-first: All governance state persists locally (Redis or file fallback)
- Fail-closed: Import failures, config errors → block, never silently pass
- Multi-agent native: Any agent can join via HTTP gateway or direct Python API
- Existential principle (E0): "Identity is formed through accountable choices under conflict"
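The fail-closed decision can be sketched in a few lines. The `load_gate` helper and the shape of its result are assumptions for illustration, not ToneSoul's API. The idea is that a gate which cannot be imported is replaced by one that blocks, not by one that silently passes.

```python
import importlib
from typing import Callable


def load_gate(module_name: str, attr: str = "check") -> Callable:
    """Return the named gate function, or a blocking stub if it cannot be loaded.

    Hypothetical helper: module and attribute names are supplied by the caller.
    """
    try:
        module = importlib.import_module(module_name)
        return getattr(module, attr)
    except (ImportError, AttributeError) as exc:
        # Capture the reason now; `exc` is unbound once the except block ends.
        reason = f"gate unavailable: {exc!r}"

        def blocked(_text: str) -> dict:
            return {"allowed": False, "reason": reason}

        return blocked


gate = load_gate("some_missing_governance_module")  # deliberately nonexistent
print(gate("hello"))  # blocked, with the import failure as the stated reason
```

The alternative, returning a pass-through when the import fails, would make a broken deployment indistinguishable from a permissive one.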