Methodology

Deterministic claim admissibility
for scientific AI teams.

CAPAS gives review teams a deterministic gate before scientific claims enter reports, governed datasets, or fine-tuning preparation. It checks whether supplied evidence licenses a claim for controlled reuse, and returns a replayable packet — the verdict, the deterministic reason, the required evidence contract, the no-LLM marker, and fine-tune readiness — re-derivable from the same input.

Reproducible engine benchmark · pilots pending
1,238 synthetic-grid engine decisions across 12 claim families
78% gated on the adversarial synthetic grid (not a production drift rate)
14 fine-tune readiness criteria

Synthetic benchmark: full verdict-space coverage on an adversarial grid, not a production drift rate. No production pilot has run yet.

Recent gate decisions · SCHEMA V3
ACCEPT · statistical_confidence: p=0.03 ≤ alpha=0.05
REWRITE · direction not independently licensed
REJECT · artifact unavailable for reproducibility
HOLD · RO-Crate attestation pending CLI verification
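The ACCEPT line above reads as a single threshold check. As a minimal sketch, assuming the payload carries hypothetical `p_value` and `alpha` fields, such a rule could look like this; the HOLD and REJECT branches are illustrative assumptions, not the engine's documented behavior.

```python
from typing import NamedTuple


class RuleResult(NamedTuple):
    verdict: str  # one of ACCEPT, REWRITE, REJECT, HOLD
    reason: str


def statistical_confidence(evidence: dict) -> RuleResult:
    """Hypothetical rule: license the claim only if p <= alpha.

    The field names p_value and alpha are assumptions, not the CAPAS schema.
    """
    p = evidence.get("p_value")
    alpha = evidence.get("alpha")
    if p is None or alpha is None:
        # Assumed behavior: a missing statistical field holds the claim for more evidence.
        return RuleResult("HOLD", "statistical_confidence: p_value or alpha not supplied")
    if p <= alpha:
        return RuleResult("ACCEPT", f"statistical_confidence: p={p} ≤ alpha={alpha}")
    # Assumed behavior: a failing threshold rejects rather than rewrites.
    return RuleResult("REJECT", f"statistical_confidence: p={p} > alpha={alpha}")


result = statistical_confidence({"p_value": 0.03, "alpha": 0.05})
```

Because the rule is a pure function of its input, calling it again on the same payload returns an identical result, which is the property the replay guarantee depends on.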
01 · ENEMY
Claim drift
A cautious source sentence becomes an over-scoped reusable claim. CAPAS catches the boundary.
02 · INPUT
Select mode
Guided builder, raw JSON, batch evaluation, or paper/text ingestion.
03 · GATE
Run gate
Schema v3 and claim-type rules return ACCEPT, REWRITE, REJECT, or HOLD.
04 · AUDIT
Inspect decision
Reason, evidence spans, provenance blockers, and fine-tune readiness.
The artifact

A real decision packet — not a description of one.

This is the literal output of the engine on a claim whose reported number re-derives correctly but whose supplied accounting evidence is internally inconsistent. Reproducible: capas_sdk.gate("financial_metric_claim", evidence).

{
  "schema_version": "capas-claim-payload-v3",
  "verdict": "REJECT",
  "reason": "reported_value matches reference within tolerance and period matches; OVERRIDDEN by a domain invariant violation: balance identity VIOLATED — assets 1000 != liabilities 600 + equity 300 (residual 100). The books do not close.",
  "required_fields": ["reported_value", "reference_value", "tolerance", "metric_period_match"],
  "invariant_audit": "FLAG",
  "fine_tune_ready": false,
  "non_claim": "This decision is rule-based over supplied evidence fields, not an LLM judgment."
}
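The override in this packet (a value check that passes, then a domain invariant that fails) can be sketched as two ordered deterministic checks. This is a hypothetical illustration, not `capas_sdk.gate`: the `assets`, `liabilities`, and `equity` field names, the HOLD mapping for missing fields, and the `NOT_RUN`/`PASS` audit values are assumptions.

```python
def gate_financial_metric(evidence: dict) -> dict:
    """Hypothetical financial_metric_claim rule with a balance-identity override."""
    required = ["reported_value", "reference_value", "tolerance", "metric_period_match"]
    missing = [name for name in required if name not in evidence]
    if missing:
        # Assumed: incomplete evidence puts the claim on HOLD.
        return {"verdict": "HOLD", "reason": "missing: " + ", ".join(missing),
                "invariant_audit": "NOT_RUN"}

    # Check 1: the reported number re-derives within tolerance and the period matches.
    value_ok = abs(evidence["reported_value"] - evidence["reference_value"]) <= evidence["tolerance"]
    if not (value_ok and evidence["metric_period_match"] is True):
        return {"verdict": "REJECT", "reason": "reported_value or period does not match reference",
                "invariant_audit": "NOT_RUN"}

    # Check 2: a domain invariant can override a passing value check.
    invariant_fields = ("assets", "liabilities", "equity")
    if all(name in evidence for name in invariant_fields):
        assets, liabilities, equity = (evidence[name] for name in invariant_fields)
        residual = assets - (liabilities + equity)
        if residual != 0:
            return {"verdict": "REJECT",
                    "reason": (f"balance identity VIOLATED: assets {assets} != "
                               f"liabilities {liabilities} + equity {equity} (residual {residual})"),
                    "invariant_audit": "FLAG"}
        audit = "PASS"
    else:
        audit = "NOT_RUN"  # assumed: no accounting fields supplied, so the invariant is skipped
    return {"verdict": "ACCEPT", "reason": "reported_value within tolerance and period matches",
            "invariant_audit": audit}


packet = gate_financial_metric({
    "reported_value": 42.0, "reference_value": 42.1, "tolerance": 0.5,
    "metric_period_match": True, "assets": 1000, "liabilities": 600, "equity": 300,
})
```

Ordering matters in this sketch: the invariant runs only after the value check passes, so the packet can report both that the number re-derives and that the books do not close.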
Decision path

An LLM may be used upstream — to extract the payload from a paper, or draft a rewrite suggestion. It is never used to determine admissibility. The verdict (ACCEPT / REWRITE / REJECT / HOLD) is produced only by versioned deterministic rules over the supplied evidence fields. The same payload always yields the same verdict, so any decision can be independently re-run and audited. The non_claim field is the machine-readable marker of this.

Text-ingested claims additionally carry source evidence spans; the hosted API wraps any packet in a signed, content-addressed certificate (capas_certstore) for tamper-evidence.
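The replay guarantee and the content-addressed certificate fit together: hash a canonical serialization of the packet, and anyone who re-runs the gate on the same payload should reproduce the same hash. The sketch below is an illustration under stated assumptions, not `capas_certstore`: canonical JSON with sorted keys stands in for whatever canonical form CAPAS uses, an HMAC-SHA256 tag stands in for its signature scheme, and `toy_gate` is a hypothetical placeholder for the real gate.

```python
import hashlib
import hmac
import json
from typing import Callable


def canonical_bytes(packet: dict) -> bytes:
    # Sorted keys and fixed separators give one byte string per logical packet.
    return json.dumps(packet, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def content_address(packet: dict) -> str:
    return "sha256:" + hashlib.sha256(canonical_bytes(packet)).hexdigest()


def issue_certificate(packet: dict, key: bytes) -> dict:
    # HMAC-SHA256 is a stand-in; the real certificate signature scheme is not described here.
    tag = hmac.new(key, canonical_bytes(packet), hashlib.sha256).hexdigest()
    return {"address": content_address(packet), "signature": tag, "packet": packet}


def verify_certificate(cert: dict, key: bytes) -> bool:
    # Tamper check: both the content address and the signature must match the packet.
    expected = hmac.new(key, canonical_bytes(cert["packet"]), hashlib.sha256).hexdigest()
    return (cert["address"] == content_address(cert["packet"])
            and hmac.compare_digest(expected, cert["signature"]))


def replays(gate: Callable[[dict], dict], payload: dict, cert: dict) -> bool:
    # Re-derive the packet from the original payload and compare content addresses.
    return content_address(gate(payload)) == cert["address"]


def toy_gate(payload: dict) -> dict:
    # Hypothetical stand-in for the real gate: deterministic, no model involved.
    verdict = "ACCEPT" if payload["p_value"] <= payload["alpha"] else "REJECT"
    return {"verdict": verdict, "non_claim": "rule-based, not an LLM judgment"}


payload = {"p_value": 0.03, "alpha": 0.05}
cert = issue_certificate(toy_gate(payload), key=b"demo-key")
```

A shared-secret HMAC only proves integrity to holders of the key; a certificate meant for third-party verification would more plausibly use an asymmetric signature.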

The checklist

The 14 fine-tune readiness criteria.

A claim can be ACCEPTed for a report yet still not be ready for training data. These 14 deterministic checks (verbatim from the engine) gate fine_tune_ready after an ACCEPT — they never change the verdict, only whether the claim may enter fine-tuning preparation.

verdict_accept · the claim verdict is ACCEPT
schema_clean · no schema or required-field blockers
source_backed_evidence · source-backed evidence is attached
external_review · external review is attached
semantic_alignment · claim text alignment is externally certified
witness_independence · witness independence is externally certified
provenance_sources · provenance sources / source URLs present
review_hash_verified · review hash matches the review packet
source_urls_recoverable_hashable · source URLs recoverable with matching hashes
witness_registry_resolved · witness ID resolves in the registry
ro_crate_validated · RO-Crate packet valid and hash-matched
reviewer_attestation_verified · reviewer identity / attestation verifiable
review_id_present · provenance review_id is present
witness_id_present · provenance witness_id is present

Verbatim from capas.evaluate_fine_tune_readiness. Any unmet criterion appears as a named blocker on the packet.
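The relationship between the verdict and fine_tune_ready can be sketched as follows. This is not `capas.evaluate_fine_tune_readiness`; it assumes externally verified criteria arrive as a dict of booleans and that provenance IDs sit under a hypothetical `provenance` key. The criterion names are the 14 listed above; `fine_tune_blockers` is a hypothetical field name.

```python
FINE_TUNE_CRITERIA = (
    "verdict_accept", "schema_clean", "source_backed_evidence", "external_review",
    "semantic_alignment", "witness_independence", "provenance_sources",
    "review_hash_verified", "source_urls_recoverable_hashable",
    "witness_registry_resolved", "ro_crate_validated",
    "reviewer_attestation_verified", "review_id_present", "witness_id_present",
)


def evaluate_readiness(packet: dict, external_checks: dict) -> dict:
    """Attach fine_tune_ready and named blockers; the verdict itself is never modified."""
    results = {name: bool(external_checks.get(name, False)) for name in FINE_TUNE_CRITERIA}
    # Criteria derivable from the packet itself (assumed field layout).
    provenance = packet.get("provenance", {})
    results["verdict_accept"] = packet.get("verdict") == "ACCEPT"
    results["review_id_present"] = bool(provenance.get("review_id"))
    results["witness_id_present"] = bool(provenance.get("witness_id"))
    blockers = [name for name in FINE_TUNE_CRITERIA if not results[name]]
    return {**packet, "fine_tune_ready": not blockers, "fine_tune_blockers": blockers}


all_external = {name: True for name in FINE_TUNE_CRITERIA}
ready = evaluate_readiness({"verdict": "ACCEPT", "provenance": {"review_id": "rev-1"}},
                           all_external)
```

Here the claim stays ACCEPTed for reporting, but the missing witness ID surfaces as the single named blocker `witness_id_present`.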

Executive so what

The three things that matter to a buyer.

Enemy

Claim drift

CAPAS targets the point where a cautious source sentence becomes an over-scoped reusable claim.

Control point

Gate before reuse

The gate runs before records enter fine-tuning, publication workflows, governed datasets, or downstream reports.

Audit packet

Structured output

Each output carries the decision reason, evidence spans, blockers, and a non-LLM marker.

How the gate works

Evidence contracts decide licensed scope.

The gate takes a claim and an evidence package. It checks them against a claim-type evidence contract. The contract defines what evidence is required and what scope it licenses. The gate returns a verdict — deterministically.
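A minimal sketch of an evidence contract and the gate that applies it. Everything here is hypothetical: the `EvidenceContract` fields, the `association_claim` type, and the evaluation order (missing fields → HOLD, failing rule → REJECT, scope beyond the licensed boundary → REWRITE, otherwise ACCEPT) are assumptions chosen to mirror the four verdicts described on this page, not the CAPAS rule set.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A rule returns a failure reason, or None when the evidence passes it.
Rule = Callable[[dict], Optional[str]]


@dataclass(frozen=True)
class EvidenceContract:
    claim_type: str
    required_fields: tuple  # evidence fields the claim type must supply
    scope_licenses: tuple   # (scope element, boolean evidence flag that licenses it)
    rules: tuple = ()       # deterministic Rule functions, applied in order


def gate(contract: EvidenceContract, claim_scope: list, evidence: dict) -> dict:
    missing = [name for name in contract.required_fields if name not in evidence]
    if missing:
        return {"verdict": "HOLD", "reason": "supply missing fields: " + ", ".join(missing)}
    for rule in contract.rules:
        failure = rule(evidence)
        if failure is not None:
            return {"verdict": "REJECT", "reason": failure}

    licenses = dict(contract.scope_licenses)

    def licensed(element: str) -> bool:
        flag = licenses.get(element)
        return flag is not None and evidence.get(flag) is True

    unlicensed = [s for s in claim_scope if not licensed(s)]
    if unlicensed:
        # Narrow the claim to the boundary the evidence actually licenses.
        return {"verdict": "REWRITE",
                "reason": ", ".join(unlicensed) + " not independently licensed",
                "licensed_boundary": [s for s in claim_scope if licensed(s)]}
    return {"verdict": "ACCEPT", "reason": "all claimed scope licensed by supplied evidence"}


association_contract = EvidenceContract(
    claim_type="association_claim",
    required_fields=("p_value", "alpha"),
    scope_licenses=(("association", "association_tested"), ("direction", "direction_tested")),
    rules=(lambda e: None if e["p_value"] <= e["alpha"] else "p_value exceeds alpha",),
)

decision = gate(association_contract, ["association", "direction"],
                {"p_value": 0.03, "alpha": 0.05, "association_tested": True})
```

In this example the statistics pass but nothing licenses the direction of the effect, so the sketch returns REWRITE with `["association"]` as the licensed boundary.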

📥

Input

Claim text + evidence fields (statistical, artifact, source URL, license, reviewer hash…)

🚧

CAPAS Gate

90+ deterministic rule functions check the claim against its evidence contract. No LLM sits in the decision path.

📤

Output

ACCEPT · REWRITE · REJECT · HOLD

+ blockers, reviewer action, audit hash, evidence spans

Why this matters now

An auditable evidence trail for training-data governance.

Emerging AI-governance frameworks ask teams to document the quality and provenance of the data behind a claim. CAPAS produces exactly that artifact, per claim, deterministically — with no model in the decision path. It is an input to compliance and review work, not a certification of it.

EU AI Act · Art. 10

Data governance

Article 10 requires high-risk AI systems to use training, validation, and testing data that meet quality and governance criteria. CAPAS records, per claim, whether supplied evidence licenses reuse — a traceable check at the moment data enters a dataset.

NIST AI RMF

Traceability

The NIST AI Risk Management Framework emphasizes documentation and traceability across the data lifecycle. Every CAPAS decision emits a reason, evidence spans, provenance blockers, and an audit hash that can be reviewed after the fact.

No LLM judge

Deterministic by design

Unlike LLM-as-judge evaluators, CAPAS has no language model in the decision path. The same payload always yields the same verdict, so a decision can be independently re-run and audited — which a stochastic model judgment cannot guarantee.

Regulatory references are provided for context only. CAPAS does not certify compliance with the EU AI Act, the NIST AI RMF, or any other framework; it produces deterministic, auditable decision artifacts that support governance and review processes.

Two-week pilot

Measure claim drift on your own corpus.

A controlled operating test: can the organization identify which claims are licensed, which must be rewritten, which must be rejected, and which require more evidence before reuse?

Steps

1. Select one vertical corpus: AI governance, pharma evidence review, model risk, journal reproducibility, or materials R&D.
2. Convert 500 structured records into CAPAS payloads through the guided constructor, CLI, or upstream extraction adapter.
3. Run deterministic batch gating and sample 100 decisions for expert adjudication.
4. Report decision mix, reviewer agreement, false reject rate, provenance blockers, and review capacity redirected.
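Steps 3 and 4 can be sketched as a seeded sample plus a few ratios. The metric definitions are assumptions, since the pilot description does not define them: reviewer agreement is the share of sampled decisions where the expert verdict equals the engine verdict, the false reject rate is the share of expert-ACCEPTed claims that the engine REJECTed, and the gated share counts every non-ACCEPT verdict.

```python
import random
from collections import Counter

VERDICTS = ("ACCEPT", "REWRITE", "REJECT", "HOLD")


def sample_for_adjudication(decisions: list, k: int = 100, seed: int = 0) -> list:
    # A fixed seed makes the adjudication sample itself reproducible.
    return random.Random(seed).sample(decisions, k)


def decision_mix(verdicts: list) -> dict:
    if not verdicts:
        raise ValueError("no verdicts to summarize")
    counts = Counter(verdicts)
    mix = {v: counts[v] / len(verdicts) for v in VERDICTS}
    # Assumed definition: "gated" means any verdict other than ACCEPT.
    mix["gated"] = sum(counts[v] for v in VERDICTS if v != "ACCEPT") / len(verdicts)
    return mix


def adjudication_metrics(engine: list, expert: list) -> dict:
    """engine and expert are parallel verdict lists for the adjudicated sample."""
    if len(engine) != len(expert) or not engine:
        raise ValueError("need two non-empty, equal-length verdict lists")
    pairs = list(zip(engine, expert))
    agreement = sum(e == x for e, x in pairs) / len(pairs)
    engine_on_expert_accepts = [e for e, x in pairs if x == "ACCEPT"]
    false_reject_rate = (engine_on_expert_accepts.count("REJECT") / len(engine_on_expert_accepts)
                         if engine_on_expert_accepts else 0.0)
    return {"reviewer_agreement": agreement, "false_reject_rate": false_reject_rate}


sample = sample_for_adjudication(list(range(500)))  # e.g. 500 hypothetical record IDs
metrics = adjudication_metrics(["ACCEPT", "REJECT", "REWRITE", "HOLD"],
                               ["ACCEPT", "ACCEPT", "REWRITE", "REJECT"])
```

Seeding the sampler keeps the pilot consistent with the rest of the method: the choice of which 100 decisions experts review can itself be re-run and audited.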

ACCEPT

Licensed for controlled reuse. The fine-tune readiness criteria must still be cleared before training use.

REWRITE

Evidence supports a narrower claim. Returns the licensed boundary.

REJECT

Returns which evidence is missing or failing — not a silent no.

HOLD

Returns the steps: supply the missing field, verify provenance, re-gate.

Disclaimers

Required context for buyers.

· CAPAS gates supplied evidence fields; it does not infer hidden evidence, provide legal advice, certify broad scientific truth, or replace external review.

· The 1,238 decisions and 78% gated share are reproducible from the engine’s own benchmark (benchmarks/family_decision_mix.py) over a synthetic decision-space grid. They demonstrate full verdict-space coverage, NOT a real-world drift rate. No production pilot has been run yet; real rates require an independently adjudicated corpus.

· Review-capacity estimates are planning assumptions and must be calibrated against the customer baseline.

· Do not share payload URLs or exports containing confidential source text, reviewer IDs, witness IDs, licensed materials, or proprietary provenance paths without authorization.