# CAPAS — Market, Competition & Go-to-Market Validation

*Cited, evidence-based assessment (June 2026). Fact vs conjecture labeled throughout. 40+ sources; two
load-bearing facts independently re-verified (Certified Kubernetes/Sonobuoy mark mechanics; FDA-mandated
Pinnacle 21 dataset validation). Produced by a multi-source web-research pass.*

CAPAS = open-core (Apache-2.0), open-standard engine: claim + structured evidence → ACCEPT / REWRITE /
REJECT / HOLD. It does not adjudicate truth, only whether the evidence **licenses** the claim —
deterministically, no LLM in the verdict, re-derivably. Gates structured evidence fields, never free text.
Mark/certification reserved.

## Executive summary (go/no-go)

There **is a fundable wedge — but not the one the category name implies.** "Deterministic admissibility
checking" is not an existing budget line; at the level of any *single* check it is a relabel of work
incumbents already do (Pinnacle 21 for pharma, statcheck for stats, XBRL validators for finance, OPA for
policy, Great Expectations for data). What is undefended is the **composition**: a cross-domain,
fail-closed, re-derivable verdict engine + a **reserved certification mark** that lets a buyer say "this
passed an independent admissibility gate."

The single beachhead with **budget + urgency + a hard dated trigger NOW** is **regulated submissions —
specifically pharma trial-statistics admissibility**, riding the *existing mandatory* FDA/PMDA
dataset-validation motion that Pinnacle 21 already owns. CAPAS's slot there: **statistical-claim
admissibility beyond structural CDISC conformance** (gating p-values / effect sizes / multiplicity /
calibration invariants that P21 does not deeply check).

The moat (trusted re-derivable reference standard + reserved mark in a regulated niche) **is defensible in
principle** — **SOC 2** and **Certified Kubernetes** prove the pattern — but only if mark/standard
governance is **separated from the commercial entity early**. Conflating "owns engine + owns mark + sells
product" is the exact configuration that forked every relicensed open-core project (OpenTofu, Valkey,
OpenSearch).

**Verdict: GO, narrowly.** Fund one vertical wedge (pharma statistical admissibility); pre-commit the
mark to neutral governance; treat AI-training-data and quantum as high-ceiling *future* markets, not the
beachhead. Thinnest evidence is where the largest TAM is (AI/RAG); hardest trigger is where the smallest
greenfield is (pharma).

## Thread 1 — Market gap, buyer, competition

Real spend exists in adjacent categories, captured by incumbents (FACT): Great Expectations (~$65M),
Soda (~$28–41M, acquired NannyML Jun 2025); research-integrity land-grab ("50+ vendors", Elsevier
investing tens of millions, Wiley paper-mill detector flagged 10–13% across 270 journals); **Pinnacle 21
(Certara) is the same platform FDA/PMDA use** — sponsors MUST validate SDTM/ADaM/define.xml before
submission (verified: FDA publishes the validation rules); Workiva XBRL ~$100k–$300k/yr midcap, $1M+ large.

Buyer ranking (budget × urgency × trigger):
1. **Regulated submissions** (pharma stats + financial reporting) — hard non-optional trigger; uses
   Pinnacle 21 / XBRL validators; pays enterprise-scale (FACT). Gap = *statistical-claim* admissibility
   beyond structural conformance (incumbent owns the structural slot).
2. **Journals / data editors** — rising (paper-mill crisis); uses **statcheck** (in peer review at
   *Psychological Science*/*JESP*, cuts error rates — Nuijten & Wicherts 2024), GRIM/GRIMMER/SPRITE,
   **SciScore** ($39.99/$49.99 per 3 credits), Ripeta, Penelope.ai; proven WTP but low budget + 50+
   competitors.
3. **AI labs / enterprises** gating third-party claims into training/RAG/reports — mostly nothing
   deterministic today; budgets huge, trigger soft (FACT). Highest ceiling, most greenfield, hardest sale.

"Admissibility" is **differentiated as a frame/architecture** (nobody sells a single deterministic,
re-derivable, LLM-free, cross-domain claim→evidence gate with a reserved mark) but **a relabel at the
single-check level** (statcheck, GRIM, Great Expectations, and **OPA/Styra** — the architectural twin —
already exist). Defensible story = composition + fail-closed verdict + reserved mark, not the rungs.
*Cautionary FACT:* standalone open-core policy-engine value capture struggled — **Styra acqui-hired by
Apple (Aug 2025), enterprise product wound into upstream OPA; the standard survived, the company's capture
did not.* (Prophy was mis-grouped in the brief — it is reviewer-matching, not numeric integrity.)

*Honest limits:* no clean "research-integrity software market size"; $15–18B "data integrity" figures are
enterprise-pipeline, NOT CAPAS TAM. SciScore/Ripeta ARR unobtainable. The AI-lab trigger is the
weakest-evidenced claim.

## Thread 2 — Open-core / open-standard GTM

Two value-capture archetypes (FACT): (a) free standard + paid platform (OPA, Sigstore, OpenTelemetry);
(b) **open standard + restricted right-to-attest** (SOC 2 — only AICPA CPA firms may issue, audits
~$20k–$150k+/yr; Certified Kubernetes). **CAPAS's reserved-mark plan is archetype (b).**

**Certified Kubernetes is the directly copyable mechanic (verified):** code open; the mark may be used
only by passing a conformance test run with the *same open tool (Sonobuoy)* users run themselves;
submitted by PR + community review; **yearly re-certification**; mark owned by the Linux Foundation. Maps
~1:1 to CAPAS ("open engine + run-it-yourself determinism + reserved mark"). CAPAS's determinism is
*better* suited than SOC 2 (no human auditor → Sonobuoy-style self-certification). But CAPAS lacks SOC 2's
legal moat (CPA licensure); its gatekeeping rests on certification-mark trademark law — enforceable but
weaker (CONJECTURE).

**Failure modes (FACT — the strongest cautionary corpus):** MongoDB (AGPL→SSPL, OSI ruled non-open),
Elastic (→SSPL→AWS OpenSearch fork→reverted to AGPL 2024), HashiCorp (→BSL→OpenTofu/OpenBao), Redis
(→SSPL→Valkey, ~83% of large users testing within a year→reverted 2025, "bridges burned"). Lessons:
relicensing the core to grab value is the #1 trust-killer; it reliably triggers a neutral-foundation fork;
reversals don't restore trust; **separate value capture from standard ownership** so the company can be
acquired (Styra→Apple) without taking the standard down. **Highest-leverage de-risking move: pre-commit
the CAPAS mark to a neutral foundation / irrevocable certification-mark charter BEFORE adoption.**

## Thread 3 — Quantum-advantage claim refutation

Demand for an independent "classically-reproducible-at-claimed-depth" defeater is **factually
demonstrated, but filled by academia + one government program, not a product** (FACT). Every
first-gen advantage claim eroded: Google Sycamore (2019) classically sampled (Pan & Zhang, PRL 2022) →
seconds by 2023–24 → *Leapfrogging Sycamore* (arXiv:2406.18889). **IBM "Utility" 127-qubit (Nature 618,
2023)** neutralized within weeks by Tindall/Flatiron (tensor-network + belief propagation, *more accurate
than the device on a laptop*, PRX Quantum 5 010308) and Begušić & Chan (sparse Pauli dynamics, one laptop
core). Rebuttals are scattered with **no unified verdict** — exactly the gap a standardized defeater
certificate compresses. Newest claims contested in real time: Google **Willow** (Dec 2024) rests on
extrapolation not direct verification; **"Quantum Echoes"/OTOC** (Oct 2025) "first verifiable advantage"
disputed (OTOC complexity class not understood; classical counter arXiv:2510.06324).

Consumers: **DARPA QBI** is a funded, explicitly skeptical third-party IV&V program ("Our opening position
is skepticism"; 11 of ~18 advanced to Stage B Nov 2025; IBM advanced Nov 6 2025) — *the manual
institutional version of what CAPAS automates*. Investors real but soft. Journals: gap real, adoption
unproven. *Honest limit:* the defeater is itself frontier research — CAPAS can encode *known* failure
modes but not *discover* a novel classical algorithm (GIGO ceiling); spoofing is often partial → a graded
**HOLD** is the honest verdict (fits fail-closed). For newest OTOC claims there may be no settled oracle →
**HOLD, not refutation.** DARPA fills the skeptic niche free; the likely *paying* customer is a **vendor
wanting a defensible "passed an independent admissibility check" certificate** (offense-as-defense).

## Thread 4 — AI training-data / RAG admissibility & provenance (2026)

Pain is real and acute (FACT): Data Provenance Initiative found license omission >70%, license error >50%
across 1,800+ datasets; RAG hallucination 15–30%+; **OWASP LLM04:2025 Data & Model Poisoning** spans
pre-training, fine-tuning, AND RAG; **Anthropic + UK AISI + Turing (Oct 2025, arXiv:2510.07192): ~250
malicious documents** suffice to backdoor models 600M–13B (near-constant regardless of scale → pre-ingestion
screening matters more). Regulation in force (FACT — strongest part): **EU AI Act GPAI obligations applied
Aug 2 2025**; providers must publish a training-content summary on the EC's **mandatory template (Jul 24
2025)**; **AI Office enforcement Aug 2 2026**, fines up to **€15M or 3% turnover**; **C2PA** v2.3 +
Conformance Program late 2025, **CISA endorsed Content Credentials Jan 2025**.

*Critical limit (FACT):* the EU AI Act mandates **disclosure, not deterministic admissibility
verification** (the AI Office "will not perform content-level audits") — so mandated demand for a
*verification gate* is softer than the headline. C2PA is creation-time media provenance (metadata often
stripped), not a dataset/claim-admissibility standard. Tools fragmented (provenance audits, datasheets,
contamination detection, poisoning defenses). OWASP recommends **"data validation gates before ingestion"
as a *practice*, not a product** — CAPAS's fail-closed, downgrade-only design maps almost exactly onto it.
Enterprise AI Governance & Compliance market ~$2.5B (2025)→$3.4B (2026), ~39% CAGR (estimate-grade), but
**no consolidated buyer, no budget line called "admissibility gate," no standard.** Position as a
**feature within AI-governance compliance + OWASP LLM04 supply-chain + RAG grounding**, not a market of
its own — yet.

## Synthesis — founder go/no-go

- **Fundable wedge? Yes, narrow.** Not "the admissibility market" (no budget line) but a specific
  deterministic check a regulated buyer already pays adjacent work on, wrapped in a reserved mark whose
  precedent (SOC 2, Certified Kubernetes) is proven.
- **Beachhead NOW: pharma trial-statistics admissibility, sold into the FDA/PMDA dataset-validation
  motion.** Budget FACT (sponsors/CROs license P21); trigger FACT (FDA requires validated datasets, the
  deadline is the submission); greenfield CONJECTURE-well-grounded (P21 checks *structural* CDISC
  conformance, not deep *statistical-claim* admissibility — CAPAS's slot: "the evidence licenses the
  reported statistic," re-derivably). Not journals (low budget, 50+ competitors, statcheck already
  deterministic); not AI/RAG (softest trigger); not quantum (research-grade defeaters, DARPA free).
- **Moat defensible in principle (FACT precedent), conditional in practice (CONJECTURE).** It is the
  trust/certification position, NOT the rungs (every check copyable). Self-runnable conformance
  (Sonobuoy-style) is cheaper than SOC 2's human audit. **Fails if the mark + core stay inside the
  commercial entity** — the configuration that forked every relicensed project.
- **Highest-leverage first moves:** (1) build the pharma statistical-admissibility wedge as a thin
  adapter *beside* Pinnacle 21, land one CRO/sponsor design partner with an imminent submission; (2)
  **pre-commit the CAPAS mark to neutral governance before adoption** — the single cheapest, highest-
  leverage de-risking act; (3) ship a **Sonobuoy-equivalent run-it-yourself conformance harness** —
  determinism is the structural advantage; make self-certification the distribution mechanism.
- **Thin evidence (explicit):** thinnest = AI/RAG "admissibility gate" *as a market* (pain + regulation
  FACT; consolidated buyer + budget CONJECTURE); soft = quantum *paying* demand; unobtainable = clean
  research-integrity TAM + SciScore/Ripeta ARR + CAPAS pricing power vs P21; forward-dated/uncertain =
  an OpenTelemetry "2026 graduation" dateline. Strongest primary-grade facts = FDA-mandated dataset
  validation, EU AI Act dates/template/fines, the open-core relicense→fork record, the IBM/Nature
  classical-rebuttal episode, the Certified Kubernetes / SOC 2 mark-governance mechanics.

*(Full source list — 40+ URLs across the four threads — is in the research transcript.)*