AEB is an open, format-neutral benchmark that scores any AI decision record against the five questions a regulator, an enterprise security review, or opposing counsel actually asks. It is runnable and deterministic, and the scorer is open: an evidence benchmark you have to take on trust would fail its own hardest test.
| Decision-record format | Attribution | Policy | Approval | Integrity | Independence | Score |
|---|---|---|---|---|---|---|
| GateFrame Decision Provenance Record | 2 | 2 | 2 | 2 | 2 | 10/10 |
| AWS CloudTrail event | 2 | 1 |  |  | 0 | 5/10 |
| OpenTelemetry span (GenAI semconv) | 2 | 0 | 0 | 0 | 0 | 2/10 |
| Plain application log (JSON) | 1 | 0 | 0 | 0 | 0 | 1/10 |
// Every operator-controlled format fails dimension 5. CloudTrail scores best among them — it has genuine cryptographic log-file validation — but that integrity is attested by the cloud provider and the very account under audit. It cannot answer "verify without trusting the operator." That is the dimension an adversarial examination turns on.
// 0 = absent · 1 = present but operator-controlled / partial · 2 = present and independently verifiable. The rubric is format-neutral: any record that meets all five scores 10. GateFrame is not privileged by the rubric — it is built to it.
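Why operator-held integrity is not the same as independence can be sketched with a simplified hash chain. This is not CloudTrail's actual log-file validation (which uses hashed, signed digest files); it is a minimal illustration, assuming a hypothetical log of JSON records, of the distinction drawn above. The chain catches an edit made by someone who cannot touch the stored hashes, but whoever holds the log can edit a record and recompute every hash, and the rewritten chain still verifies.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for the chain


def entry_hash(prev_hash: str, record: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON encoding of the record."""
    payload = prev_hash + json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records: list[dict]) -> list[str]:
    """Return the running hash after each record."""
    hashes, prev = [], GENESIS
    for record in records:
        prev = entry_hash(prev, record)
        hashes.append(prev)
    return hashes


def verify_chain(records: list[dict], hashes: list[str]) -> bool:
    """Recompute the chain from the records and compare it with the stored hashes."""
    return build_chain(records) == hashes


# Hypothetical decision log.
log = [
    {"actor": "agent-7", "action": "refund", "amount": 40},
    {"actor": "agent-7", "action": "refund", "amount": 900},
]
stored = build_chain(log)

# Editing a record without updating the stored hashes is detected.
edited = [log[0], {**log[1], "amount": 90}]
print(verify_chain(edited, stored))  # False

# The log holder can edit AND recompute every hash: the rewritten chain verifies.
# Integrity is present; independence is not.
rewritten_hashes = build_chain(edited)
print(verify_chain(edited, rewritten_hashes))  # True
```

Anchoring the latest hash somewhere the log holder cannot rewrite, or signing it with a key the log holder does not control, is what moves a record like this from integrity toward independence.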
The harness, the rubric, and the deterministic scorer are open. Point it at your real decision logs and read exactly why each dimension scored what it did.
```shell
# scores the reference formats
python benchmark.py

# scores YOUR decision log
python benchmark.py --file your_log.json
```
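The aggregation step of a scorer like this can be sketched in a few lines. This is a hypothetical illustration, not the published `benchmark.py`: it assumes each dimension has already been rated 0, 1 or 2 and shows only how five ratings combine deterministically into a score out of 10, with one explanation line per dimension. Deriving a rating from a raw log record is the part the real scorer defines, and it is not modeled here; the names `score`, `Result` and `explain` are invented for the sketch.

```python
from dataclasses import dataclass

# The five dimensions, in the order the results table lists them.
DIMENSIONS = ("attribution", "policy", "approval", "integrity", "independence")

# Rubric levels as the benchmark states them.
LEVELS = {
    0: "absent",
    1: "present but operator-controlled / partial",
    2: "present and independently verifiable",
}


@dataclass
class Result:
    per_dimension: dict[str, int]
    total: int

    def explain(self) -> list[str]:
        """One line per dimension saying what its rating means."""
        return [f"{d}: {v} ({LEVELS[v]})" for d, v in self.per_dimension.items()]


def score(ratings: dict[str, int]) -> Result:
    """Deterministically sum five 0-2 ratings into a score out of 10."""
    per_dimension = {}
    for d in DIMENSIONS:  # fixed order, so output never depends on input order
        if d not in ratings:
            raise ValueError(f"missing dimension: {d}")
        v = ratings[d]
        if type(v) is not int or v not in LEVELS:
            raise ValueError(f"{d} must be 0, 1 or 2, got {v!r}")
        per_dimension[d] = v
    return Result(per_dimension, sum(per_dimension.values()))


# Hypothetical ratings for a hypothetical log format.
result = score({"attribution": 2, "policy": 1, "approval": 0,
                "integrity": 1, "independence": 0})
print(f"{result.total}/10")  # 4/10
for line in result.explain():
    print(line)
```

Fixing the dimension order and rejecting out-of-range ratings keeps the output byte-for-byte reproducible, which is what lets a reader check a published number against the code.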
An evidence benchmark you had to take on faith would fail dimension 5 itself. Every score here is reproducible from the published scorer. Disagree with a number — read the code and tell us where it’s wrong. That is the standard we hold our own records to.
Across every operator-held logging approach in common use, the record can establish, to a degree, what happened, but not in a form a third party can verify without trusting the party being examined. Independent verifiability is the dimension that decides an adversarial review, and it is the one no operator-controlled log satisfies. A Decision Provenance Record closes it: a signature any examiner checks against a published key, with no involvement from the operator and none from GateFrame.
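Checking a signature against a published key, with nothing asked of the operator, can be sketched as follows. This is a minimal illustration under stated assumptions, not GateFrame's record format or verification tooling: it assumes Ed25519 signatures over canonical JSON, uses the third-party `cryptography` package, and the record fields and helper names (`canonical_bytes`, `verify_record`) are hypothetical.

```python
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)
from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat


def canonical_bytes(record: dict) -> bytes:
    """Deterministic JSON so signer and examiner check exactly the same bytes."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def verify_record(record: dict, signature: bytes, public_key: Ed25519PublicKey) -> bool:
    """Examiner-side check: needs only the record, its signature and the published key."""
    try:
        public_key.verify(signature, canonical_bytes(record))
        return True
    except InvalidSignature:
        return False


# Signer side (stands in for whatever holds the signing key).
signing_key = Ed25519PrivateKey.generate()
published = signing_key.public_key().public_bytes(Encoding.Raw, PublicFormat.Raw)

record = {"decision": "approve_refund", "actor": "agent-7", "policy": "refunds-v3"}
signature = signing_key.sign(canonical_bytes(record))

# Examiner side: rebuild the key from the published 32 bytes; nobody else is asked.
examiner_key = Ed25519PublicKey.from_public_bytes(published)
print(verify_record(record, signature, examiner_key))  # True

tampered = {**record, "decision": "deny_refund"}
print(verify_record(tampered, signature, examiner_key))  # False
```

The sketch leaves out key custody and distribution: the check supports independence only to the extent that the examiner obtains the public key from a source the examined party cannot alter.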
We’ll run AEB against a sample of your production AI-decision logs and walk you through exactly where the evidence gap is — and what closing dimension 5 takes. A focused pilot, not a sales call.
Request a benchmark pilot · Read the source on GitHub · View a live signed record