EXHIBIT 01 · PHYSICAL AI · THE PROOF LAYER

A demo isn't proof your robot's AI is safe to deploy.

WorldFlux turns every AI test into signed, verifiable evidence your customers, insurers, and regulators can open and trust, without you ever handing over your models, data, or keys.

signed · verifiable · expiring
EXHIBIT 01 · CHAIN OF CUSTODY
~$5T
humanoid-robot market by 2050 (Morgan Stanley)[1]
Art. 11
EU AI Act makes evidence legally required[4]
>$1B
AI-governance spend by 2030 (Gartner)[3]

THE GAP

Impressive demos. No way to prove they hold up.

The hardest part of physical AI isn't the demo. It's proving the demo will work in the real world. Today, teams ship a polished video and a spreadsheet of scores, and everyone downstream takes it on faith. The people who write the check, underwrite the risk, or sign off on safety can't independently verify anything.

This isn't our claim. It's the consensus among the investors funding this wave:

The demos are real. … But generalized, reliable deployment in the physical world? That's still ahead of us.[6]
Bessemer Venture Partners
Reliability isn't promised by vendors, but is proven through transparent, continuous evaluation.[8]
Anjney Midha, Venture Partner · a16z
The research community measures success on benchmarks, whereas deployment requires success on the long tail of situations that no benchmark covers.[9]
Oliver Hsu · a16z
Without standardized benchmarks or robust evaluation suites, it's difficult to measure generalization, regressions, or safety performance.[10]
Lonne Jaffe · Insight Partners

FINDING 01 · SEE THE GAP IN ONE NUMBER

Same model. Same task. We changed the scene, and it fell apart.

We took OpenVLA, a leading open robot-control model, and ran it through the standard LIBERO test suite. Then we ran the same model again with small, realistic changes: different object positions and environments. Nothing about the model changed.

This is the gap WorldFlux makes visible. As a16z notes, “static benchmarks are contaminated the moment they're published.” Every WorldFlux run produces a signed record anyone can re-open and check.

evidence.json · signed
Standard test: 74.4%
Scene changed: 24.4%

A reduced 90-episode calibration on selected stress conditions: robustness evidence, not an official leaderboard score.[11]

claim · protocol · evidence · provenance
See the signed evidence pack
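
Read as arithmetic, the two scores above amount to a 50-point absolute drop, or roughly two thirds of the baseline success rate. A minimal sketch of that calculation, using only the two percentages reported in the evidence card (the variable names are illustrative, not part of any WorldFlux output):

```python
# Success rates reported in the evidence card, in percent.
standard_pct = 74.4   # standard LIBERO test
shifted_pct = 24.4    # same model, scene changed

# Absolute drop in percentage points, and the drop relative to the baseline.
absolute_drop = standard_pct - shifted_pct
relative_drop = absolute_drop / standard_pct

print(f"absolute drop: {absolute_drop:.1f} percentage points")
print(f"relative drop: {relative_drop:.1%}")
```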

WHY NOW

01

Physical AI is arriving faster than anyone forecast.

Goldman Sachs raised its humanoid-robot forecast sixfold in a single year. Morgan Stanley projects a multi-trillion-dollar market and nearly a billion robots by 2050. VC funding for humanoid robotics jumped over 300% in 2025.

$38B
Goldman's humanoid forecast by 2035 (6× in one year)[2]
~$5T
humanoid market by 2050 (Morgan Stanley)[1]
02

And “trust me, it works” is about to stop being enough.

The EU AI Act now requires makers of high-risk AI to produce technical documentation and conformity evidence before they can sell (Article 11; obligations phase in 2026–2028). Gartner projects AI-governance spending will pass $1B by 2030 and warns that “point-in-time audits are simply not enough.” The UK government is building a third-party AI-assurance market for independent verification.

>$1B
AI-governance platform spend by 2030 (Gartner)[3]
£18.8B
UK third-party AI-assurance market by 2035[5]

A massive wave, a hard new requirement, and no infrastructure to meet it. That's the opening.

HOW IT WORKS

Run it your way. Leave with proof.

  1. Run

    Test your model on your own hardware: laptop, lab, or cloud. WorldFlux ingests what your eval produced; it never re-runs or hosts your model.

  2. Sign & verify

    It packages a tamper-evident evidence file (what was claimed, how it was tested, what it scored, where it came from), cryptographically signed and independently verifiable.

  3. Share

    Publish an expiring, revocable link. Reviewers open it and re-verify the signature themselves, with no raw logs and no access to your model.

worldflux · audit pipeline
worldflux audit run lerobot --from libero_run/ --claim claim.json --protocol protocol.json --output pkg/
pack: claim · protocol · evidence · provenance
worldflux audit sign pkg/ --backend sigstore && worldflux audit verify pkg/ --backend sigstore
signed · Sigstore · verified
worldflux audit publish pkg/ --share --cloud-run-id <run> --confirm-public-share-upload
# hosted share also requires --approval-file and --password
https://worldflux.ai/r/3h8…f12 · production trust root · expires in 30 days

Your compute, your keys, and your model weights never leave your hardware. WorldFlux audits what happened. It never hosts your AI.
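
The idea behind a tamper-evident evidence file can be illustrated in a few lines. The sketch below is not WorldFlux's implementation and does not use Sigstore. It assumes a hypothetical four-part pack (claim, protocol, evidence, provenance, with made-up field values) and uses an HMAC from Python's standard library as a stand-in for a real signature. Unlike Sigstore's public-key signatures, an HMAC needs a shared secret, so this shows only the tamper-detection property, not third-party verification.

```python
import hashlib
import hmac
import json


def canonical_bytes(pack: dict) -> bytes:
    """Serialize deterministically so identical content yields identical bytes."""
    return json.dumps(pack, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(pack: dict, key: bytes) -> str:
    """Stand-in signature: HMAC-SHA256 over the canonical bytes of the pack."""
    return hmac.new(key, canonical_bytes(pack), hashlib.sha256).hexdigest()


def verify(pack: dict, signature: str, key: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(pack, key), signature)


# Hypothetical pack; every value here is illustrative only.
pack = {
    "claim": "model completes the task suite",
    "protocol": {"suite": "LIBERO"},
    "evidence": {"success_rate": 0.744},
    "provenance": {"harness": "lerobot"},
}
key = b"demo-key-not-for-real-use"  # assumption: a shared demo secret

signature = sign(pack, key)
print(verify(pack, signature, key))      # the untouched pack verifies

# Editing any field after signing breaks verification.
tampered = {**pack, "evidence": {"success_rate": 0.99}}
print(verify(tampered, signature, key))  # the edited pack does not
```

Changing a single field after signing makes verification fail, which is what lets a reviewer trust that the scores they open are the scores that were signed.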

WHY WORLDFLUX

A vendor grading its own homework isn't evidence.

Experiment trackers record your numbers, but numbers you report yourself don't convince a skeptical buyer. Hosted evaluation services make you upload your model, which serious teams can't do. WorldFlux is the neutral layer in between: independent, signed evidence, produced without ever taking custody of your IP.

Independent by design

Evidence is signed and verifiable by anyone, not self-attested.

Your IP stays yours

Bring your own compute and keys; we never host weights or proxy credentials.

Built for robots, not chatbots

Ingests real robotics test harnesses (LeRobot, OpenPI, GR00T, and more), not generic chatbot logs.

WHERE IT FITS

Works with the models your field already uses.

WorldFlux evaluates and packages results from the models teams are actually building on:

NVIDIA Cosmos · NVIDIA Isaac GR00T · Physical Intelligence π · OpenVLA · V-JEPA 2 · SmolVLA

Evidence is signed with Sigstore, ships with a CycloneDX ML bill-of-materials, and maps to the frameworks buyers cite: EU AI Act, NIST AI RMF, ISO 42001, SOC 2, GDPR. We make evidence inspectable; we do not certify it.

WorldFlux is in beta. We're taking on a small number of design partners now.

Book a pilot

GET STARTED

Start free. Bring your own compute.

Free · $0

Use the CLI within the free quota.

Pro · metered

For solo labs running tests every week.

Team · seats

Member roles and audit-log retention.

Design-partner pilot

For teams that need a signed URL and a written go/no-go memo in under two weeks.

Book a pilot

Stop shipping demos. Start shipping proof.

pip install worldflux