EXHIBIT 01 · PHYSICAL AI · THE PROOF LAYER
A demo isn't proof your robot's AI is safe to deploy.
WorldFlux turns every AI test into signed, verifiable evidence your customers, insurers, and regulators can open and trust, without you ever handing over your models, data, or keys.
THE GAP
Impressive demos. No way to prove they hold up.
The hardest part of physical AI isn't the demo. It's proving the demo will work in the real world. Today, teams ship a polished video and a spreadsheet of scores, and everyone downstream takes it on faith. The people who write the check, underwrite the risk, or sign off on safety can't independently verify anything.
This isn't our claim. It's the consensus among the investors funding this wave:
“The demos are real. … But generalized, reliable deployment in the physical world? That's still ahead of us.”[6]
“Reliability isn't promised by vendors, but is proven through transparent, continuous evaluation.”[8]
“The research community measures success on benchmarks, whereas deployment requires success on the long tail of situations that no benchmark covers.”[9]
“Without standardized benchmarks or robust evaluation suites, it's difficult to measure generalization, regressions, or safety performance.”[10]
FINDING 01 · SEE THE GAP IN ONE NUMBER
Same model. Same task. We changed the scene, and it fell apart.
We took OpenVLA, a leading open robot-control model, and ran it through the standard LIBERO test suite. Then we ran the same model again with small, realistic changes: different object positions and environments. Nothing about the model changed.
This is the gap WorldFlux makes visible. As a16z notes, “static benchmarks are contaminated the moment they're published.” Every WorldFlux run produces a signed record anyone can re-open and check.
A reduced 90-episode calibration on selected stress conditions: robustness evidence, not an official leaderboard score.[11]
WHY NOW
Physical AI is arriving faster than anyone forecast.
Goldman Sachs raised its humanoid-robot forecast sixfold in a single year. Morgan Stanley projects a multi-trillion-dollar market and nearly a billion robots by 2050. VC funding for humanoid robotics jumped over 300% in 2025.
And “trust me, it works” is about to stop being enough.
The EU AI Act now requires makers of high-risk AI to produce technical documentation and conformity evidence before they can sell (Article 11; obligations phase in 2026–2028). Gartner projects AI-governance spending will pass $1B by 2030 and warns that “point-in-time audits are simply not enough.” The UK government is building a third-party AI-assurance market for independent verification.
A massive wave, a hard new requirement, and no infrastructure to meet it. That's the opening.
HOW IT WORKS
Run it your way. Leave with proof.
- 01
Run
Test your model on your own hardware: laptop, lab, or cloud. WorldFlux ingests what your eval produced; it never re-runs or hosts your model.
- 02
Sign & verify
It packages a tamper-evident evidence file (what was claimed, how it was tested, what it scored, where it came from), cryptographically signed and independently verifiable.
- 03
Share
Publish an expiring, revocable link. Reviewers open it and re-verify the signature themselves, with no raw logs and no access to your model.
worldflux audit run lerobot --from libero_run/ --claim claim.json --protocol protocol.json --output pkg/
pack: claim · protocol · evidence · provenance
worldflux audit sign pkg/ --backend sigstore && worldflux audit verify pkg/ --backend sigstore
signed · Sigstore · verified
worldflux audit publish pkg/ --share --cloud-run-id <run> --confirm-public-share-upload
# hosted share also requires --approval-file and --password
https://worldflux.ai/r/3h8…f12 · production trust root · expires in 30 days
Your compute, your keys, and your model weights never leave your hardware. WorldFlux audits what happened. It never hosts your AI.
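To make "tamper-evident" concrete, here is a minimal sketch of the general idea, not WorldFlux's actual file format or implementation. It assumes a package is just four byte blobs named after the parts listed above (claim, protocol, evidence, provenance), hashes each one, and then hashes the digest table into a single root value. The function names, the manifest layout, and the sample contents are all hypothetical. In a real pipeline a signing tool such as Sigstore would sign something like the root digest; signing is left out here so the example runs with the Python standard library alone.

```python
import hashlib
import json

# Hypothetical component names, borrowed from the four parts the page lists.
COMPONENTS = ("claim", "protocol", "evidence", "provenance")


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(package: dict[str, bytes]) -> dict:
    """Digest each component, then digest the canonical digest table."""
    missing = [name for name in COMPONENTS if name not in package]
    if missing:
        raise ValueError(f"missing components: {missing}")
    digests = {name: sha256_hex(package[name]) for name in COMPONENTS}
    # Canonical JSON (sorted keys, fixed separators) keeps the root reproducible.
    canonical = json.dumps(digests, sort_keys=True, separators=(",", ":")).encode()
    return {"digests": digests, "root": sha256_hex(canonical)}


def verify_manifest(package: dict[str, bytes], manifest: dict) -> list[str]:
    """Return the names of components whose bytes no longer match the manifest."""
    recomputed = build_manifest(package)
    return [
        name
        for name in COMPONENTS
        if recomputed["digests"][name] != manifest["digests"][name]
    ]


# Placeholder contents, for illustration only.
package = {
    "claim": b'{"task": "example"}',
    "protocol": b'{"episodes": 90}',
    "evidence": b"episode results go here",
    "provenance": b"who ran it, where, and when",
}
manifest = build_manifest(package)

# An untouched package verifies cleanly.
assert verify_manifest(package, manifest) == []

# Changing a single component is detected and localized.
tampered = dict(package, evidence=b"edited results")
print(verify_manifest(tampered, manifest))  # prints ['evidence']
```

A reviewer holding only the manifest and the package can rerun the check without any access to the model. The signature, which this sketch omits, is what ties the root digest to whoever produced it.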
WHY WORLDFLUX
A vendor grading its own homework isn't evidence.
Experiment trackers record your numbers, but numbers you report yourself don't convince a skeptical buyer. Hosted evaluation services make you upload your model, which serious teams can't do. WorldFlux is the neutral layer in between: independent, signed evidence, produced without ever taking custody of your IP.
Independent by design
Evidence is signed and verifiable by anyone, not self-attested.
Your IP stays yours
Bring your own compute and keys; we never host weights or proxy credentials.
Built for robots, not chatbots
Ingests real robotics test harnesses (LeRobot, OpenPI, GR00T, and more), not generic chatbot logs.
WHERE IT FITS
Works with the models your field already uses.
WorldFlux evaluates and packages results from the models teams are actually building on.
Evidence is signed with Sigstore, ships with a CycloneDX ML bill-of-materials, and maps to the frameworks buyers cite: EU AI Act, NIST AI RMF, ISO 42001, SOC 2, GDPR. We make evidence inspectable, not certified.
WorldFlux is in beta. We're taking on a small number of design partners now.
Book a pilot
GET STARTED
Start free. Bring your own compute.
Open the CLI within the free quota.
For solo labs running tests every week.
Member roles and audit-log retention.
For teams that need a signed URL and a written go/no-go memo in under two weeks.